Little Physicists: ChatPDF Versus... Me !

The latest offering from the wacky world of chatbots is something that claims to be actually practical.

Now, I love ChatGPT and simply cannot get my head around claims it's not useful. I'm fairly sure that this is just my ultra-cynical social media feed, but I will cheerfully concede that ChatGPT's factual accuracy is less than impressive, as is its moral judgement. This is something that claims Poland is a landlocked country, won't output a funny version of the climax of Lord of the Rings ("because the events are meant to be taken seriously"), but also has no qualms about writing a bunny-based version of the Holocaust.

Go figure.

Anyway, if you approach ChatGPT as some kind of improved search engine then you're an idiot, plain and simple. That's just not what it's good for, any more than a paintbrush and an easel are going to revolutionise the world of journalism. It doesn't make any sense.

But ChatPDF ought to be different. This is specifically designed to read and interpret the contents of PDF documents, including figures. It can summarise the main points and answer questions about what's in them. Now this is potentially a very useful tool indeed, as all too often, reading papers is a horrible chore. Sometimes I just need to find a specific value or equation and trawling through the whole thing isn't fun, with the Ctrl+F function being of limited help, especially with mathematical symbols. So I would hope (not expect !) that this would have at least a bit more rigour to it.

For reference, you can give a link to a PDF to ChatGPT itself, but it's shite. I sent it this one (selected at random), which is a paper entitled, "On the Reionization-Era Globular Cluster in Low-Mass Galaxy Eridanus II". It complained to me several times that the link doesn't work (it does, and arXiv's downtime is negligible, so that's just erroneous). When I told it I could see the paper in my browser, it came up with this :

I apologize for any confusion. It is possible that there was a temporary issue with the link earlier. I have tried accessing the link again, and this time it appears to be working properly.
The PDF you provided is titled “Exploring the Potential of Deep Learning for Automatic Music Genre Classification.” The paper discusses the use of deep learning techniques for automatic music genre classification. The authors conduct experiments on a large dataset of music recordings and evaluate the performance of various deep learning models. They compare their results to those obtained by other classification algorithms and find that the deep learning models outperform these other methods. The authors conclude that deep learning shows great potential for improving the accuracy and efficiency of music genre classification.

Which is pure garbage. Perhaps worse, when I gave it the correct name of the paper, it gave me a very plausible-sounding summary. Asking it where it got a value of the radius from, it confidently declared it was from a section that didn’t exist. Further corrections just led to a whack-a-mole of the same thing. This is dire, because it's all too easy to fall for its surprisingly coherent drivel.

But testing a random paper isn't a good idea since I've no idea what's in it. What would be a much better way to test this is... this blog ! More than 200 papers summarised by a source I can't have any higher level of confidence in, because it's me.

For this test I'm going to take the five most recent papers on this blog and use the specialised ChatPDF. Let's see if this does better than the ChatGPT app and how it compares to doing the hard work of actually reading the paper.

1) Smooth HI Low Column Density Outskirts in Nearby Galaxies

This one I blogged here with the paper itself here. It's about stacking the radial HI profiles of galaxies to see if the column density has an expected break (based on photoionisation models) or if it actually continues to lower values. The answer is the latter.

At first ChatPDF decided to speak to me in French. No idea why, and it doesn't do it for other papers (the authors aren't French and they don't work in French institutes). So I told it to speak English instead and it obliged. I asked for an explanation but all I could get was an apology. Weird.

Anyway, its summary of the paper was decent but not any better than the abstract. When I asked it for the main conclusion :

The main conclusion of the paper is that the outer regions of nearby galaxies contain smooth, low-density gas that is likely to be an important source of fuel for future episodes of star formation. The researchers found that this gas is distributed in a more uniform way than previously thought, which has implications for our understanding of how galaxies form and evolve over time. Overall, the paper provides new insights into the complex interplay between galaxies and their surrounding environments.

Initially I thought this was pretty good but on reflection it's actually very poor. Whether the gas could sustain future star formation is irrelevant : kudos for making the connection, but the authors don't, so this is not warranted (also the total mass of the gas is unlikely to be significant). "More uniform" is arguable. They do use the word "smooth", but in context this is referring to the 1D azimuthally-averaged profiles, i.e. they don't have any distinct break. That the average density of the gas decreases smoothly in profile doesn't mean the local density of the gas isn't highly variable. And the last sentence and a half are very much in the style of ChatGPT in that they contain no actual information but essentially give an opinion about the quality of the paper. They're okay for a general audience who need to know the broader relevance but of no use at all for a specialist who already understands this.
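
To make the "smooth profile, lumpy gas" point concrete, here's a toy sketch. Everything in it is my own invention for illustration (an exponential disc with log-normal scatter, arbitrary units) and has nothing to do with the paper's actual data or method. It just shows that an azimuthal average can decline smoothly while the individual values within each annulus vary enormously.

```python
import math
import random
import statistics


def toy_profile(radii, n_per_annulus=2000, scale_length=3.0, seed=1):
    """Return (radius, azimuthal mean, max/min ratio) for a made-up clumpy disc."""
    rng = random.Random(seed)
    rows = []
    for r in radii:
        # Underlying smooth exponential decline (an assumption, arbitrary units).
        smooth_level = math.exp(-r / scale_length)
        # Local values scatter log-normally (sigma = 1 in natural log) about it.
        local = [smooth_level * rng.lognormvariate(0.0, 1.0)
                 for _ in range(n_per_annulus)]
        rows.append((r, statistics.fmean(local), max(local) / min(local)))
    return rows


for radius, mean, ratio in toy_profile([1, 2, 3, 4, 5]):
    print(f"r = {radius}: mean = {mean:.3f}, max/min within annulus = {ratio:.0f}")
```

The means fall steadily with radius, with no break, even though the densest and emptiest spots in any one annulus differ by a large factor.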

So far, so mediocre. Next I tried asking a more specific question about the lowest density value obtained. Initially it said this wasn't given, but then I corrected myself to column density. It said :

The paper states that the column densities of the gas in the outer regions of galaxies are low, dropping below 10^19 cm^-2. The researchers found that the gas is distributed smoothly and does not exhibit a sharp break in column density, which is contrary to what some photoionization models predict. This suggests that ionization by extragalactic photons may not be the limiting factor for the extent of the HI disk, and that instead, the outskirts of the disk may correspond to a transition to a low column density gas accreted from the cosmic web at later evolutionary stages of disk formation.

Now this is 100% correct and cannot really be improved upon. It gave a similarly impressive summary of the methodology, referencing the page numbers and giving the right quantitative values. On pressing for an exact value of the lowest column density measured, it reported that the authors don't give an exact value, which indeed they don't. So I tried :

Could you estimate the lowest value reached using their figure 2 ?
Based on Figure 2 of the paper, it appears that the lowest column density reached is around 5 x 10^18 cm^-2. However, this is just an estimate based on the graph and not a precise measurement.

Which is seriously impressive stuff. It would have been better if it had volunteered this itself, but that it can do this at all is amazing.

Except... it's a fluke. Repeating this in a later session it at first insisted the value was stated in the file itself, 10^19, which is confusing a limit with a measured value. Asking it to use the figures instead didn't work. Asking it for figure 2 specifically reverted to 10^19, which is just not right at all.

Finally I asked it for a public outreach summary. The result was pretty poor, no better than I could obtain from rewording the abstract, and when pressed, it insisted that the density values obtained were much higher than in ordinary air : this is wrong by about 25 orders of magnitude.

Oh dear. This isn't looking good at all.

2) Discovery of an isolated dark dwarf galaxy in the nearby universe

Blog post here, paper here. A much shorter paper that might be easier to handle. This one's about the discovery of an optically dark HI cloud that looks like it's rotating like an ordinary galaxy. The key point is this is isolated, so unlike other such dark galaxy candidates, it's very hard to attribute it to being the result of any other formation mechanism : a primordial origin seems the most likely.

The summary of this one was okay, but when I asked it for the main conclusion, it got hung up on the fact that the cloud is isolated. This is important, but it stated things in an almost tautologous way, as though isolated clouds should be expected to have nearby companion galaxies, which is obviously nonsense (otherwise they wouldn't be isolated). However, in a reverse of the first paper, when I asked it again in a new session, it gave a much better answer, giving a direct quote from the paper that was absolutely relevant.

More disappointingly, it initially couldn't give the distance to the cloud, saying that the authors didn't state this. At first I thought this was correct and they just hadn't mentioned it, so I asked for the distance based on its systemic velocity. Now it gave the correct value. But later I found that they do actually state this value directly, so this is no more impressive than doing a Ctrl+F for "distance". On the other hand, it correctly reported that the rotation curve can't be guaranteed to be flat, which would be the key signature of a dark matter-dominated system.
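
For what it's worth, getting a distance from a systemic velocity is about as simple as astronomy gets : the Hubble law, d = v / H0. A minimal sketch, where both the velocity and the value of H0 are illustrative assumptions of mine and not numbers taken from the paper :

```python
# Hubble-law distance estimate: d = v / H0.
# Both numbers below are illustrative assumptions, not values from the paper.
H0_KM_S_MPC = 70.0  # assumed Hubble constant, km/s/Mpc


def hubble_distance_mpc(v_sys_km_s, h0=H0_KM_S_MPC):
    """Return the distance in Mpc implied by a recession velocity in km/s."""
    if h0 <= 0:
        raise ValueError("H0 must be positive")
    return v_sys_km_s / h0


example_velocity = 2800.0  # hypothetical systemic velocity, km/s
print(f"{hubble_distance_mpc(example_velocity):.1f} Mpc")
```

This ignores peculiar velocities, which matter for nearby objects, so treat it as a rough estimate only.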

So far, so very ChatGPT-like. Occasional flashes of genuine brilliance wrapped in large doses of mediocrity and sprinkled with total rubbish. Let's try another one.

3) Young, Blue, and Isolated Stellar Systems in the Virgo Cluster. II. A New Class of Stellar System

Blog here, paper here. This is a long one about unusual stellar systems in the Virgo cluster, which the authors think are most likely produced by star formation in gas removed from galaxies by ram pressure stripping.

The summary of the paper wasn't great, saying that :

The authors suggest that these systems may represent an early stage in the formation of larger galaxies, or they may be remnants of disrupted galaxies that were once part of a larger group or cluster.

Which is certainly not the main point at all. When I asked for the most likely origin, it did very much better :

The authors suggest that the most likely origin of these young, blue, and isolated stellar systems in the Virgo Cluster is ram pressure stripping. Ram pressure stripping occurs when gas is removed from a galaxy as it moves through the intracluster medium at high speeds. This process can lead to the formation of new stars in the stripped gas, which may then form into compact, low-mass systems like those observed in this study. The authors propose that these systems formed from ram pressure stripped gas removed from new infalling cluster members and survived in the intracluster medium long enough to become separated from their parent galaxies by hundreds of kiloparsecs. However, they note that tidal stripping cannot be conclusively excluded as a formation mechanism for these objects.

Spot-on perfect. Going back to its other claims, it told me that the possibility of being an early stage of galaxy formation was mentioned on page 2, giving a quote. But this was just flat-out wrong as this quote just doesn't appear anywhere in the paper at all. The same was true about the idea of being disrupted remnants, giving a quote and page reference that was a barefaced lie. Telling it it's made mistakes does have it correct itself, but this really shouldn't be necessary.
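
Checking whether a quote really appears in a paper is mechanical enough to script. A minimal sketch, assuming you've already extracted the paper's text to a string by some other means. The `paper_text` below is a made-up stand-in and `quote_appears` is my own hypothetical helper, nothing to do with ChatPDF itself :

```python
import re
import unicodedata


def normalise(text):
    """Unify Unicode forms (e.g. ligatures), collapse whitespace, ignore case."""
    text = unicodedata.normalize("NFKC", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip().casefold()


def quote_appears(quote, paper_text):
    """True if the quote occurs verbatim (after normalisation) in the text."""
    return normalise(quote) in normalise(paper_text)


# Made-up stand-in for text extracted from a PDF, line break included.
paper_text = (
    "The authors propose that these systems formed from ram pressure\n"
    "stripped gas removed from new infalling cluster members."
)

print(quote_appears("formed from ram pressure stripped gas", paper_text))
print(quote_appears("an early stage in the formation of larger galaxies", paper_text))
```

PDF extraction tends to hyphenate words across line breaks, which this doesn't handle, so a miss means "go and look", not "definitely invented".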

Dear oh dear oh dear.

4) The Turn-Down of the Baryonic Tully-Fisher Relation and Changing Baryon Fractions at Low Galaxy Masses

Blog here, paper here. This is an especially complicated and technical one. Low mass galaxies appear to rotate more slowly than the BTFR predicts, but when you account for various corrections, they actually rotate more quickly than the linear version found for higher masses and this can be shown to be consistent with the dark matter paradigm. ChatPDF's summary is quite correct :

The main conclusion of the paper is that the baryonic Tully-Fisher relation (BTFR) turns down at low galaxy masses, which indicates a decrease in the efficiency of galaxy formation. This turn-down is likely due to a combination of factors, including feedback from star formation and accretion onto dark matter halos. The authors suggest that further study of this phenomenon could help to resolve discrepancies between observed and predicted galaxy mass functions and improve our understanding of galaxy formation physics.

I asked it to describe this in terms of galaxy rotation speeds and was even more impressed :

The turn-down in the baryonic Tully-Fisher relation (BTFR) at low galaxy masses refers to a decrease in the efficiency of galaxy formation, which is reflected in the relationship between a galaxy's baryonic mass and its maximum rotational velocity. Specifically, at low galaxy masses, the observed rotational velocities are lower than what would be expected based on extrapolation from higher mass galaxies or from galaxy formation models. This suggests that there is a decrease in the efficiency of converting baryons into stars and/or a decrease in the amount of dark matter relative to baryons at low masses.

Asking it for the observational comparison with the BTFR from high mass galaxies also gave a perfect comparison. I asked it for the figure illustrating this and it correctly picked figure 5. Asking it if this could be reconciled with the CDM paradigm was more disappointing, as its answer amounted to a cautious "maybe" rather than describing the authors' claims. Pointing to figure 9, where the authors do demonstrate how this reconciliation may be possible, it insisted that they didn't.

This is disappointing. Asking in a somewhat more technical manner gave an answer that was both better and worse :

Describe this in terms of rotation speeds, comparing with the predictions of a linear BTFR.
The right panel of Figure 9 shows that the observed rotational velocities of galaxies with baryonic masses less than 10^8 solar masses are higher than what would be expected based on a linear extrapolation of the baryonic Tully-Fisher relation (BTFR) from more massive galaxies. This indicates a turn-down in the BTFR at low galaxy masses, which suggests a decrease in the efficiency of converting baryons into stars and/or a decrease in the amount of dark matter relative to baryons at low masses. The green line in the figure represents the best-fitting linear relation between maximum rotational velocity (Vmax) and baryonic mass for galaxies with masses greater than 10^8 solar masses. The galaxies with lower masses that lie to the right of this line have higher Vmax values than what would be expected based on this linear BTFR. This discrepancy between observed and predicted rotation speeds at low galaxy masses is a challenge for the standard cold dark matter (CDM) model and may require modifications or additional physics to explain.

The problem is they say even in the abstract that this is consistent with CDM ! Once again I was given an invented quote. Now, this claim is reasonable but misleading. It's reasonable that we might not fully understand the physics of galaxy formation, especially at low masses, but the point of the paper is to demonstrate that this change of slope in the BTFR is consistent with CDM predictions. Saying it requires additional physics gives a completely different impression of the claim being made.

5) Atomic Gas Dominates the Baryonic Mass of Star-forming Galaxies at z ≈ 1.3

Blog here, paper here. This much shorter paper stacks lots of HI spectra of galaxies to get a detection of atomic gas at z≈1.3, comparing the result with the molecular gas. They find that at redshifts where star formation peaks, atomic gas was the dominant component, which is surprising because it's thought that molecular gas is what's mainly involved in star formation.

ChatPDF's summary does not begin well :

We found that the baryonic composition of star-forming galaxies at z≠1 is dramatically different from that of galaxies at z≈0.

At redshifts not equal to one, eh ? Okay then. It also garbled the symbols in the rest of its responses. However, its statements were generally on point. It completely "got" the central point that the greater atomic fraction during the peak of star formation is surprising, e.g. :

Our study did not directly investigate which type of gas, molecular or atomic, is more important for star formation. However, we did find that the average atomic gas mass of star-forming galaxies with 〈Må〉≈10^10 Me is comparable to the average stellar mass at z≈1 and is significantly larger than both the average stellar mass and the average molecular gas mass at z≈1.3. This suggests that atomic gas may play an important role in star formation in galaxies at high redshifts.
That being said, it is generally believed that molecular gas is the primary fuel for star formation in galaxies (e.g., Kennicutt & Evans 2012). This is because molecular gas has a higher density and lower temperature than atomic gas, making it more conducive to forming stars. However, atomic gas can also be converted into molecular gas through various physical processes such as cooling and compression (e.g., Krumholz et al. 2009), so it is possible that both types of gas are important for star formation in different ways and under different conditions.

This is pretty perfect, references (which I'm not going to check) and symbols notwithstanding. I asked it how many redshifts they considered and it correctly differentiated between the two new data points from the study plus the comparison sample at redshift zero. For this paper at least, the only fault seems to be the peculiar symbols. Its factual accuracy was in this case fine.

Conclusions

I'm afraid this one can't be said to be anything beyond the usual "impressive tech demo" stage. It is categorically not ready for actual use and anyone paying the subscription fee is at this stage a complete fool.

To give credit where credit is due, it does often produce remarkably good summaries that are more accessible than reading the abstracts. It can extract complex variables, even ones which aren't stated directly in the text. It seems to do better when you ask it very specific questions, but it's capable of handling complicated technical descriptions and distilling them down to their most relevant points in even more generalised ways.

The problem is that accuracy and usefulness do not scale linearly with each other. If it produces accurate statements 70, 80, even 90% of the time, it's only useful 0% of the time. Why ? Because that failure rate is such that its claims always have to be checked, and one would be better off just reading the paper. You have no idea if it's just making stuff up or missing a vital point. Worse, it's dangerously coherent. If you're not already an expert in the field, it produces statements which sound fully convincing but are in fact just plain wrong. I'm glad it references the parts of the text it's getting its information from, but it frequently just invents entire quotes, and that's unacceptable.
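
One rough way to see why, under the simplifying assumption (mine) that each claim in an answer is right or wrong independently and with the same probability : the chance that a summary of n claims is entirely correct is p^n, which falls off quickly.

```python
def chance_all_correct(per_claim_accuracy, n_claims):
    """Probability that every one of n independent claims is correct."""
    if not 0.0 <= per_claim_accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if n_claims < 0:
        raise ValueError("number of claims cannot be negative")
    return per_claim_accuracy ** n_claims


# Ten claims per summary is an arbitrary illustrative choice.
for p in (0.70, 0.80, 0.90, 0.99):
    print(f"{p:.0%} per claim : {chance_all_correct(p, 10):.0%} "
          "chance a ten-claim summary is fully right")
```

At 90% per claim, a ten-claim summary is fully right only about a third of the time. At 99% it's about nine times in ten. Real errors won't be independent, of course, so this is only a cartoon.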

Garbage in, garbage out ? In this case it's sometimes "gold in, garbage out", which is worse. It has the frustrating tendency to veer wildly between a superb, concise description and stuff which is pure fabrication or which underperforms a regular Ctrl+F.

That said, thresholds are important. Were it to reach, say, 95%, 99% reliability, then usefulness might well rise markedly and steeply above 0%. It doesn't need to be 100%, because reading papers doesn't produce a 100% accuracy either. It just needs to do a lot better than its current level. I'll be keeping my eye on this one, but, dear reader, it looks like this blog will fulfil a valuable purpose for the foreseeable future.


Friday, 24 March 2023
