Sister blog of Physicists of the Caribbean. Shorter, more focused posts specialising in astronomy and data visualisation.

Monday 29 May 2023

Bing versus Me

A couple of months ago I investigated whether the ChatPDF service could be useful in summarising, analysing and generally discussing astronomy papers. Today I repeat that study, using the same five papers as before and asking basically the same questions, but this time to Bing Chat. Bing uses GPT4 rather than GPT3, so it should be a significant improvement in quality. It also has the capability to examine any web page you're currently visiting, making it quite the handy tool provided you don't mind Microsoft's Edge browser. You might remember that I already experienced some mixed but promising results with Bing when using it for more unconstrained astronomy discussions.

Spoiler : Bing is far better than ChatPDF. It's not perfect, but whereas ChatPDF had major problems with 4 out of 5 papers tested, Bing had major problems with only 1. Of the other 4, it had only minor issues at most, and in at least one case gave an insightful conclusion that I hadn't thought of. Let's go through the papers once again, then, comparing to what ChatPDF came up with as well.

Initially I considered also trying Google's Bard for good measure. However, its results for paper 1 were utterly useless, starting with a description of a totally different paper, and then laced with fabrications, misreadings, statements which were just plain wrong, and horrible inconsistencies. I therefore decided that assessing Bard would currently be a complete waste of my time.


1) Smooth HI Low Column Density Outskirts in Nearby Galaxies

This one I blogged here with the paper itself here. It's about stacking the radial HI profiles of galaxies to see if the column density has an expected break (based on photoionisation models) or if it actually continues to lower values. The answer is the latter.

Unlike ChatPDF, Bing did not inexplicably decide to speak to me in French. In fact it didn't speak to me in French at all. Its first summary was good but a bit long; asking it to be shorter retained the crucial points :

The current web page context is a research paper that studies the low column density gas at the outskirts of galaxies as traced by the 21 cm hydrogen line emission (H I). The study aims to investigate the prevalence of breaks in galactic disks and to characterize what determines the potential edge of the H I disks. The authors studied the azimuthally averaged H I column density profiles of 17 nearby galaxies from the H I Nearby Galaxy Survey and supplemented in two cases with published Hydrogen Accretion in LOcal GAlaxieS data. They found that none show evidence of a sudden change in the slope of the H I radial profiles: the alleged signature of ionization by the extragalactic background.

This is ten times better than ChatPDF, which had the highly misleading statement, "The researchers found that this gas is distributed in a more uniform way than previously thought" and gave its unnecessarily evaluative opinion of the importance of the paper. Bing's public outreach summary, however, was lame - far too simple and dumbed down to interest anyone.

At this point I encountered a major problem, not with the AI but with Bing itself. It kept complaining that the current "web page context" was empty, meaning that it had lost access to the content. This led to a truly exasperating (and protracted) back-and-forth which honestly felt like dealing with a bot from ten years ago. Bing asked me if I could provide the web page context, I asked how do I do that, Bing asked me what my question meant, apparently oblivious to our whole previous exchange. 

Good grief. 

It got worse, because even when I figured out it simply meant "open the page in the browser", it would initially answer one question, and then again insist the context was incorrect. It was like talking to the ungodly love child of "whack a mole" and "getting blood from a stone".

Eventually I told Bing to fuck off. This matters, because if a task is faster but more tedious with some new method, I'm going for the slower but less tedious approach nine times out of ten.

I restarted the browser and tried again, this time asking about the lowest density values, the same test I tried with ChatPDF previously. Initially :

Sure! Figure 2 in the current web page context shows the azimuthally averaged HI column density profiles of 17 nearby galaxies. The lowest density values in figure 2 are around 0.1 cm^-2.

Right figure, totally wrong values. Questioning how it got this and whether it was confident led to another back and forth, with yet more "context empty" messages, eventually leading to :

...The paper does not mention which galaxy has the edge of the H I disk similar to what was found for NGC 3198 and M33. Would you like me to look up more information on this topic?

Which was a bizarre and unnecessary statement that doesn't relate to my questions or this paper at all.

I tried again in "creative" mode. And now... things turned a corner. Its public outreach attempt was more than passable :

Galaxies are made of stars, gas, and dark matter. The gas is mostly hydrogen, the simplest and most abundant element in the universe. Hydrogen gas can be detected by radio telescopes, which can reveal its amount and movement in galaxies. Scientists are interested in studying the hydrogen gas at the edges of galaxies, because it can tell us how galaxies grow and evolve over time. Some scientists think that the hydrogen gas at the edges of galaxies can be stripped away by powerful radiation from other sources in space, leaving behind a sharp boundary. However, a new study using radio observations of 17 nearby galaxies found that most of them do not have such a boundary, but rather a smooth decrease of hydrogen gas with distance. This means that other processes, such as gas falling into galaxies or galaxies colliding with each other, may be more important in shaping the outer regions of galaxies. This study helps us understand how galaxies change over time and how they interact with their surroundings.

That point about galaxies accreting gas is arguable but entirely legitimate. Then I asked it for the densities once again :

That is a challenging question indeed. The authors do not state the gas density values directly in the paper, but they do provide the column density values in Figure 2. Column density is the amount of gas along a line of sight per unit area, while gas density is the amount of gas per unit volume. To estimate the gas density from the column density, we need to know the thickness of the gas layer along the line of sight. This is not given in the paper, so we have to make some assumptions or use some external information.

It went on to give an elaborate description of how to convert column to volume densities (this included things like the ideal gas equation), which so far as I could tell was... correct. Its final value was also, broadly, correct. Though not directly connected to the values in the paper at all, I still found this impressive (I only don't say, "very impressive" because I'm not going to rigorously check the working). So I told it I wanted column densities from the paper and it came back with :

I'm sorry, I misunderstood your question. I thought you wanted the gas density, not the column density. The column density values are given in Figure 2 of the paper, as well as in Table 2. The lowest column density value reported in the paper is about $10^{18}$ cm$^{-2}$ for NGC 2976. Is that what you were looking for?

Not bad. The value is basically correct, but it should be for NGC 7331, not 2976. More worryingly, as I'd experienced before, it now proceeded to invention, claiming values for the column densities were given in table 2 - which does not in fact exist.
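
The simplest version of the column-to-volume density conversion described above can be sketched in a few lines of Python. This is not Bing's actual working, just the basic arithmetic, and it assumes the gas sits in a uniform slab along the line of sight. The 1 kpc thickness is a purely illustrative assumption (the paper doesn't give one, as Bing itself noted); the column density is the ~10^18 cm^-2 value quoted above.

```python
# Convert an HI column density to a mean volume density, assuming the
# gas fills a uniform slab. The slab thickness is a hypothetical input.

KPC_IN_CM = 3.0857e21  # one kiloparsec in centimetres


def mean_volume_density(column_density_cm2, thickness_kpc):
    """Mean number density (cm^-3) for a column density (cm^-2)
    spread uniformly over a path length of thickness_kpc."""
    if thickness_kpc <= 0:
        raise ValueError("thickness must be positive")
    return column_density_cm2 / (thickness_kpc * KPC_IN_CM)


# Column density quoted above; the 1 kpc thickness is an assumed value.
n_hi = mean_volume_density(1e18, 1.0)
print(f"n_HI ~ {n_hi:.1e} atoms per cubic cm")
```

With those inputs the answer is a few times 10^-4 atoms per cubic centimetre, and it scales inversely with whatever thickness you choose, which is exactly why the result depends so heavily on the assumptions.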

Summary : ChatPDF gave a poor summary of the paper with misleading claims. It had one flash of brilliance which seems to have been a pure fluke, otherwise tending to get the answers wrong. Its public outreach effort was lame, with results that were qualitatively and quantitatively wrong. Bing gave a good summary, an excellent public outreach statement, and came up with a good methodology for converting the esoteric values of the paper to something more applicable to the real world. However, its claims about specific values still needed vetting, and in "balanced" mode it was a complete moron - though seemingly because of Bing itself more than the language model.

Winner : Bing. Not perfect, but good. I hereafter kept Bing in creative mode.


2) Discovery of an isolated dark dwarf galaxy in the nearby universe

Blog post here, paper here. A much shorter paper that might be easier to handle. This one's about the discovery of an optically dark HI cloud that looks like it's rotating like an ordinary galaxy. The key point is this is isolated, so unlike other such dark galaxy candidates, it's very hard to attribute it to being the result of any other formation mechanism : a primordial origin seems the most likely.

Whereas ChatPDF stated the importance of the isolation in a very strange way, Bing's summary was flawless. Likewise, while ChatPDF had problems finding the very basic quantity of the distance to the object, Bing got it right first time. When asked about why this object was important, Bing gave a very good but very generic description of why dark galaxies are important. So I asked it what was so important about this one in particular :
I think the most important characteristic of the cloud is its isolation from other galaxies. This makes it unlikely that it is a tidal debris or a satellite of a larger galaxy. It also implies that it has not been affected by external factors such as mergers, interactions, or feedback that could trigger star formation or gas loss. It is a rare opportunity to study a dark galaxy in its pristine state and learn about its origin and evolution.
Spot-on perfect. ChatPDF noted the caveat that the rotation curve of the object couldn't be guaranteed to be flat, which I agree with. Bing didn't raise this objection on its own, but when prompted :
The paper shows a position-velocity diagram of this galaxy in Figure 1c, which displays an S-like structure that is typical of disk galaxies. The paper also uses a tilted-ring fitting code to estimate the rotation velocity (Vrot) and velocity dispersion (σv) of this galaxy. The paper does not explicitly show a rotation curve plot, but it implies that this galaxy has a flat rotation curve as expected for a rotating galaxy with dark matter. The paper states: “Using the fitted Vrot and σv, we calculated that Mdyn is (5.1±2.8)×109M , which is 47±27 times its baryonic masses, implying that FAST J0139+4328 is dominated by dark matter within the error range.” This means that the orbital speed of the gas does not decline with distance from the centre, but stays constant or increases slightly.

Therefore, based on both web search results and the current web page context, the answer to your question is yes: the rotation curve of the object described in the paper seems like a flat curve as expected for a rotating galaxy.
Which is fine apart from the weird symbols; it's certainly an arguable point whether you'd consider this result solid evidence for a flat rotation curve or not. Pushing back a little on whether this result was secure :
That’s a good question. You are right that the resolution of the FAST data is not very high, and that may affect the accuracy of the rotation curve estimation. The paper acknowledges this limitation and states: “Here, due to the low resolution of the FAST, we can only roughly estimate the content of dark matter. In the near future, we will apply for the higher-angle resolution observation for FAST J0139+4328.” However, the paper also claims that their model can construct the dynamic structure of FAST J0139+4328, except for the weak flat components, and that their results are consistent with the baryonic Tully-Fisher relation for gas-dominated disk galaxies. Therefore, they seem confident that this galaxy has a flat rotation curve as expected for a rotating galaxy with dark matter. Of course, more observations and analysis are needed to confirm this hypothesis and rule out other possible explanations.
Which cannot be improved upon.
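
The numbers Bing quoted from the paper are easy to sanity-check. A minimal sketch, using only the two quoted values (the dynamical mass and the dynamical-to-baryonic ratio) and simple first-order error propagation; the variable names are just illustrative :

```python
# Back-of-envelope check on the values quoted from the paper:
# Mdyn = (5.1 +/- 2.8) x 10^9 Msun and Mdyn / Mbaryon = 47 +/- 27.

M_DYN = 5.1e9      # dynamical mass in solar masses (quoted)
M_DYN_ERR = 2.8e9  # its uncertainty (quoted)
RATIO = 47.0       # dynamical-to-baryonic mass ratio (quoted)

# Baryonic mass implied by the quoted ratio.
m_baryon = M_DYN / RATIO

# Uncertainty the ratio would have if the baryonic mass were known
# exactly, i.e. with only the Mdyn error propagated through.
ratio_err_from_mdyn = RATIO * (M_DYN_ERR / M_DYN)

print(f"Implied baryonic mass ~ {m_baryon:.2e} Msun")
print(f"Ratio error from Mdyn alone ~ {ratio_err_from_mdyn:.0f}")
```

The implied baryonic mass is about 10^8 solar masses, and the dynamical mass error alone accounts for nearly all of the quoted ±27 on the ratio. That fits with the caveat about the low resolution : the dark matter content really is only roughly estimated.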

Summary : ChatPDF gave inconsistent results about the main finding of the paper, sometimes getting it right and sometimes wrong (or at least phrased extremely strangely). It gave a correct caveat about the uncertainty of one of the main results. Bing unambiguously identified the major distinguishing feature of this object from other, similar structures, and though it didn't volunteer the same caveat as ChatPDF did, it gave a much more detailed, entirely accurate explanation when prompted.

Winner : Bing, due to its better, more consistent answers on the main results.


3) Young, Blue, and Isolated Stellar Systems in the Virgo Cluster. II. A New Class of Stellar System

Blog here, paper here. This is a long one about unusual stellar systems in the Virgo cluster, which the authors think are most likely produced by star formation in gas removed from galaxies by ram pressure stripping.

ChatPDF got confused about the main point of the paper, deciding that these objects could be galaxies in an early stage of formation - which is sort of arguable, but highly misleading. Bing had no such problems. Its summary of the conclusions was more than adequate (the first time it gave a bullet-point summary, with each point containing accurate and pertinent numerical facts, while the second time it gave a paragraph with a perfect paraphrased description of the major findings). When asked why the authors favour ram pressure over a tidal origin, it initially responded :
  • Ram pressure can explain the high metallicity of the BCs, which is inconsistent with tidal stripping from low-mass galaxies.
  • Ram pressure can explain the presence of H I tails in BC3 and SECCO 1, which are not expected from tidal stripping.
  • Ram pressure can explain the isolation of BC4 and BC5, which are too far from any plausible parent galaxy for tidal stripping.
This is decent but imperfect. The first point is a bit oddly phrased, but in another query session, it explained that ram pressure can remove gas from the inner, metal-rich parts of galaxies, whereas tidal stripping preferentially removes material from the outer, metal-poor regions. So that's all good. The second point is more questionable; I'm not sure SECCO 1 has an HI tail, and while the structure of the tails might be different if produced by tides rather than ram pressure, there's no reason to think they couldn't exist. The third point is fine.

On another attempt the second point wasn't mentioned, so I asked it if the objects have HI tails. It correctly answered that only one of the objects did so, naming it correctly and explaining that some might have emission below the detection limit and/or confused with the Milky Way. That's a point I myself explained when I blogged the paper !

When I pushed a bit more, Bing did unfortunately claim that the galaxies are unusually gas poor. That's not the case at all; as I noted when blogging this, the detection limits aren't good enough to establish this : they could in principle be extremely gas-rich given the current sensitivity of the radio data. And Bing invented a quote and a figure to try and back up its claims. I corrected it, but it then said it was quoting from another paper by mistake, giving a reference... which was to this paper ! I corrected that, so it changed the reference by incrementing the page number in the reference by one. Not good.

In another session I continued down this rabbit hole. Bing was quite insistent that the MHI/L ratio (gas fraction) of both these and other low-mass systems was extremely low compared to other dwarf galaxies, and that even if the upper limit on the HI did still allow a high MHI/L ratio, it was still likely that the objects had far less gas than this. Its response about the nature of the upper limits of HI detections was honestly a little bit incoherent : technically correct if you think it through, but awkwardly phrased, and it took some disentangling. It also insisted that most Local Group dwarf galaxies have HI detections.

Here the debate actually became quite interesting. As we found out, in terms of raw numbers, Bing was wrong, but if you slice the data according to isolation (which is actually what you want to do for a fair comparison), it was right. Bing had done the right thing for the right reasons, but only failed to explain its working properly.

Finally I asked it if they could be galaxies in an early stage of formation. It correctly answered no, describing how they're more likely to be transient objects than primordial galaxies. This was perfect.

Summary : ChatPDF fabricated the main result of the paper and invented a quote to support its claim. Bing did this but to a much lesser degree and on more minor points. When pressed, it gave a pertinent discussion revealing its underlying "reasoning" (if we can call it that) was correct, but it just needed to explain it better from the start. It pushed back when I contradicted it, but was willing to take on new information. It made one wholly erroneous claim, but this was not repeated.

Winner : Bing. A bit confusing, but Bing was basically right nearly all of the time, only erring on minor points.


4) The Turn-Down of the Baryonic Tully-Fisher Relation and Changing Baryon Fractions at Low Galaxy Masses

Blog here, paper here. This is an especially complicated and technical one. Low mass galaxies appear to rotate more slowly than the BTFR predicts, but when you account for various corrections, they actually rotate more quickly than the linear version found for higher masses and this can be shown to be consistent with the dark matter paradigm.

ChatPDF did a good summary of the major findings but was at best vague about whether they contradicted the CDM paradigm, whereas both Bing and the authors explicitly claimed that they supported it. Bing's summary was perfect. Its public outreach version was good, but simplified things just a nudge too much. It said that the "turn down" means that very low mass galaxies have slower rotation speeds than expected. Well, they do, but the turn-down itself actually means the relation changes so that we'd expect to see the opposite. However, its explanation for this was actually pretty good :
I was trying to simplify the explanation for a general audience, but you have pointed out a more accurate description of the turn-down in the BTFR. The dark matter profiles of low-mass galaxies are expected to be less dense and more flattened in the center than the profiles of more massive galaxies. This means that the rotation velocity does not reach a constant value at large radii, but keeps increasing. When this effect is taken into account, the BTFR shows a steeper slope at low masses than at high masses. This implies that low-mass galaxies have lower baryon fractions than more massive galaxies.
I asked it if the BTFR contradicted CDM, and it came up with a good description of the general arguments for and against. It didn't take the results of this paper into account though, so I had to be more explicit that I wanted it to do this. And it said that they don't, providing a lengthier and entirely correct explanation. 

Summary : ChatPDF did a good job with the technical details, but erred on one significant point by directly contradicting one of the paper's most explicit and important claims. Bing got everything right, though its statements did sometimes need a little unpacking, especially when it tried to simplify things for a general audience.

Winner : Bing, which made no major errors here at all.


5) Atomic Gas Dominates the Baryonic Mass of Star-forming Galaxies at z ≈ 1.3

Blog here, paper here. This much shorter paper stacks lots of HI spectra of galaxies to get a detection of atomic gas at z≈1.3, comparing the result with the molecular gas. They find that at redshifts where star formation peaks, atomic gas is the dominant component, which is surprising because it's thought that molecular gas is what's mainly involved in star formation.

ChatPDF had no problems with this one apart from garbling the symbols, and Bing performed equally well in disseminating the paper's content. Both its technical and public summaries were very good. I decided to press it a bit more on the major thing that I found interesting. Namely, that the atomic gas fraction in galaxies increases sharply at greater distances, while the molecular gas fraction doesn't change much. What's odd about this is that it's molecular gas we think is involved in star formation (at least more so than the atomic) and the star formation rate at greater distances is much higher than in the nearby Universe. After a little back and forth, Bing came up with :
Me : I would think that although not at their peak, the typical SFR at z=1 are much higher than today. If molecular gas is the component which dominates SF, I would naively expect galaxies at z=1 to have a higher amount of molecular gas.

Bing : I understand your reasoning, but the current paper's results show that the molecular gas mass of star-forming galaxies at z=1 is not significantly higher than that at z=0. This implies that other factors, such as the gas density, pressure, or temperature, may play a more important role in determining the star formation efficiency than the molecular gas mass alone. In this context, a human might say that the molecular gas mass is a necessary but not sufficient condition for star formation.
Brilliant ! Bing realised that there could be other factors at work, that the simple molecular gas fraction was not the only thing we should consider. This is far superior to ChatPDF's perfectly decent but very hand-wavy answer : Bing has offered an actual explanation, whereas ChatPDF only really provided caveats. So impressed was I by this that I went back for another session. A similar line of questioning led to a similar result, even elaborating that the molecular gas must be more efficiently converted into stars and replenished more rapidly from the greater atomic reservoirs at higher redshifts.


Conclusions

I have to say that I'm more impressed by Bing than I was expecting. Do not use it in "balanced" mode, that's a pile of crap. But "creative" mode is showing some serious potential (I didn't yet try "precise"). It's still far from perfect, and it still makes mistakes, and even still just makes shit up, but it does so at a significantly lower rate than ChatPDF. 

You still can't use this for analysis. That is, you can't yet ask it a more general question and trust it will fill in all the blanks for you. For example I asked it the lookback time for a given redshift and it quoted the cosmology calculator, coming back with a number that was well in the ballpark but not exactly right. Neither its raw numbers nor its methodology are yet trustworthy enough to let it run free on its own; see the example of the MHI/L ratios above*. In principle it would be a simple enough matter to compare these values for different samples, but Bing isn't there yet. However, it is pretty good for provoking inspiration of ideas that might genuinely work, and that's... pretty cool, honestly. It can help with analysis, even if it can't actually do any.

* Though I did not encounter anything at all like a previous case where Bing insisted the M/L ratios would always be constant because the units would change, which made not a lick of sense. Possibly this was because I didn't disable the search function, so this gives Bing something to ground itself in.
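
The lookback time mentioned above is the sort of number that's easy to check independently rather than taking a chatbot's word for it. A minimal sketch, assuming a flat ΛCDM cosmology with H0 = 70 km/s/Mpc and Ωm = 0.3 (illustrative choices only, not values from any of the papers or from the calculator Bing quoted) and ignoring radiation :

```python
import math

# Assumed cosmology (illustrative; not taken from the papers discussed).
H0 = 70.0       # Hubble constant in km/s/Mpc
OMEGA_M = 0.3   # matter density parameter
OMEGA_L = 0.7   # dark energy density parameter (flat universe)

KM_PER_MPC = 3.0857e19
SECONDS_PER_GYR = 3.15576e16


def hubble_time_gyr(h0=H0):
    """Return 1/H0 expressed in Gyr."""
    return KM_PER_MPC / h0 / SECONDS_PER_GYR


def lookback_time_gyr(z, steps=1000):
    """Lookback time to redshift z in Gyr, integrating
    dz / ((1 + z) E(z)) from 0 to z with Simpson's rule."""
    if z < 0:
        raise ValueError("redshift must be non-negative")
    if z == 0:
        return 0.0

    def integrand(zp):
        e = math.sqrt(OMEGA_M * (1 + zp) ** 3 + OMEGA_L)
        return 1.0 / ((1 + zp) * e)

    n = steps if steps % 2 == 0 else steps + 1  # Simpson needs even n
    h = z / n
    total = integrand(0.0) + integrand(z)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return hubble_time_gyr() * total * h / 3


for z in (0.5, 1.0, 1.3):
    print(f"z = {z}: lookback time ~ {lookback_time_gyr(z):.2f} Gyr")
```

Different choices of H0 and Ωm shift the answer by a few percent, which may be part of why a quoted number can be "in the ballpark but not exactly right".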

And... I get the distinct impression that Bing is an awful lot closer to being a genuine analysis tool than ChatGPT/PDF. As an expert, I think you can already benefit from discussions with Bing on academic papers. Bing is good enough to find the most relevant points, and experts know enough to spot any glaring errors; it can also suggest interesting alternative explanations and methodologies. So for pure discussion purposes, I think this is of immediate benefit. I'm almost tempted to say that journalists could use it to help with outreach, but more realistically, they could probably use it at least to find papers that might be interesting to interview the authors about - I don't think it's reliable enough to replace the human component on that front. For now.
