Little Physicists: AI-Assisted Astronomy ?

Yesterday I decided to stop feeding the chatbot weird premises for crossover stories and try to use it for research. Not actual research, you understand, just a test. I did a similar exercise recently to see if a bot could properly fulfil its advertised ability to summarise papers and found it badly wanting, so my expectations were set pretty low. This time I wanted to discuss something very close to my own research, so that I wouldn't have to check all the answers: I pretty much know them by heart already. This was prompted by idle curiosity and by learning that Bing AI is powered by GPT-4, which is supposed to be a substantial improvement in terms of accuracy.

1) ChatGPT : Up to its old tricks again

First, I had a protracted discussion with ChatGPT about the possible nature of an extragalactic hydrogen cloud. I gave it some basic properties: line width, radius, distance and wavelength of detection. Then I asked it to speculate about the cloud's possible origins. Overall, it did pretty well at this, suggesting it might be gas stripped from another galaxy or a primordial object that hadn't formed stars. I had to push it just a little to get it to estimate the dynamical mass and realise the key point that the object should be dark matter dominated, but the results it gave were decently accurate. It also produced a genuinely good list of the other sources of motion in such a cloud besides rotation, noting that these were likely to be small in comparison, hence the need for dark matter.
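For anyone wondering what "estimate the dynamical mass" involves, here's a minimal sketch of the usual back-of-the-envelope version, M ≈ v²R/G. The input values are hypothetical (I'm not giving the real cloud's numbers here), and taking the rotation velocity as half the line width, corrected for inclination, is an assumed simple convention rather than necessarily what ChatGPT used. Comparing the result with the mass of the gas itself is what tells you whether the cloud needs dark matter.

```python
import math

# Gravitational constant in astronomer-friendly units: pc (km/s)^2 per solar mass
G_PC_KMS2_PER_MSUN = 4.301e-3


def dynamical_mass(line_width_kms: float, radius_kpc: float,
                   inclination_deg: float = 90.0) -> float:
    """Order-of-magnitude dynamical mass in solar masses, M ~ v^2 R / G.

    Assumes the full line width spans twice the rotation velocity and
    corrects for inclination (90 degrees = edge-on, i.e. no correction).
    """
    v_rot = (line_width_kms / 2.0) / math.sin(math.radians(inclination_deg))
    radius_pc = radius_kpc * 1000.0
    return v_rot ** 2 * radius_pc / G_PC_KMS2_PER_MSUN


if __name__ == "__main__":
    # Hypothetical illustrative values, not the cloud discussed in this post.
    m_dyn = dynamical_mass(line_width_kms=60.0, radius_kpc=5.0)
    print(f"Dynamical mass ~ {m_dyn:.2e} M_sun")
```

With these made-up inputs the mass comes out at around a billion solar masses; if the gas accounts for only a small fraction of that, the rest has to be something else.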

A bit more prompting and it got a decent estimate for how long such a cloud could survive. It came up with a couple of correct examples of similar known objects too. I moved on to ask whether this high dynamical mass could be the result of stars, with the SDSS perhaps just not being sensitive enough to detect them. It produced a decent order-of-magnitude formula to estimate this, but then it started to break down. It kept giving ever-more inconsistent numbers (I'm actually surprised it made it this far, since this is the free version, which doesn't do actual calculations at all). When its mistakes were pointed out, it was pot luck whether its revised response would be any better. Still, its basic method for estimating the detectability of the stellar content, though crude, is something genuinely useful that I hadn't thought of before. And its list of suggestions for further research to properly nail down the cloud was absolutely 100% spot on.
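I didn't record exactly how the survival time was estimated. One common order-of-magnitude approach (an assumption here, not necessarily ChatGPT's method) is the crossing time, t ≈ R/v: a cloud without enough mass to hold itself together should disperse on roughly this timescale. The radius and velocity below are again hypothetical.

```python
# 1 km/s is about 1.0227 parsecs per million years
PC_PER_MYR_PER_KMS = 1.0227


def crossing_time_myr(radius_kpc: float, velocity_kms: float) -> float:
    """Time in Myr for material moving at velocity_kms to cross radius_kpc.

    A rough survival timescale for a cloud that is NOT gravitationally bound.
    """
    radius_pc = radius_kpc * 1000.0
    return radius_pc / (velocity_kms * PC_PER_MYR_PER_KMS)


if __name__ == "__main__":
    # Hypothetical values: 5 kpc radius, 30 km/s internal velocity.
    print(f"Crossing time ~ {crossing_time_myr(5.0, 30.0):.0f} Myr")
```

A timescale of a hundred-odd million years is short compared to the age of the Universe, which is why an unbound cloud like this would be hard to explain unless it formed recently.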

After that it seriously degenerated when I asked it for a summary suitable for an academic paper. It started inventing all kinds of extraneous details: it gave the cloud a plausible-sounding catalogue name, contradicted itself with regard to the numbers, and decided that the cloud had been detected in an optical survey despite its being explicitly optically undetected. It even included irrelevant references for some reason. All in all, this part of the test was just unhelpful garbage. This was surprising and disappointing, because this is the sort of thing I'd expect ChatGPT to be good at.

In summary, it provided some useful ideas even at the expert level, but its specific numbers were, totally unsurprisingly, not at all reliable. Only when I prompted it did it admit that it wasn't doing any calculations, something it absolutely should have been up-front about. It's useful for exploring new ideas, and that's beneficial, but it's hardly revolutionary.

2) Bing Chat : A glimpse of the future or a freakishly coincidental hallucination ?

This one was truly strange, to the extent that I almost wonder if I dreamed the whole thing. Bing AI is annoying for two huge reasons (besides, well, being Bing). First, you have to use Edge to run it (FFS, let me choose my own damn browser), and second, it gives you no easy way to save your history. It's either old-school copy+paste or nothing. And since I was on mobile, when I closed the app for a moment, all was lost. This is completely stupid.

At first it didn't look hopeful at all. In "balanced" mode it straight-up refused to give me anything useful in the way of an answer, and then shut down the conversation completely so that I couldn't enter any more text, leaving me no option but to start over. Why anyone thought giving it this "ability" was a good idea, I don't know. Again, this is stupid and frustrating, even for a preview tool.

And then... something truly amazing happened. In "precise" mode, it... did exactly what I wanted. True, it needed a little prodding, as ChatGPT did. But it also offered explanations that ChatGPT hadn't considered. It provided citations as clickable links. It gave the formulae and, impressively, its numbers were absolutely self-consistent. It never messed up the masses of the different components the way ChatGPT did.

For an estimate of the optical detectability of the cloud, it did (or at least appeared to do) something much more sophisticated than ChatGPT. It initially even said this was impossible without running a full population synthesis model with Starburst99, which it can't do and I'm not going to do either. Then I told it to make a simpler estimate, and it asked for the stellar mass distribution (I told it to assume a standard IMF) and composition (I told it to assume solar metallicity). It then estimated the luminosity, assuming all the missing mass was stellar. It gave a value in watts (I presume it defaults to SI units, which is not unreasonable) but had no problem converting it into the more familiar solar luminosities and then into absolute and apparent magnitudes.
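The chain of conversions described there (mass to luminosity in watts, then solar luminosities, then absolute and apparent magnitudes) can be sketched as follows. This is a simplified illustration, not a reconstruction of what Bing did: the mass-to-light ratio, stellar mass and distance are hypothetical inputs, and it uses bolometric magnitudes, whereas a real comparison with a survey needs band-specific values (which is exactly what a population synthesis code like Starburst99 would provide).

```python
import math

L_SUN_W = 3.828e26   # IAU nominal solar luminosity in watts
M_SUN_BOL = 4.74     # absolute bolometric magnitude of the Sun


def stellar_luminosity_w(stellar_mass_msun: float, mass_to_light: float) -> float:
    """Luminosity in watts if the given mass were stars with this M/L (solar units)."""
    return stellar_mass_msun / mass_to_light * L_SUN_W


def watts_to_solar(luminosity_w: float) -> float:
    return luminosity_w / L_SUN_W


def absolute_magnitude(luminosity_lsun: float, m_sun: float = M_SUN_BOL) -> float:
    # M = M_sun - 2.5 log10(L / L_sun)
    return m_sun - 2.5 * math.log10(luminosity_lsun)


def apparent_magnitude(abs_mag: float, distance_mpc: float) -> float:
    # Distance modulus: m - M = 5 log10(d / 10 pc)
    distance_pc = distance_mpc * 1.0e6
    return abs_mag + 5.0 * math.log10(distance_pc / 10.0)


if __name__ == "__main__":
    # Hypothetical inputs: 1e9 solar masses of stars, M/L = 1, at 17 Mpc.
    lum_w = stellar_luminosity_w(1.0e9, mass_to_light=1.0)
    lum_sun = watts_to_solar(lum_w)
    abs_mag = absolute_magnitude(lum_sun)
    app_mag = apparent_magnitude(abs_mag, distance_mpc=17.0)
    print(f"L = {lum_w:.2e} W = {lum_sun:.2e} L_sun, M = {abs_mag:.2f}, m = {app_mag:.2f}")
```

Even this crude version shows why the assumptions matter: every factor of ten in the mass-to-light ratio shifts the magnitudes by 2.5.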

I didn't get the chance to check those numbers properly, but I do know they were perfectly credible, and they were certainly close to what I was expecting. I'd really have loved to scrutinise its calculations minutely, but I was sadly denied the opportunity. I don't know if Bing AI has access to some mathematical tools (like the Wolfram Alpha plugin for paid versions of ChatGPT), but it certainly seemed like it was doing calculations and not just generating numbers statistically.

Bing AI stressed that these numbers were estimates subject to a lot of uncertainty, something ChatGPT didn't do. I pushed it further, asking if it would be possible to alter the metallicity and/or IMF to render the galaxy undetectable, since with these simple assumptions the galaxy should be well above the SDSS sensitivity limit, as I had expected. It said yes, but when I asked it to check how, for example, the metallicity needed to do this compared to that of known galaxies, it found that the result was incompatible with known observations and gave me a reference on extreme metallicity values. Similarly for the IMF.
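The logic behind that follow-up question fits in a few lines. At fixed mass, raising the mass-to-light ratio by a factor f dims an object by 2.5 log10(f) magnitudes, so you can ask how large f must be to push a predicted magnitude below a survey's limit. The magnitudes used below are placeholders; the real SDSS limit depends on the band and, for an extended object like this, on surface brightness, which this sketch ignores.

```python
import math


def required_ml_factor(predicted_mag: float, limiting_mag: float) -> float:
    """Factor by which M/L must increase (at fixed mass) to dim an object
    from predicted_mag to limiting_mag (fainter means a larger magnitude).

    Multiplying M/L by f dims the object by 2.5 * log10(f) magnitudes.
    """
    delta_mag = limiting_mag - predicted_mag
    if delta_mag <= 0:
        return 1.0  # already fainter than the limit, so no change is needed
    return 10.0 ** (0.4 * delta_mag)


if __name__ == "__main__":
    # Placeholder magnitudes, purely for illustration.
    factor = required_ml_factor(predicted_mag=20.0, limiting_mag=22.5)
    print(f"M/L would need to rise by a factor of ~{factor:.0f}")
```

Working out whether any plausible metallicity or IMF could actually deliver such a factor is the part that needs the observational literature, which is where Bing's references to extreme observed values came in.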

If correct, this is incredibly useful. A lot of tedious calculations and trawling papers... all gone, replaced with a quite natural style of conversation that gets right to the point.

And then all was lost forever. Worse, this morning Bing refuses to do any calculations at all (except in "creative" mode, which produces results that are wrong by many orders of magnitude). It won't give me the stellar mass estimate or even the dynamical mass. It comes up with formulae, but its responses are partly garbled, as it's very blatantly just scraping together bits of relevant text from different sources (it's at least honest about this and provides the links), and it point-blank refuses to admit it can do calculations at all. It even suggested I may have confused it with another chatbot. And to be fair, the experience is like using a different AI altogether, as though some bloke called Dave crawled around inside it and started pulling out vital circuitry.

Well, I don't know what to make of all this. ChatGPT did better than I expected, and if that Wolfram Alpha plugin works as advertised... this could be extremely powerful. But as it stands, it's useful for discussions and ideas but not actual analysis, and, somewhat surprisingly, not for constructing replacement text either.

Bing, on the other hand... it might have been total garbage for all I know, that just happened to get things about right. But if (and I do stress "if" very strongly !) this is what using a language-model AI coupled with a genuine mathematical calculator is like, then it's transformative. I want this. Anyone saying it isn't useful is simply mad and wrong.

Monday, 3 April 2023