Sister blog of Physicists of the Caribbean. Shorter, more focused posts specialising in astronomy and data visualisation.

Thursday, 28 August 2025

Signal boost

One man's trash is another man's treasure, or so the old saying goes. And in astronomy, one man's signal is another man's noise. 

The classic example of this is dust in the Milky Way. If you're interested in dust, then you're a deeply weird person... or just interested in star formation. Dust "grains", which are actually about the size of smoke particles, are thought to be critical sites for star formation because they allow atomic gas to lose energy and cool, letting the atoms collide and combine to form molecular gas. Molecular gas is much denser than atomic gas, so eventually this can lead to the cloud collapsing to form a star.

But if you're less of a weirdo, dust just gets in the way. It ruins our majestic sky by blocking our view of all the stars, especially along the plane of the disc of the Galaxy. Some regions are much worse than others, but it's present at some level pretty much everywhere across the sky.

In radio astronomy we have a much more subtle and interesting problem. Of course we always want our observations to be as deep and sensitive as possible. But sometimes, it turns out, the noise in our data can actually be to our advantage – though it comes at a price.

Consider a typical spectrum of a galaxy as detected in the HI line. If you aren't familiar with this, take a look at my webpage if you want details. Basically, it shows us how brightly the gas in a galaxy is emitting (that is, how dense it is) at any particular velocity. Even without knowing this rudimentary bit of information, though, you can probably immediately identify the feature of interest in the signal :

All the spectra shown in this post are artificial, generated with a simple online code you can use yourself here.

We don't need to worry here about why the signal from the galaxy has the particular structure that it does. No, what I want to talk about today is the noise. That's those random variations outside the big bright bit in the middle.

This example shows a pretty nice detection. It's easy to see exactly where the profile of the galaxy ends and the noise begins. But even within the galaxy, you can see those variations are still present : they're just lifted up to higher values by the flux from the galaxy. Basically the galaxy's signal is simply added to the noise.

Now if you still have an analogue radio, you'll know that if you don't get the tuning just right, you'll hear the sounds from your station but only against a loud and annoying background hiss. The worse the tuning, the worse the noise. So you might well think that the following claim is more than a little dubious :

Fainter signals can be easier to detect in noisier data.

So counterintuitive is this that one referee said it "makes no sense whatsoever", doubling down to label it "bizarre" and "not just counterintuitive, but nonsensical".

The referee was wrong. I'll point out that the claim comes not from me but from the PhD thesis of Virginia Kilborn (now a senior professor). So how does it work ?

The answer is actually very simple : signal is added to noise. That is, regardless of how noisy the data is, the signal from the galaxy is still there. Let me try and do this one illustratively. Suppose we have a pure signal, completely devoid of noise, and for argument's sake we'll give it a top-hat profile (about a quarter of galaxies have this shape, so this isn't anything unusual) :

The "S/N" axis measures the signal to noise, a measure of how bright things look given the sensitivity of the data. The numbers in this case are garbage because I set the noise to zero.

Now let's add it to two different sets of noise, purely random (Gaussian), of exactly the same statistical strength but just different in their exact channel-to-channel values :
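If you want to play with this yourself without the online tool, here's a minimal numpy sketch of the same idea (all the numbers here are arbitrary choices of mine, not those used for the figures) :

```python
import numpy as np

rng = np.random.default_rng(42)

# Velocity axis : 300 channels of 10 km/s each (arbitrary)
velocity = np.arange(300) * 10.0
in_line = (velocity > 1400) & (velocity < 1600)

# Noiseless top-hat profile : 2 mJy across a 200 km/s line width
signal = np.zeros_like(velocity)
signal[in_line] = 2.0   # mJy

# Two independent Gaussian noise realisations with exactly the same rms
rms = 1.0
spectrum_a = signal + rng.normal(0.0, rms, velocity.size)
spectrum_b = signal + rng.normal(0.0, rms, velocity.size)

# The signal is simply added to the noise, so the measured peak S/N differs
# between realisations even though the input signal is identical
for name, spec in [("A", spectrum_a), ("B", spectrum_b)]:
    print(name, "peak S/N inside the line :", round(spec[in_line].max() / rms, 2))
```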


Note that even with this purely random noise, you can still see apparent ripples and variations in the baseline outside the source : even random noise, to the human eye, looks structured.


Oh ! What happened there ? Why is the second signal so much clearer than the first ? You can still see the first one, to be sure, but it's marginal, and could easily be mistaken for some weird structure in the baseline. The second isn't great either, but it looks a lot better than the first.

Noise is typically random. That means that some parts of it will have a bit more flux while other parts will have a bit less. If our signal lands on the higher-flux bits, the total apparent flux in our source gets higher. That is, the real flux in our source obviously doesn't change, but what we would measure would be greater than if the noise wasn't there. And of course the opposite can happen too : noise dimming, where the signal lands on the fainter bits of the noise and becomes harder to detect.

Noise boosting (shown in the carefully-chosen example above) is the far less expected half of this, but it's no less important. Every once in a while, a faint signal will happen to align with some bright parts of the noise, turning a marginal signal into a clearer one. This doesn't really work for the sorts of audio signals you get on a household radio set, as these are much too complex, but the HI signal of a galaxy is a good deal simpler. And all we need to detect it is (for this basic example at least) pure flux, which the noise can readily provide.

(As an aside, you might notice that the actual peak levels in these cases aren't much different, though the average level inside the source profile is higher in the second case. While peak levels most certainly can be affected by noise boosting and dimming, what's absolutely crucial here is what detection method we're using to find the signals. I'll return to this below.)

Of course, there are limits to this : it will only work for signals which are comparable in strength to the noise. As the noise level gets higher, the random variations will increasingly tend to "wash out" our signal. Now the flux levels of the signals we can receive vary hugely depending on the nature of our data set, but the signal-to-noise ratios (S/N, or sometimes SNR) we need for a detection are much more comparable. That is, a signal which is ten times the typical noise value (the rms) has the same statistical significance in any data set, but the actual flux value it corresponds to can be totally different. Expressing signal strength in terms of the noise level therefore makes things very convenient : a five sigma (5σ) source just means something that's five times brighter than the typical noise level.

So suppose we have a 2σ source which happens to align with a 3σ peak in the noise : bam, we've got ourselves a quite respectable 5σ detection*. But if we keep the flux level of the signal we're adding the same and increase the noise level, then the signal's S/N will go down. Instead of adding 2σ to 3σ, we'll be adding ever lower and lower values : the "sigma level" of the noise won't change, but that of the signal certainly will. Pretty quickly we won't be shifting that 3σ peak from the noise by any appreciable degree. We'll be adding the same flux value but to an ever-greater starting level.

* Sometimes five sigma is quoted as a sort of scientifically universal gold-standard discovery threshold. This is simply not true at all, because if you have enough data, you'll get that level of signal just by chance alone. Far more importantly, the noise in real data is often far from being purely random, so choosing a robust discovery threshold requires a good knowledge of the characteristics of the data set.
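To put a number on that : even for perfectly Gaussian noise, the chance of any single sample exceeding 5σ is tiny, but a survey contains an awful lot of samples. A quick sketch :

```python
from scipy.stats import norm

# Probability that a single, purely Gaussian noise value exceeds +5 sigma
p_5sigma = norm.sf(5.0)          # about 2.9e-7

# Expected number of spurious >= 5 sigma values for various numbers of
# independent samples (channels x pixels, say)
for n_samples in (1e6, 1e8, 1e10):
    print(f"{n_samples:.0e} samples -> {n_samples * p_5sigma:.1f} spurious 5-sigma values")
```

A modern data cube can easily contain hundreds of millions of independent samples, so a handful of 5σ peaks are expected from pure chance alone – before we even start worrying about non-Gaussian noise.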

As a possibly pointless analogy, consider lions. If you go from having no lions to one lion, you've just put yourself in infinitely more danger. If you add a second lion you're in even more trouble. But if you've got ten lions and add one more, you won't really notice the difference.
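In code form, the same diminishing-returns arithmetic looks like this (illustrative numbers of my own, with the noise peak held fixed at 3σ) :

```python
flux = 2.0               # the signal's flux in a given channel (mJy)
noise_peak = 3.0         # sigma level of the noise peak it happens to land on

for rms in (1.0, 2.0, 4.0, 8.0):       # increasingly noisy data (mJy)
    signal_sigma = flux / rms          # what the signal adds, in units of sigma
    print(f"rms = {rms} mJy : signal adds {signal_sigma:.2f} sigma, "
          f"combined peak = {noise_peak + signal_sigma:.2f} sigma")
```

With an rms of 1 mJy the signal turns that 3σ noise peak into a 5σ detection; with an rms of 8 mJy it barely nudges it at all.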

"But hang on," you might say, "surely that means that your earlier claim that fainter signals are more detectable in nosier data can't possibly be correct ?". A perfectly valid question ! The answer is that it depends on how we go about detecting the signals. The details are endless, but two basic techniques are to search for either a S/N ratio or a simple flux threshold. These can give very different results to each other.

Now if you use S/N, which is generally a good idea, then indeed signals of lower flux levels generally don't do well in increasingly noisier data because of the ever-smaller relative increase in the signal. And of course, the tendency of the noise to not merely obscure but actually suppress the signal will get ever greater, since there's just as much chance of aligning with a low-value region of the noise as a high-value region.

But S/N is not the only way of detecting signals. You might opt instead to use a simple flux threshold : it's computationally cheaper, easier to program, and most importantly of all it gives you more physically meaningful results. If you do it that way, then it's a different story. When you add a signal to noise the flux level always increases, making it much easier to push the flux above your detection threshold by this method. Which makes noise boosting very much easier to explain.

Note here the change of axes values compared to the previous examples. All I did was increase the noise level by ~25% and here both peak flux and S/N levels have increased. It might not look easier to detect visually, but statistically, by some measures this one is more significant than the previous cases !
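If you want to convince yourself of the difference between the two detection methods numerically, here's a toy Monte Carlo (entirely my own made-up numbers, nothing to do with the figures) :

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_chan = 20000, 100
flux, threshold = 2.0, 3.0     # per-channel signal flux and the fixed cut (mJy)

for rms in (0.5, 1.0, 1.5):
    spectra = rng.normal(0.0, rms, (n_trials, n_chan))
    spectra[:, 40:60] += flux                  # inject the same top-hat every time
    peaks = spectra[:, 40:60].max(axis=1)      # peak value inside the line

    detected_flux = (peaks > threshold).mean()        # fixed flux threshold (3 mJy)
    detected_snr = (peaks / rms > threshold).mean()   # fixed S/N threshold (3 sigma)

    print(f"rms = {rms} mJy : flux-threshold detections {detected_flux:.0%}, "
          f"S/N-threshold detections {detected_snr:.0%}")
```

With the fixed flux threshold the detection rate of this faint signal goes up as the noise increases, exactly as claimed; with the fixed S/N threshold it goes down. (And, of course, the noisier the data, the more pure-noise peaks will also sneak over a fixed flux cut – that's the spurious-detection price mentioned below.)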

Even more interesting is the so-called Eddington bias. What this means is that any survey will tend to overestimate the flux of its weakest signals : those signals which are so faint that they can only be detected at all thanks to chance alignments with the noise. This means that when someone comes along and does a deeper survey, they'll often find that those sources have less flux than reported in the earlier, less sensitive data.
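The Eddington bias is easy to demonstrate with another toy Monte Carlo (again, my own arbitrary numbers) :

```python
import numpy as np

rng = np.random.default_rng(7)
n_sources = 100000
true_flux = 3.0      # every injected source has the same true flux (mJy)
rms = 1.0            # measurement noise
threshold = 4.0      # the survey only "detects" measured fluxes above 4 mJy

measured = true_flux + rng.normal(0.0, rms, n_sources)
detected = measured[measured > threshold]

print(f"detected : {detected.size / n_sources:.0%} of the sources")
print(f"mean measured flux of the detections : {detected.mean():.2f} mJy "
      f"(true flux is {true_flux} mJy)")
```

Only the sources that happen to sit on upward noise fluctuations make the cut, so the surviving "detections" have measured fluxes systematically higher than the true value – which is exactly what the deeper follow-up survey then reveals.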

There are plenty of other subtleties. The signal might not need to be perfectly superimposed on a noise peak : if it's merely adjacent to it, that can create the appearance of a wider, brighter signal which can be easier for some algorithms (and people !) to detect. And of course, while we'd like noise to be perfectly uniform and random, this isn't always the case. Importantly, the rms value doesn't tell us anything at all about the coherency of structures in the data, as so powerfully shown by the ferocious Datasaurus.
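A trivial example of why the rms alone tells you nothing about coherency (my own toy numbers again) :

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x = np.arange(n)

random_noise = rng.normal(0.0, 1.0, n)                     # featureless noise
structured = np.sqrt(2.0) * np.sin(2 * np.pi * x / 100.0)  # a coherent ripple

# Both baselines have (essentially) the same rms...
print("rms of random noise :", round(random_noise.std(), 2))
print("rms of the ripple   :", round(structured.std(), 2))
# ...but one is harmless and the other could easily mimic, or hide, a real signal.
```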

For the eye the effects of this can be extremely complex, and are poorly understood in astronomy. If you have very few coherent noise structures, for example, you might think this nice clean background would make fainter structures easier to spot. But actually, I have some tentative evidence that this isn't always the case : the eye can be lulled into a false sense of emptiness, whereas if there are a few obvious structures to attract attention, you start to believe that things are present so you're more likely to identify structures. No doubt if your data was dominated by structures then the eye would, in effect, perceive them as background noise again and the effect would diminish, but this is something that needs more investigation. My guess is there's a zone in which you have enough to encourage a search but few enough that they don't obscure the view. 

Ultimately, getting deeper data is always the better option. For every source noise-boosted to detectability, there'll be another which is suppressed and hidden. But this explains very neatly that supposedly "nonsensical" result*, especially if your source-finding routine is based on peak flux : of course if you add signal to noise and have the same flux threshold in your search, you're more likely to find the fainter signals in the noisier data... up to a point. Set your threshold too low and you'll just find spurious detections galore, but hit the sweet spot and you'll find signals you otherwise couldn't.

* Kudos to the referee for accepting the explanation; I never heard of any of this until a couple of years ago either. Extragalactic astronomy is full of stuff which isn't that difficult but seems feckin' confusing when you first encounter it because it isn't formally taught in any lectures !

There's nothing weird about noise boosting then. Mathematically it makes complete sense. But when you first hear about it it sounds perplexing, which just goes to show how deceptively simple radio astronomy can be. Noise boosting, at least at a basic level, is quite simple, but simple isn't the same as intuitive.

Friday, 22 August 2025

ChatGPT-5 versus... DeepSeek

My excitement for ChatGPT-5 continues to defy the Will Of The Internet. Sod y'all, this is feckin' awesome ! This is the upgrade I've been waiting for.

It seems only fair to continue testing it, however. It performed extremely well against my own knowledge, but how does it compare to other chatbots ? After all, there's not much point in me getting all excited for this one in particular if others are actually doing comparably well.

I was intrigued by this article which pits it against DeepSeek, mainly in more real-life situations than what I've been testing. To save you a click, the "clear winner" was DeepSeek, with the unexpected proviso that GPT-5 is better at creativity and explanations – which again is in defiance of the judgement of reddit, who think that GPT-5 is absolutely dreadful at this sort of thing.

Before I report my own science-based testing, some of the author's preferences in the AI responses are... questionable. In brief :

  1. Logic puzzle : I find GPT-5's response much clearer here. DS's longer, "different interpretations" of a really very simple "puzzle" only add unnecessary confusion.
  2. Mathematics : I tend to agree with the author that DS gives more of a tutorial in its response, though GPT-5's walkthrough is easier to follow. Neither was asked to teach the general case, however, only a specific example. I'd call this one a dead heat.
  3. Project planning : Seems like a hopelessly vague prompt to me. The output doesn't seem meaningful.
  4. Budgeting : I tend to agree that DS is slightly better here, but by a whisker : it's more instructive. Purely a personality difference though, not a matter of actual content.
  5. Parenting : They seem exactly equal to me. No idea why the author prefers DS or thinks GPT-5 is "less organised".
  6. Lunches : Not sure what the author plans to do with half a banana, nor do they show the whole display so it's hard to check this one. As it is, I'd agree DS is better here for giving better instructions, but I can't check if it really stayed in the budget or not.
  7. Stories : A toss-up; very subjective.
  8. Culture : I agree with the author that GPT-5 is clearly better; DS's list isn't a good summary in this case.
  9. Social media : Both are crap. DS might be better but it's a case of shit sandwich or shit hotdog.

So while the author prefers DS in seven of the nine tests, I'd say that only seven responses have meaningful prompts with enough output shown for a fair evaluation. I prefer DS only once (maybe twice in the last case), GPT-5 twice, and find they're not differentiable in the other cases. Certainly not a clear winner for anyone.


I should add that I rather went off DeepSeek some time ago. It can give good, insightful results (for a while I strongly preferred it to ChatGPT), but it's unreliable – both in terms of its server and its output. It can also over-think the problem, and though being able to see its reasoning can be helpful (sometimes this can be better than the final output), it also means there's a lot of content to wade through. This becomes tiresome and counterproductive.

So for this test what I thought I would do is try to ignore the reasoning and just compare the final outputs. I'm not going to go to the same lengths as I did when testing GPT-5 against my own reading of papers. Instead I'll just ask DS some of the same questions I already recently asked GPT-5 (in-anger, real use cases) and compare the initial responses.


1) Effects of ram pressure stripping on the line width of a galaxy

Suppose a galaxy is losing gas through ram-pressure stripping, with the angle to the wind being close to edge-on. What, if anything, is the expected effect on its HI line width ? Is it likely to reduce, increase, or not change the measured line width ? Consider the cases of moderate to strong stripping, corresponding to deficiencies of 0.5 and 0.7 respectively. Please also provide relevant citations, if possible.

Both agree that the width will decrease, but they disagree somewhat on how much. DeepSeek raised the important issue of turbulence, but that's its only point in its favour : most of its numerous citations weren't especially relevant and certainly weren't pertinent to the specific claims it was making. Far worse was that it got "edge-on" completely wrong, taking this to mean exactly the opposite of how the term is used. GPT-5 gave a tighter response which was much more focused on the problem and with much better citations.

Winner : GPT-5, easily.


2) How do SMUDGES get their measurements ?

Hi ! I have a question about the SMUDGES catalogue of Ultra Diffuse Galaxies. I'd like to know how they compute their reported magnitude values. Does this involve fitting a surface brightness profile or do they just do aperture photometry ? If the latter, how do they set the aperture size ? If the former, are they reporting total integrated magnitude within some radius or are they extrapolating ?

GPT-5 couldn't figure this out from web-based results so I had to feed it the paper. Once I did that it gave a concise, clear explanation. DS got the result correct using a web search with no uploads needed, and its explanation was longer but easier to follow. On the other hand, GPT-5's answer also included a significant caveat – that SMUDGES actually report both values – which is not correct.

Winner : DS for accuracy. GPT-5 was clearer, and it got the main point right, but it hallucinated a caveat (though it corrected this when I asked where in the paper the second value was given) and made the answer more complicated than it needed to be.


3) Integrating the light in a galaxy

About the Sersic profile... suppose I have a galaxy with a Sersic index of 0.51. How much larger than the effective radius would I have to go to enclose at least 98% of the light ?

Both models gave the same answer, but GPT-5 took 38 seconds whereas DS took more than ten minutes (!). The explanations each model provided were equally unclear to me as a non-mathematician.

Winner : GPT-5 for sheer speed.
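As an aside, this one is easy to check independently : the light enclosed by a Sersic profile within radius R can be written in terms of the regularised incomplete gamma function, so the answer drops out of a few lines of scipy (this is just the standard analytic relation, not either model's working) :

```python
from scipy.special import gammaincinv

n = 0.51     # Sersic index from the prompt
f = 0.98     # target enclosed light fraction

# b_n is defined so that half the total light falls within the effective radius
b_n = gammaincinv(2 * n, 0.5)

# Invert gamma(2n, x) / Gamma(2n) = f for x, then convert x back into a radius
x = gammaincinv(2 * n, f)
r_over_re = (x / b_n) ** n

print(f"R(98%) is about {r_over_re:.2f} effective radii")
```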


4) The safety of HIP2FITS

I'm curious about the HIPS2FITS service of astroquery. I have code that seems to work very well at extracting data from different surveys using a specified pixel scale, field of view, coordinates etc. What I'd like to know about is how the pixel values are rescaled when the pixel resolution is different from the standard survey data. In particular, I want to know if aperture photometry in DS9 will still be accurate, i.e. if the sum of the flux values within the same aperture will still be approximately the same. Presumably it can't be identical since the region will now, by definition, enclose a different area if the pixel scale has changed, but will it be close if the scale is not too different ? When I requested a substantially different pixel scale for SDSS data (10" per pixel rather than the nominal 0.4") I got a very different result using the same aperture. Is this something that can be accounted for e.g. by changing the units in the FITS header, or by specifying some other parameter in astroquery (or by post-processing the result in some other way) ? Or should I always ensure I only use the standard pixel scale if I want accurate photometry ?

Here the answers are very different. GPT-5 says a straightforward no, you can't do photometry on the data and it's not a simple matter to correct it due to the regridding (though if the scale change is small, this is possible). DS also says no, but that it is a relatively simple matter of correcting for the adjusted area – but recommends keeping the original scale whenever possible. Its suggested correction was, however, likely nonsense. It also said weird things like "even for the same source" and gave various other caveats that felt incoherent.

Winner : GPT-5 for clarity. 


5) How are stellar masses calculated ?

I'm curious about how galaxy stellar masses are estimated from photometric measurements. For example, I've seen several different recipes using, say, g-i or g-r colours, and I notice that they can give quite different results for the same object. How are these recipes derived, and how accurate are they ?

Both bots gave extremely similar answers here with no obvious major discrepancies. I couldn't honestly say I preferred either answer.

Winner : dead heat.
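For reference, these recipes are typically simple linear relations calibrated against stellar population models. One widely-used example is the Taylor et al. (2011) GAMA calibration; I'm quoting the coefficients from memory, so double-check before using them in anger :

```python
def log_stellar_mass_taylor2011(g_minus_i, abs_mag_i):
    """Approximate log10(M*/Msun) from a g-i colour and absolute i-band AB
    magnitude, following the Taylor et al. (2011) calibration. Different
    recipes can easily disagree by a few tenths of a dex for the same object."""
    return 1.15 + 0.70 * g_minus_i - 0.4 * abs_mag_i

# Example : a galaxy with g-i = 0.8 and M_i = -20 (AB)
print(f"log10(M*/Msun) ~ {log_stellar_mass_taylor2011(0.8, -20.0):.2f}")
```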


6) Star formation from GALEX

How can I quantitatively estimate the star formation rate of a galaxy using SDSS and GALEX data ?

This was an extremely simple prompt and both models gave similar answers. GPT-5 gave slightly more useable answers with more explanations; DS was a little less clear as to what was going on.

Winner : GPT-5, but it was close (and with the caveat that I haven't checked in detail as the responses were both quite long).
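For the GALEX half of the question, the classic shortcut is the Kennicutt (1998) UV calibration. A rough sketch of how the arithmetic goes (note there's no dust correction here, which in practice matters a great deal; the example numbers are made up) :

```python
import numpy as np

def sfr_from_fuv(fuv_ab_mag, distance_mpc):
    """Very rough star formation rate (Msun/yr) from a GALEX FUV apparent AB
    magnitude, using the Kennicutt (1998) calibration SFR = 1.4e-28 * L_nu,
    with L_nu in erg/s/Hz. No correction for internal dust extinction."""
    f_nu = 10 ** (-0.4 * (fuv_ab_mag + 48.6))   # AB mag -> erg/s/cm^2/Hz
    d_cm = distance_mpc * 3.086e24              # Mpc -> cm
    l_nu = 4 * np.pi * d_cm**2 * f_nu
    return 1.4e-28 * l_nu

# Example : FUV = 17 mag at a Virgo-ish distance of 17 Mpc
print(f"SFR ~ {sfr_from_fuv(17.0, 17.0):.3f} Msun/yr")
```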


7) Analysing the large scale environment

I've got a student investigating an HI data cube of the Leo Group. We've already studied the Leo Ring volume and he's extending our study into the background volume, ~2,000 - 20,000 km/s. We'd like to be able to characterise the environment of our detections, i.e. to say if they're in any major known groups, clusters, or large-scale structures such as filaments. What's the best approach to do this ? I guess we can simply search for groups and clusters in NED, but maybe there's a better way (also this wouldn't tell us about the larger stuff). And we'd like to know any vital information about such structures, e.g. if something is particularly well-known for any reason.

For the only time in these tests, DS complained its server was busy. On a second attempt it gave a response which was pretty basic and not very helpful; "you can look up this data set" or "use this software", or worse, suggesting a comparison to simulations. GPT-5 gave much more helpful citations and guided instructions in what to do with our own data and how we could compare it to other catalogues. 

Winner : GPT-5, easily.


The verdict : ChatGPT-5 won five of the seven tests. DS only won once, with the other being a dead heat. Even leaving aside that GPT-5 had a narrow victory in one case, GPT-5 is the clickbaity "clear winner" here.

To be fair, I'd switched to DeepSeek because it was giving me better discussions than the contemporary model of ChatGPT, even when they first added reasoning. It's by no means a bad model, but it's unreliable compared to GPT-5. Its citations are frequently of dubious relevance, it seems to hallucinate more (the example of GPT-5 hallucinating here is the only example I've found of this so far*, and that was minor), and if you want its best output you'll have to wade through an awful lot of its reasoning processes. It's also slower than GPT-5, sometimes by an order of magnitude. And I've also found that DeepSeek rejects my uploads as violating policies even when there's nothing offensive in them; it won't even discuss them at all but just shuts down the discussion immediately.

* Excluding when you ask for it to analyse a paper by giving it a link rather than uploading. In that case it becomes worse than useless, so never do this.

Still, I have to admit that this does tend to curb my enthusiasm for GPT-5, but only a little. DeepSeek's answers were generally better in comparison to GPT-5 than I was expecting; DeepSeek gets most of the way there when it works. If GPT-5 is a revolution compared to previous OpenAI offerings, then it's only an incremental upgrade compared to DeepSeek. An important one, to be sure, but nevertheless incremental.

On the other hand, it's all about thresholds : how you cross them doesn't matter nearly as much as the fact you've crossed them at all. An increment which gets you across the line is just as useful as if you got there from a standing start. And if DeepSeek gets most of the way there most of the time, GPT-5 gets even further almost all of the time. With only one minor failure here compared to DeepSeek's two majors, the effect is non-linear. Even if GPT-5 doesn't drastically improve upon DeepSeek, it does so by more than enough that I now see no point in using DeepSeek at all.


That about wraps up the LLM-testing for the foreseeable future. Now back to the usual posts in which I explain science to the masses... whether I do this better than the chatbots I leave to readers to decide.

Wednesday, 13 August 2025

ChatGPT-5 Versus Me

It's time for another round of evaluating whether ChatGPT is actually helpful for astronomical research.

My previous experiments can be found here, here, and here. The first two links looked at how well ChatGPT and Bing performed when analysing papers I myself know very well, with the upshot being an extreme case of hit-and-miss : Occasional flashes of genuine brilliance wrapped in large doses of mediocrity and sprinkled with total rubbish, to quote myself. All conversations had at least one serious flaw (though in one case only arguably so, since it was factually and scientifically perfect but had crippling format errors).

The third link tested ChatGPT's vision analysis by trying to get it to do source extraction, which was a flat-out failure. Fortunately there have been other tests on this which show it does pretty badly in more typical situations as well, so I'm not going to bother redoing this.

With the release of ChatGPT-5, however, I do want to redo the analysis of papers. If I can have ChatGPT give me reliable scientific assessments of papers, that's potentially a big help in a number of ways, at the very least in determining if something is going to be worth my time to read in full. For this one I picked a new selection of papers as my last tests were a couple of years ago, and I can't claim I remember all their details as well as I did. 

Because all the papers cover different topics, there isn't really a good way to standardise the queries. So these tests are designed to mimic how I'd use it in anger, beginning with a standardised query but then allowing more free-ranging, exploratory queries. There's no need for any great numerical precision here, but if I can establish even roughly how often GPT-5 produces a result which is catastrophically wrong or useless, that's useful information.

I began each discussion with a fairly broad request :

I'd like a short summary of the paper's major findings, an evaluation of its scientific importance and implications, what you think the major weaknesses (if any) might be and how they could be addressed. I might then ask you more detailed, specific questions. Accuracy is paramount here, so please draw your information directly from the paper whenever possible – specify your sources if you need to use another reference.

Later I modified this to stress I was interested in the strengths and weaknesses of the scientific interpretation as well as methodology, as GPT-5 seemed to get a little hung up on generic issues – number of sources, sensitivity, that sort of thing. I followed up the general summaries with specific questions tailored to each individual paper as to what they contained and where, this being a severe problem for earlier versions. At no point did I try to deliberately break it – I only tried to use it.

Below, you can find my summaries of the results of discussions about five papers together with links to all of the conversations.


0) To Mine Own Research Be True ?

But first, a couple of examples where I can't share the conversations because they involve current, potentially publishable research (I gave some initial comments already here). I decided to really start at the deep end with a query I've tried many times with ChatGPT previously and got very little out of it : to have it help with a current paper I'm writing, asking it to assess the merits and problems alike – essentially acting as a mock reviewer. 

* Of which the management, like the rest of us, is generally sensible about such things. We all recognise the dangers of hallucinations, the usefulness and limitations of AI-generated code, etc. Nobody here is a fanboy nor of the anti-AI evangelical sort.

Previously I'd found it to be very disappointing at this kind of task. It tended to get hung up on minutiae, not really addressing wider scientific points at all. For example, if you asked it which bits should be cut, it might pick out the odd word or sentence or two, but it wouldn't say if a whole section was a digression from the main topic. It didn't think at scale, so to speak. It's hard to describe precisely, but it felt like it had no understanding of the wider context at all; it discussed details, not science. It wasn't that using it for evaluations was of no value whatsoever, but it was certainly questionable whether it was a productive use of one's time.

With the current paper I have in draft, ChatGPT-5's response was worlds apart from its previous meagre offerings. It described itself as playing the role of a "constructively horrible" reviewer (its own choice of phrase) and it did that, I have to say, genuinely very well. Its tone was supportive but not sycophantic. It suggested highly pertinent scientific critiques, such as the discussion on the distance of a galaxy – which is crucial for the interpretation in this case – being too limited and alternatives being fully compatible with the data. It told me when I was being over-confident in phrasing, gave accurate indications of where I was overly-repetitive, and came up with perfectly sensible, plausible interpretations of the same data.

Even its numbers were, remarkably, actually accurate* (and unlike others, I haven't seen it make the classic errors in basic facts and figures, such as miscounting specified letters even in fictional words; I tried reproducing some of these errors multiple times but couldn't). At least, all of those I checked were on the money – a far cry indeed from older versions ! Similarly, citations were all correct and relevant to its claims : none were total hallucinations. That is a big upgrade.

* ChatGPT itself claims that it does actual proper calculations whenever the result isn't obvious (like 2+2, for which training data is enough) or accuracy is especially important.

When I continued the discussion... it kept giving excellent, insightful analysis; previous versions tended to degenerate into incoherency and stupidity in long conversations. It wasn't always right – it seriously misunderstood one inquiry in a way I thought it should have avoided* – but it was right more than, say, 95% of the time, and that single significant misunderstanding was very easily corrected**. If it was good for bouncing ideas off before, now it's downright excellent.

* This wasn't a hallucination as it didn't fabricate anything, it just misunderstood the question.
** And how many conversations with real people feature at least one such difficulty ? Practically all of them in my experience.


The second unshareable test was to feed it my rejected ALMA proposal and (subsequently) the reviewer responses. Here too the tone of GPT-5 shines. It phrased things very carefully but without walking on eggshells, explaining what the reviewers' thought processes might have been and how to address them in the future without making me feel like I'd made some buggeringly stupid mistake. I asked it initially to guess how well the proposal would have been ranked and it said second quartile, borderline possibility for acceptance... complimentary and supportive, but not toadying, and not raising false hopes.

When I told it the actual results (lowest quartile, i.e. useless), it agreed that some of the comments were objectionable, but gave me clear, precise instructions as to how they could be countered. Those are things I would find extremely difficult to do on my own : I read some of the stupider claims ("the proposal flow feels a bit narrative"... FFS, it damn well should be narrative and I will die on this hill) and just want to punch the screen*, but GPT-5 gave me ways to address those concerns. It said things like, "you and I know that, but...". 

* No, not really ! I just need to bitch about it to people. Misery loves company, and in a perverse bit of luck, nobody in our institute got any ALMA proposals accepted this year either.

It made me feel like these were solvable problems after all. For example, it suggested the rather subtle reframing of the proposal from detection experiment (which ALMA disfavours) to hypothesis testing (which is standard scientific practice that nobody can object to). This is really, really good stuff, and the insight into what the reviewers might have been thinking, or not understanding, made me look at the comments in a much more upbeat light. Again, it had one misunderstanding about a question, but again this was easily clarified and it responded perfectly on the second attempt.

On to the papers !


1) The Blob(s)

This paper is one of the most interesting I've read in recent years, concerning the discovery of strange stellar structures in Virgo which the authors interpret as ram pressure dwarfs. Initially I tried to feed it the paper by providing a URL link, but this didn't work. As I found out with the second paper, trying to do it this way is simply a mistake : in this and this alone does GPT-5 consistently hallucinate. That is, it claims it's done things which it hasn't done, reporting wrong information and randomly giving failure messages.

Not a great start, but it gets better. When a document is uploaded, hallucinations aren't quite eliminated, but good lord they're massively reduced compared to previous versions. It's weird that its more general web search capabilities appear rather impressive, but give it a direct link and it falls over like a crippled donkey. You can't have everything, I guess.

Anyway, you can read my full discussion with ChatGPT here. In brief :

  • Summary : Factually flawless. All quoted figures and statements are correct. It chose these in a sensible way to give a concise summary of the most important points. Both scientific strengths and weaknesses are entirely sensible, though the latter are a little bland and generic (improve sensitivity and sample size, rather than suggesting alternative interpretations).
  • Discussion : When pressed more directly for alternative interpretations, it gave sensible suggestions, pointing out pertinent problems with the methodology and data that allow for this.
  • Specific inquiries : I asked it about the AGES clouds that I know are mentioned in this paper (I discovered them) and here I encountered the only real hallucination in all the tests. It named three different AGES clouds that are indeed noteworthy because they're optically dim and dark ! These are not mentioned in this paper at all. When I asked it to check again more carefully, it reported the correct clouds which the authors refer to. When I asked it about things I knew the paper didn't discuss, it correctly reported that the paper didn't discuss this.
  • Overall : Excellent, once you accept the need to upload the document. Possibly the hallucination might have been a holdover from that previous attempt to provide the URL, and in my subsequent discussions I emphasised more strongly the need for accuracy and to distinguish what the paper contained from GPT-5's own inferences. This seems to have done the trick. Even with these initial hiccups, however, the quality of the scientific discussions was very high. It felt like talking with someone who genuinely knew what the hell they were talking about.

2) The Smudge

This one is about finding a galaxy so faint the authors detected it by looking for its globular clusters. They also find some very diffuse emission in between them, which is pretty strong confirmation that it's indeed a galaxy of sorts.

At this point I hadn't learned my lesson. Giving ChatGPT a link caused it to hallucinate in a sporadic, unpredictable way. It managed to get some things spot on but randomly claimed it couldn't access the paper at all, and invented content that wasn't present in the paper. Worse, it basically lied about its own failures.

You can read my initial discussion here, but frustrated by these problems, I began in a second thread with an uploaded document here. That one, I'm pleased to say, had no such issues.

  • Summary : Again, flawless. A little bland, perhaps, but that's what I wanted (I haven't tried asking it for something more sarcastic). The content was researcher level rather than general public but again I didn't ask for outreach content. It correctly highlighted possible flaws like the inferred high dark matter content being highly uncertain due to an extremely large extrapolation from a relatively novel method.
  • Discussion : In the hallucinatory case, it actually came up with some very sensible ideas even though these weren't in the paper. For example, I asked it about the environment of the galaxy and it gave some plausible suggestions on how this could have contributed to the object's formation – the problem was that none of this was in the paper as it claimed. Still, the discussion on this – even when I pushed it to ideas that are very new in the literature – was absolutely up to scratch. When I suggested one of its ideas might be incorrect, it clarified what it meant without changing the fundamental basis of its scenario in a way that convinced me it was at least plausible : this was indeed a true clarification, not a goalpost-shifting modification. It gave a detailed, sensible discussion of how tidal stripping can preferentially affect different components of a galaxy, something which is hardly a trivial topic.
  • Specific inquiries : When using the uploaded document, this was perfect. Numbers were correct. It reported correctly both when things were and weren't present in the article, with no hallucinations of any kind. It expanded on my inquiries into more general territory very clearly and concisely.
  • Overall : Great stuff. Once again, it felt like a discussion with a knowledgeable colleague who could both explain specific details but also the general techniques used. Qualitatively and quantitatively accurate, with an excellent discussion about the wider implications.


3) ALFALFA Dark Galaxies

My rather brief summary is here. This is the discovery of 140-odd dark galaxy candidates in archival ALFALFA HI data. The ChatGPT discussion is here. This time I went straight to file upload and had no issues with hallucinations whatsoever.

  • Summary : Once again, flawless. Maybe a little bland and generic with regard to other interpretations, but it picked out the major alternative hypothesis correctly. And in this case, nobody else has come up with any other better ideas, so I wouldn't expect it to suggest anything radical without explicitly prompting it to.
  • Discussion : It correctly understood my concern about whether the dynamical mass estimates are correct and gave a perfect description of the issue. This wasn't a simple case of "did they use the equation correctly" but a contextual "was this the correct equation to be using and were the assumptions correct" case, relating not just to individual objects but also their environment. Productive and insightful.
  • Specific inquiries : Again flawless, not claiming the authors said anything they didn't or claiming they didn't say anything they did. Numbers and equations used were reported correctly.
  • Overall/other: Superb. I decided to finish by asking a more social question – how come ALFALFA have been so cagey about the "dark galaxy" term in the past (they use the god-awful "almost darks", which I loathe) but here at least one team member is on board with it ? It came back with answers which were both sociologically (a conservative culture in the past, a change of team here) and scientifically (deeper optical data with more robust constraints) sensible ideas. It also ended with the memorable phrase, "[the authors are] happy to take the “dark galaxy” plunge — but with the word “candidate” as a fig leaf of scientific prudence."


4) The VCC 2034 System

This is a case of a small fuzzy patch of stars near some larger galaxies, possibly with a giant HI stream, which has proven remarkably hard to explain. The latest paper, which I summarise here, discounts the possibility that it formed from the long stream as it apparently doesn't exist, but (unusually) doesn't figure out an alternative scenario either. The ChatGPT discussion is here.
  • Summary : Factually perfect, though it didn't directly state that the origin of the object is unknown. Arguably "challenges simple ram-pressure stripping scenarios and suggests either an intergalactic or pre-cluster origin" implies this, but I'd have preferred it to state it more directly. Nevertheless, the most crucial point that previous suggestions don't really hold up came through very clearly.
  • Discussion : Very good, but not perfect. While it didn't get anything wrong, it missed out the claims in the paper against the idea of ram pressure dwarfs more generally (about the main target object of the study it was perfect). With some more direct prompting it did eventually find this, and the ensuing discussion was productive, pointing out some aspects of this I hadn't considered. I'm not entirely convinced this was correct, but no more than I doubt some of the claims made in the paper itself – PhD level hardly means above suspicion, after all. And the discussion on the dynamics of the object was extremely useful, with ChatGPT again raising some points from the paper I'd completely missed when I first read it; the discussion on the survival of such objects in relation to the intracluster medium was similarly helpful.
  • Specific inquiries : Aside from the above miss, this was perfect. When I asked it to locate particular numbers and discuss their implications it did so, and likewise it correctly reported when the paper didn't comment on a topic I asked about. 
  • Overall : Not flawless, but damn good, and certainly useful. One other discussion point caused a minor trip-up. When I brought in a second paper (via upload) for comparison and mentioned my own work for context, it initially misinterpreted and appeared to ignore the paper. This was easily caught and fixed with a second prompt, and the results were again helpful. By no means was this hallucination – it felt more like it was getting carried away with itself.


5) An Ultra Diffuse Galaxy That Spins Too Slowly

This was a paper that I'd honestly forgotten all about until I re-read my own summary. It concerns a UDG that initial observations indicated lacked dark matter entirely, but then another team came along and found that this would be unsustainable and that it was probably just an inclination angle measurement error. Then the original team came back with new observations and simulations, and found it does have some dark matter after all – at a freakishly low concentration, but enough to stabilise it. The ChatGPT discussion is here.

  • Summary : As usual this was on the money, bringing in all the key points of the paper and giving a solid scientific assessment and critique. Rather than dealing with trivialities like sample size or simulation resolution, it noted that maybe they'd need to account more for the effects of environment or use different physics for the effects of feedback on star formation.
  • Discussion : As with the fourth paper, this was again excellent but not quite complete. It missed out one of my favourite* bits of speculation in the paper that this object could tell us something directly about the physical nature of dark matter. It did get this with direct prompting, but I had to be really explicit about it. To be fair, this is just one paragraph in the whole article, but reading between the lines I felt it was a point the authors really wanted to make. On the other hand, that's just my opinion and it certainly isn't the main point of the work.
  • Specific inquiries : Yep, once again it delivered the goods. No inaccuracies. It reported the crucial points correctly and described the comparisons with previous works perfectly. Again, it didn't report any claims the authors didn't make.
  • Overall : Excellent. I allowed myself to branch out to a wider discussion of the cold dark matter paradigm and it came back with some great papers I should check out regarding stability problems in MOND. It sort of back-pedalled a little bit on discussions about the radial acceleration relation, but this was more a nuanced clarification than revising its claims : CDM gets RAR as a result of baryonic physics tuning, but it gets this for free as a result of tuning for other parameters rather than directly for RAR itself; MOND gets RAR as a main feature. If that's not a PhD level discussion then I don't know what is.

* More generally, it seems pretty good at picking up on the same stuff that I do, but it would be silly to expect 100% alignment.


Summary and Conclusions

On my other blogs I've gone on about the importance of thresholds. Well, we've crossed one. Even the more positive assessments of GPT-5 tend to label it as an incremental upgrade, but I violently disagree. I went back and checked my earlier discussion with GPT-4o about my ALMA proposal and confirmed that it was mainly spouting generic, useless crap... GPT-5 is a massive improvement. It discusses nuanced and niche scientific issues with a robust understanding of their broader context. In other threads I've found it fully capable of giving practical suggestions and calculations which I've found just work. Its citations are pertinent and exist. 

This really does feel like a breakthrough moment. At first it was a cool tech demo, then it was a cool toy. Now it's an actually useful tool for everyday use – potentially an incredibly important one. Where people are coming from when they say it gets basic facts wrong I've honestly no idea. The review linked above says it gave a garbage response when fed a 160+ page document and was anything but PhD-level, but in my tests with typical length papers (generally 12-30 pages) I would absolutely and unequivocally call it PhD level. No question of it.

This is not to say it's perfect. For one thing, even though there's a GUI setting for this, it's very hard to get it to stop offering annoying follow-up suggestions of things it could do. This is why you'll see my chats with it sometimes end with "and they all live happily ever after", because I had to put that in my custom instructions to give it an alternative ending (in one memorable case it came up with "one contour to rule them all and in the darkness bind them"*). Even then it doesn't always work. And it always delivers everything in bullet-point form : no doubt this can be altered, but I haven't tried... generally I don't hate this though.

* I really like the personality of GPT-5. It's generally clear and to the point, straightforward and easy to read, but with the occasional unexpected witticism that keeps things just a little more engaging.

Of course, it does still make mistakes. Misinterpretations of the questions appear to be the most common, but these are very easily spotted and fixed. Incompleteness seems to be less common but more serious, though I'd stress that expecting perfection from anything is extremely foolish. And actual hallucinations of the kind that still plagued GPT-4 are now nearly non-existent, provided you give it rigorous instructions.

So that's my first week with GPT-5, a glowing success and vastly better than I was expecting. Okay, people on reddit, I get that you missed the sycophantic ego-stroking personality of GPT-4, so whine about how your virtual friend has died all you want. But all these claims that it's got dumber, and has an IQ barely above that of a Republican voter... what the holy hell are you talking about ? That makes NO sense to me whatsoever.

Anyway I've put my money where my mouth is and subscribed to Plus. Watch this space : in a month I'll report back on whether it's worth it.

Tuesday, 29 July 2025

Stop stripping the dwarves, they don't like it !

Today's paper revisits a very minor but interesting storm in a teacup.

Back in 2021, Junais et al. reported on a possible Ultra Diffuse Galaxy losing gas in the Virgo Cluster. At face value it all looked very convincing. The HI gas detection was very clear, nicely offset from the UDG-candidate (basically an especially faint, fluffy sort of galaxy if you aren't keeping up with things – shame on you !) but still overlapping it. At the very centre of the gas detection was a sort of ragged line of blue starlight, plus there were some patches of stars scattered about as well. It's all very much as you'd expect if this was star formation occurring in the stripped material.

Okay, ram pressure is old hat. But to find evidence of this occurring in a UDG would be especially interesting : it would allow us to start investigating whether UDGs in clusters (which seem to be pretty common) are the same as those in the general field (which are known to exist, but we don't know how many there are). In particular, there's this whole controversy over whether they lack dark matter or not, in which case the effects of stripping might be quite different since the gravitational forces involved would be much less. And also it would show whether both cluster and field UDGs form by the same process, or whether there are multiple ways to form the same sort of objects.


All this was strongly challenged by Jones et al. 2021. They said, no, hang on, the distances are all wrong. Using high-resolution Hubble data, they were able to show that the UDG-candidate is actually much closer than the Virgo Cluster, and it also seemed to be linked to another, much brighter galaxy (VCC 2034) by a giant bridge of HI, so presumably that would also be at the same distance. The patchy starlight, however, could well be in the Cluster, in which case it would require a different explanation because it doesn't look anything like a galaxy.

You may or may not remember that I'm moderately skeptical about all this. It's not that I don't believe the distance estimates... it's that I'm wary of them after that whole "ping-pong" series of papers concerning some other UDGs – a debate which apparently still isn't fully settled. That suggests we shouldn't take any single value as definitive but should wait for multiple analyses.

And the HI stream... although I did send Jones some of our deeper (WAVES) data, and he was able to find the stream by taking a slice through at the right angle... it feels very off to me. Given that our HI data is about 3-4x deeper than the original ALFALFA observations, I'd expect it to be immediately obvious in our data. It isn't. My suspicion is that the analysis and source-finding package SoFiA (which is hella powerful) is oversmoothing here : the smoothing used to increase sensitivity degrades the resolution, creating the appearance of a bridge.

I don't know for sure though. I'm moderately skeptical, but no more than that.

Enter today's paper by Yu-Zhu Sun and friends. They use a combination of new deep data from FAST and high resolution data from the VLA. And this paints a pretty convincing picture that the patchy starlight is not the result of gas stripping from VCC 2034, even if they agree that it's nothing to do with the UDG candidate. This is my favourite sort of paper in that it doesn't actually solve the mystery but just demonstrates that things were even weirder than initially thought.


This is all quite complicated : there are several different galaxies in this region, plus the fuzzy stars, plus the possible gas bridge, and conflicting distance claims for all of them. Let's try a few simple diagrams to illustrate. I'll start with the main hypotheses proposed by Junais and Jones. But, since both groups quite rightly caveat their conclusions and aren't definitive, and don't deal with exactly the same objects, I'm going to try and standardise and simplify things a little bit. This should be enough to get a general sense of what's going on, but this is very much a limited guide. Needless to say, these are not to scale !


Hypothesis 1

The most straightforward interpretation is the original : that the fuzzy starlight ("The Fuzz", a.k.a. AGC 226178) is close to the UDG candidate (here UDG-X, official designation NGVS 3543) and is the result of star formation in its stripped gas. The nearby pair of galaxies VCC 2034/2037 are seemingly unrelated.

All galaxies, in this scenario, are in Virgo at about 17 Mpc distance from us. The gas cloud associated with UDG-X and the Fuzz align well and VCC 2034/2037 is rather far away, so an association isn't at all natural. VCC 2034 has its own gas, showing clear signs of removal. In fact this extends in the direction of UDG-X but doesn't reach nearly far enough, so the orientation doesn't appear to indicate anything interesting. It's also aligned with VCC 2037, but that too is imperfect (it doesn't cover the whole of VCC 2037, and the local maximum of the gas is not aligned with the galaxy's centre), and the velocities of the two galaxies don't match well. So this too may just be a coincidence – the two objects might both be in the cluster, but at sufficiently different distances that they aren't actually related. Regardless, they really don't seem to have anything to do with UDG-X at all.


Hypothesis 2


The second scenario relies on a number of additional observations. Direct distance estimates suggest that both UDG-X and VCC 2037 are at 10 Mpc, much closer than the Virgo Cluster (17 Mpc, estimated elsewhere to be 1-2 Mpc deep). However the Fuzz seems still to be at the cluster distance, and there's a much larger bridge of HI apparently connecting it to VCC 2034. So essentially, the Fuzz results from gas stripping of the cluster member VCC 2034, whereas UDG-X is so close to us that it's actually not a cluster member at all : it may or may not relate to VCC 2037 instead. This would make UDG-X an uninteresting normal dwarf galaxy, but the Fuzz becomes very interesting as a rare example of star formation in a gas tail.

Note again that the existence of the large HI envelope is uncertain, and that it's probably not a great idea to trust the distance estimates overmuch. Furthermore, as we're about to see, even the high resolution HI data can't be treated as gospel.


Hypothesis 3

Stressing that the latest paper is even more cautious, here's their essential idea : there's no big HI envelope and both VCC 2034/2037 show independent HI tails (in the new VLA data) that don't align with the Fuzz or UDG-X at all. UDG-X may well be foreground (again making it a normal dwarf galaxy), but neither it nor the Fuzz are directly related to any of the major galaxies in the general vicinity. What, then, is the origin of the Fuzz in this scenario ?

A tricky question indeed, one which they understandably don't commit to answering. Their main conclusion is that the Fuzz is likely not stable and in the process of disintegration, but as to what formed it in the first place, they don't (can't) say.

Disclaimer : I know a few of the co-authors very well, have published with them, and certainly hope to do so again ! They raise many excellent points, but there are a few with which I disagree. For example, they say that the HI cloud around the Fuzz has a "well-defined" velocity gradient of 10 km/s, but that's the width of the HI line itself so I'm very skeptical that this can be in any sense meaningful.

They do, however, have both new, extremely sensitive FAST data (even slightly deeper than WAVES), and new VLA data which should be of even higher resolution than the earlier observations. The FAST data fails to show the large HI envelope, as does WAVES – and taken together this seems to quite reasonably disprove its existence. I had in mind a simple project to see if this could really result from how the data was processed... maybe one day I'll have the time to try it, as it would be nice to know exactly how this happened if indeed it doesn't exist.

What about UDG-X ? The FAST data is highly sensitive but low resolution, and can't distinguish gas associated with the Fuzz (which definitely does exist) from UDG-X. The Sun et al. VLA data, however, shows much less of a head-tail morphology than the earlier data, now appearing to only be associated with the Fuzz. That makes it unlikely that the Fuzz is the result of gas stripping from UDG-X, though this can't be said with too much confidence. There could still be diffuse gas in UDG-X which the VLA wouldn't detect, or the entire gas of the Fuzz might have been displaced wholesale from UDG-X.

And when they say they detect a velocity gradient in this case, it looks a lot more like a very sudden change to me. Their dynamical mass estimates – how much mass is needed to keep the system stable – are, I think, stretching things beyond the quality that the data can sustain, given how narrow the velocity width of the object is. That said, they say the total amount of dark matter that would be present is so low that this is unlikely to be a dark/dim galaxy candidate : more likely it's some form of debris. That seems entirely reasonable from the low line width, even if I'd be skeptical about the exact dark matter mass estimate.
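
In case it helps to see what such an estimate actually looks like, here's the back-of-the-envelope version – a minimal sketch of my own, with the radius and line width made up for illustration rather than taken from the paper :

```python
# Back-of-the-envelope dynamical mass : the mass needed to keep a cloud of
# radius R gravitationally bound given its velocity width. Numbers illustrative.
G = 4.301e-3  # gravitational constant in pc (km/s)^2 / Msun

def dynamical_mass(radius_kpc, half_width_kms):
    """M_dyn ~ R * V^2 / G, taking V as half the velocity width."""
    return radius_kpc * 1e3 * half_width_kms**2 / G

# A ~5 kpc cloud with a ~10 km/s total line width (so V ~ 5 km/s) :
print(f"{dynamical_mass(5, 5):.1e} Msun")  # ~3e7 Msun, tiny by galaxy standards
```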

But is the debris stable ? That's much harder to answer. A lot of recent work has found candidates for so-called "blue blobs", which are interpreted as gravitationally-bound clumps of gas and stars that formed by the removal of gas from ordinary galaxies by ram pressure. In essence this would be a new class of stellar system, not really galaxies in the classical sense (since they'd have no dark matter) but not star clusters either (being very much larger and formed by a totally different mechanism).

Personally I rather like this idea, but here they poke a few well-aimed holes in the scenario. The high metallicity of the clouds seemed, in the Jones paper, like strong evidence that the clouds originated from within galaxies, as otherwise their chemistry should be basically hydrogen and bugger all else – you need prolonged star formation to cause significant enrichment, which isn't going to happen at their current pathetic levels of star formation activity. But here they say it could happen through mixing with the gas in the cluster itself. On the other hand, the paper they cite in support of this says that metallicity should drop with distance from the parent galaxy, whereas all the blue blobs have essentially the same high metallicity value. So this is an interesting critique, but not a fully convincing one.

Similarly, they're rather skeptical of the whole pressure confinement scenario for blue blobs – the idea here being that the gas within the cluster helps prevent them from disintegrating. Now when we simulated this for dark clouds with very high velocity dispersions, we found it flat-out didn't work. But we were investigating rather exceptional systems, and simulations of low velocity dispersion systems have found very much more favourable results (as you'd expect anyway : with a low dispersion, things can only expand more slowly by definition). So I think their toy model is overthinking things. In any case, given the extremely low dispersion of the Fuzz's gas cloud, it would only expand by 10 kpc in a billion years... even if it is technically disintegrating, it's doing so so slowly that it might as well not be.
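
That figure is easy to sanity-check : 1 km/s is almost exactly 1 pc per Myr, so unbound gas drifts about 1 kpc per Gyr for every km/s of dispersion. Taking 10 km/s purely for illustration :

```python
# Unbound gas drifts apart at roughly its velocity dispersion.
# Handy conversion : 1 km/s is very nearly 1 pc per Myr, i.e. 1 kpc per Gyr.
KM_PER_KPC = 3.086e16   # km in one kiloparsec
SEC_PER_GYR = 3.156e16  # seconds in a billion years

sigma_kms = 10.0   # assumed dispersion for illustration, km/s
t_gyr = 1.0

print(f"{sigma_kms * t_gyr * SEC_PER_GYR / KM_PER_KPC:.1f} kpc")  # ~10 kpc per Gyr
```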

Finally, I don't agree at all with their interpretation regarding the location of the blue blobs within the cluster. The previous paper by Dey suggested that they're found in regions of modest cluster gas density because this is where they can both form and survive for a while; they avoid the denser core because this would rapidly destroy them. But Sun et al. claim that a "more natural" suggestion is that actually these objects are all outside the cluster in 3D space and only appear projected against it. Surely, though, if that were the case, we'd be equally likely to see such objects projected against the core ! To me, that the distribution of the objects relates to the geometry of the cluster feels like extremely compelling evidence that they are indeed within the cluster.


The long and short of it is that this is a very complex system, and it all serves to underscore that even observations don't always get the last word. It's particularly interesting that the new VLA data looks markedly different to the earlier findings, showing distinctly different structures. Likewise, I have to wonder why everyone is treating the distance estimates with such high confidence, given recent prominent debacles about how damn difficult it is to get these right.

As it stands, it now looks a lot less likely that the origin of the Fuzz can be explained by a giant gas stream from VCC 2034. But I, for one, am by no means convinced that we can rule out the original suggestion of stripping from a UDG, and I downright disagree that we can be so confident that it's a disintegrating gas cloud rather than a ram pressure dwarf. It's likely not a dark galaxy, however. 

Which leaves the usual question of : what would it take to resolve all this ? Parts of it are very tricky. The question of the long gas stream, though, could be easily answered by running SoFiA over data sets with artificial signals injected in similar configurations to the current system; if the long stream results from oversmoothing, this ought to be reproducible. Distance measurements are much harder to resolve unambiguously, but at a minimum, another team needs to try this independently, preferably using different data. As to why the various VLA data sets of the same objects look so different, however, I'm at a loss. It's definitely a weird system, but certainly an interesting weird.
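
For what it's worth, the injection part of that test would look something like this – a minimal sketch, with the file name, axis order and fake source parameters all invented. The idea is simply to drop a faint, extended fake source into the cube, then run SoFiA (e.g. SoFiA-2) over both the original and the injected cube with identical smoothing settings and compare what comes out :

```python
# Minimal sketch : inject a faint, extended fake HI source into a cube, then
# run the source finder on both versions with the same smoothing kernels and
# see whether a spurious "stream" appears. All numbers here are made up.
import numpy as np
from astropy.io import fits

cube = fits.getdata("cube.fits").astype(float)   # assumed axis order (chan, y, x)

def gaussian_blob(shape, centre, sigmas, peak):
    """A 3D Gaussian 'source' with per-axis sigmas (channels, pixels, pixels)."""
    grids = np.meshgrid(*[np.arange(n) for n in shape], indexing="ij")
    r2 = sum(((g - c) / s) ** 2 for g, c, s in zip(grids, centre, sigmas))
    return peak * np.exp(-0.5 * r2)

# A faint, elongated blob roughly mimicking the real system's configuration :
fake = gaussian_blob(cube.shape, centre=(50, 120, 200), sigmas=(3, 20, 6), peak=0.5e-3)
fits.writeto("cube_injected.fits", cube + fake,
             header=fits.getheader("cube.fits"), overwrite=True)
# Then run SoFiA (e.g. "sofia sofia.par") on both cubes with identical settings
# and compare the recovered masks.
```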

Thursday, 3 July 2025

The Bunny Rabbit of Death

Today's paper is a bit more technical than usual, but sometimes you've gotta tackle the hard stuff.

Ram pressure stripping is something we seem to understand pretty well on a large scale. When a galaxy enters a massive cluster containing its own gas, pressure builds up that can push out the gas in the galaxy. If it's going fast enough, and/or the cluster gas is dense enough, then the galaxy can lose all of its gas pretty quickly. No ifs or buts, it just loses all its gas, stops forming stars, realises it's made incredibly poor life choices, and dies.

Yeah, literally, it dies. It's run out of fuel for star formation, which means all its remaining massive blue stars aren't replaced when they explode as supernovae in a few million years. Slowly it turns into a "red and dead" smooth, structureless, boring disc, and maybe eventually an elliptical. There's a wealth of evidence that ram pressure is the dominant mechanism of gas loss within clusters, and everything seems to just basically... work. Which is nice.
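
Incidentally, the "fast enough and/or dense enough" condition is usually quantified with the classic Gunn & Gott (1972) criterion : gas is stripped wherever the ram pressure exceeds the disc's gravitational restoring force per unit area. A rough sketch with generic numbers, not tuned to any particular galaxy :

```python
# Gunn & Gott (1972) stripping condition : gas is removed where the ram pressure
# rho_ICM * v^2 exceeds the restoring force per unit area, roughly
# 2*pi*G * Sigma_star * Sigma_gas. Values below are generic, for illustration.
import numpy as np

G = 6.674e-8                      # gravitational constant, cgs
MSUN, PC = 1.989e33, 3.086e18     # solar mass in g, parsec in cm

rho_icm    = 1e-3 * 1.673e-24     # ~1e-3 protons/cm^3, a typical cluster value
v          = 1000 * 1e5           # 1000 km/s infall speed, in cm/s
sigma_star = 50 * MSUN / PC**2    # stellar surface density, Msun/pc^2 -> cgs
sigma_gas  = 5  * MSUN / PC**2    # gas surface density

p_ram     = rho_icm * v**2
restoring = 2 * np.pi * G * sigma_star * sigma_gas

print("stripped" if p_ram > restoring else "retained")
```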

But, as ever, the details are where it gets interesting. In the extreme case, what you'll see is a galaxy with a big long tail of gas, one single plume stretching off until it's torn apart and dissolved in the chaos of the cluster. 

Even here things can be complicated though. Some tails seem to have multiple components : extremely hot X-ray emitting gas, cooler neutral atomic hydrogen detectable with radio telescopes, intermediate-temperature ionised gas that emits in the very narrow "Hα" optical line, and very cold gas indeed that emits in the sub-mm regime. They may or may not have stars forming within the plume, and all of these different components can have radically different structures. Or they might all line up quite neatly. Sometimes all of these phases are present, sometimes just one or two.

And then, if a galaxy isn't in the extreme case, it can be even more complicated. If the ram pressure isn't enough to accelerate the gas to escape velocity, it can still be pushed out only to fall back in somewhere else in the disc. In short, it gets messy.

This paper attempts to understand one of those messy cases. It's part of the ALMA JELLY program, a large ALMA observing program run by my officemate Pavel Jachym (conflict of interest : declared ! BOX TICKED). Here they introduce the first analysis of one of their 28 target galaxies and tackle the important question (though they would never dare state it thus) : 

Why does it look like the Playboy bunny rabbit ?

Wait, wait... why is it called ALMA JELLY ? It's not an acronym as far as I know. Instead, "jellyfish" galaxies have become a popular name for galaxies experiencing ram pressure stripping, as some of them have distinct, narrow tails that look very much like the tentacles of a jellyfish. The term has become somewhat abused lately, often used for any ram-pressure stripping galaxy regardless of what its tail looks like. Here they attempt to take back control of the term and define it as galaxies which have stars forming in their stripped material. This often occurs in narrow tendrils so it's a pretty good proxy for jellyfish-like structures, and highlights the unusual physics at work in these cases.

And, why ALMA ? ALMA observes the cold molecular gas, which is generally agreed to be the immediate fuel for star formation. The target here already has many observations at other wavelengths, but the molecular gas has traditionally been tough to observe. Now they can fill in the gap, and with extreme resolution too.

So, the bunny rabbit. The first target for ALMA JELLY is NGC 4858. It's certainly a prime example of a jellyfish galaxy, with clear, bright tendrils of stars extending in one direction directly away from the centre of the Coma cluster in which it resides. It's also close to the cluster centre, where ram pressure ought to be very strong. It's got observations at a bunch of different wavelengths and it is, in short, a right proper mess. Really, it's the kind of thing I might be minded to throw up my hands and say, "hahahah no, I'm not touching that with a barge pole". Or, failing that, I might wave my hands furiously and say, "something something HYDRODYNAMICS !".

Hydrodynamic effects, the complicated interactions between two or more different fluids, are an easy get-out. Mixing of fluids causes extremely complex structures, so if something's a mess, it's a safe bet that hydrodynamics can explain it. Though, in that case you ought to run simulations to test if that really works or not.

Here they don't. Instead they try the much braver task of explaining it without any dedicated simulations, and even those simulations they do use don't have full hydrodynamic effects – just some very basic approximations of the major forces at work from the external gas. And yet they seem to have come up with a pretty convincing explanation.

It works like this. First, NGC 4858 is a grand design spiral, with two prominent spiral arms. As it rotates, each arm moves through a region where it's subjected to varying ram pressure forces, which are greatest on the side rotating away from the cluster centre (where the gas is moving fastest away from the cluster, making it easiest to remove). A single, dense arm thus gives rise to a single, dense plume of gas – a tail. But this tail gas preserves some of the rotation it had around the galaxy's centre, so it doesn't just get blasted out into space – it keeps moving around the galaxy. This brings it into the shadow of the galaxy, protecting it from the wind of the cluster. Some of the gas is lucky enough that the greatly reduced ram pressure is now essentially impotent, and it falls back onto the galaxy.

Not all of it though. Some keeps going. If any makes it right around to the other side of the galaxy, it moves back into the zone of death and gets finally stripped away by the cluster gas once and for all. The key is that before it reaches this point, the gas gets compressed as it starts to hit the wind again. In the simulations they use as a reference, the galaxy doesn't have prominent spiral arms and shows a single prominent tail; they surmise that because NGC 4858 has two arms, this could naturally give rise to two tails (or ears).
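
To get a feel for why the rotation matters, here's a deliberately crude toy of my own (emphatically not the paper's model) : gas on the side of the disc rotating into the wind feels a higher relative velocity, and hence a stronger ram pressure, than gas half an orbit later.

```python
# Toy illustration only : for gas on a circular orbit in a disc moving through
# the ICM, the wind it feels depends on where it is in its orbit. Where rotation
# adds to the galaxy's motion the effective ram pressure is highest; half an
# orbit later the same gas is partly shielded. All numbers are assumed.
import numpy as np

v_gal = 1500.0   # galaxy's speed through the cluster gas, km/s (assumed)
v_rot = 150.0    # rotation speed of the gas, km/s (assumed)
rho   = 1.0      # ICM density in arbitrary units

phi   = np.linspace(0, 2 * np.pi, 8, endpoint=False)   # orbital phase
v_rel = v_gal + v_rot * np.cos(phi)                     # crude 1D projection
p_ram = rho * v_rel**2

for p, pr in zip(np.degrees(phi), p_ram / p_ram.max()):
    print(f"phase {p:5.1f} deg : relative ram pressure {pr:.2f}")
```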

Their observations also show direct evidence of gas returning to the galaxy. The ALMA observations allow them to make a velocity map of the gas, and there's one big feature which is discontinuous with the rest of the velocity structure. And again, that fits with the basic model of how they expect rotating gas to behave.

I've simplified and shortened this one quite a lot, missing out on any number of interesting details. And there's an awful lot more they could still do with this data. But to me, the first thing I wondered when I first saw the ALMA image was "why is it a bunny rabbit ?". I was expecting this to have a much more complex non-answer, featuring hand-waving and invocations to hydrodynamics galore, possibly involving a chicken sacrifice. As it is, they managed to come up with a decent explanation without any of that, which is no mean feat. Both the bunnies and the chickens can rest easy.

Now all they have to do is convince Playboy to give them a sponsorship deal...

Wednesday, 2 July 2025

The Minuscule Candidate

Following on from those couple of papers on possible dark galaxies, comes... another paper on dark galaxies !

This one is a completely different sort of beast. While identifying optically dark galaxies is normally done by looking for their gas rather than their stars, here they use good old-fashioned optical telescopes instead. Even weirder, having found something which is optically faint but not dark, they then go on to infer its dark matter content without measuring its dynamics at all !

If this all sounds very strange, that's because it is. It's by no means crazy, but it must be said that some of the claims here should be taken with a very large pinch of salt.

Let's go right back to basics. A good working definition of a galaxy is a system of gas and/or stars bound together by dark matter. True, there are some notable exceptions like so-called tidal dwarf galaxies, but it's questionable whether we shouldn't drop the "galaxy" for those objects altogether (maybe replace it with "system" or something instead). Clearly they're physically very different from most galaxies, which are heavily mass-dominated by their dark matter.

A dark galaxy, then, is just a dark matter halo with maybe some gas but definitely no stars. Or is it ? For sure, if it really has literally zero stars, then such an object would definitely count as a dark galaxy. But what if it had just one star and billions of solar masses worth of dark matter ? Would it really be worth getting hung up on that point ? Presumably the physics involved in its formation would be basically the same as a truly dark object.

Generally speaking, most people would allow an object to qualify as a dark galaxy even if it had some small mass in stars. At present there's no strict definition, however, and so few candidate objects are known that setting a quantitative limit wouldn't really help. Right now, we don't know nearly enough about the physics of the formation of such objects, and indeed the jury's still out on whether any of them exist at all.

(Some people prefer the term "almost dark", which annoys me intensely. I prefer to call them dim when they have some detectable stars, but it hasn't caught on).

Anyway, you can see how this explains using an optical telescope to search for dark galaxies. But actually, here they go a step further. Rather than looking for the ordinary stellar emission from galaxies, which are normally in diffuse discs, they look only for the light emitted by the compact, relatively bright globular clusters. Most galaxies have these dense starballs which orbit around in their halos quite separately from their main stellar disc. What these authors are looking for are cases where they find groups of globular clusters without an accompanying disc : essentially, star clusters orbiting all by themselves in their dark matter halos. 

This is an interesting grey area in terms of calling something a dark galaxy, but I'd be inclined to say such objects would qualify. The physics at work in forming dense globular clusters and the diffuse stellar disc is quite different, so at the very least, these would certainly be extremely interesting.

Here they present the imaginatively named "Candidate Dark Galaxy 2". Really ? Yes, really. That's the name they're going with. Bravo, team.

(Actually, snarkcasm aside, this is a wee bit insulting, considering that there have been many candidate dark galaxies over the years, but I'll let that pass).

It turns out they had a previous candidate (you can guess the name) which is even more extreme than this one. CDG-1* consists of four globular clusters in close proximity to each other with no detectable diffuse emission between them at all. I won't attempt to discuss the complicated statistical methods they use to identify globular clusters without parent galaxies; at the words "trans-dimensional Markov chain" my eyes glazed over anyway. I can safely mention a few points though : 1) They don't have spectroscopic measurements of the globular clusters so they can't robustly estimate their distances*; 2) Their initial catalogues of globular cluster candidates are surely incomplete, but 3) Since they do careful inspection of the candidate cluster groups they do find, we can be confident that the associations they identify are real.

* I honestly can't remember if I heard about this at the time or not. I may have missed it or just forgotten about it.

* Spectroscopy gives you velocity, which is a very powerful constraint on (though not quite a direct measure of) distance.
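
To put rough numbers on that : divide the recession velocity by the Hubble constant and you get a distance, blurred by the galaxy's own "peculiar" motion of a few hundred km/s. The velocity below is hypothetical and the Hubble constant is rounded :

```python
# Why velocity constrains distance : recession velocity / H0 ~ distance, but
# peculiar motions of a few hundred km/s blur it (more so for nearby objects).
H0    = 70.0     # km/s/Mpc, rounded
v_obs = 5250.0   # hypothetical measured velocity, km/s
v_pec = 300.0    # typical peculiar velocity uncertainty, km/s

print(f"{v_obs / H0:.0f} +/- {v_pec / H0:.0f} Mpc")   # ~75 +/- 4 Mpc
```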

CDG-2 initially consisted of three globular clusters, but here, using new data from Hubble and Euclid, they identify a fourth. While they still don't have spectroscopy, the new data confirms that the candidates are all unresolved. That means they cannot possibly be nearby objects, and in fact their colours and other parameters are consistent with their being in the Perseus galaxy cluster* at 75 Mpc distance. So it seems very unlikely that they're either significantly closer or further away. And while there might be a few free-floating globular clusters in Perseus (ripped off their parent galaxies by tidal encounters and the like), it's not very likely that they'd happen to be so close together.

* This can sometimes get very confusing. A globular cluster is a cluster of stars that orbits around a parent galaxy; a giant galaxy might host, say, several dozen such objects. A galaxy cluster is a whole bunch of galaxies, each with their own population of globular clusters, all swarming around together.

The killer argument that this is highly likely to be an actual galaxy, though, is that here they detect diffuse stellar emission between the globular clusters. The thing just looks like a galaxy, albeit an extremely faint one. The chance of a tidal encounter creating something like this isn't worth considering.

Ahh, but is it a dark galaxy ? That's where things get a lot more speculative. While we can be pretty sure about the distance of the object and the physical association of its clusters, only spectroscopic measurements would really give a good handle on the total mass. Measuring how fast things are moving lets you infer how much mass you need to hold them together. Without this, they rely on scaling relations, extrapolating from the globular clusters to infer a massive amount of dark matter : probably there are a few million solar masses of stars present in total, but it could easily have a hundred billion solar masses of dark matter based on the scaling relations.
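
To give a flavour of the sort of extrapolation involved – and I stress this is not their actual method, just my own illustration using the oft-quoted, roughly constant ratio between a galaxy's total globular cluster mass and its halo mass, with every number assumed :

```python
# Illustrative only, not the authors' method. The extrapolation rests on the
# empirical, roughly constant ratio between a galaxy's total globular cluster
# mass and its halo mass (eta of a few times 1e-5). All numbers are assumed.
n_gc      = 4       # number of globular clusters
m_gc_mean = 2e5     # assumed mean GC mass, Msun
eta       = 3e-5    # assumed M_GC,total / M_halo ratio

m_gc_total = n_gc * m_gc_mean
m_halo = m_gc_total / eta
print(f"M_GC,total ~ {m_gc_total:.1e} Msun -> M_halo ~ {m_halo:.1e} Msun")
# A handful of clusters and ~1e6 Msun of stars imply tens of billions of Msun
# of halo : hence the "truly enormous extrapolation".
```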

These are, however, truly enormous extrapolations. Given that Ultra Diffuse Galaxies are now known which have significantly lower dark matter contents than typical galaxies, and yet still host globular clusters, I'd be wary about digging any deeper into this one until they get some spectroscopy.

Even so, it's clearly a very interesting object indeed. Arguably even more interesting, however, is CDG-1, which still has no diffuse emission detected at all. Even if the extreme dark matter content turns out to be a wrong estimate, if either of them has any at all, they're still super weird objects. Hopefully when they find CDG-3 I won't be caught quite so unawares.

Friday, 6 June 2025

They're Heee-re...

Or are they ?

Today, two papers on my favourite science topic of all : dark galaxies. In the past there have been a multitude of candidate detections, but spread out very thinly. You get, I'd guesstimate, of order one or two such claims per year on average, with the total number now being somewhere in the low to mid tens. And not a single one is entirely convincing. Every single object is essentially unique, with its own particular considerations that make it more and/or less likely to be a dark galaxy.

Both of these papers claim to have alleviated the problem by finding a whole bunch more candidates. The first uses new data from the ASKAP telescope and comes up with 55 potential objects, while the second uses archival Arecibo data and finds 142. Impressive stuff – but are any of them plausible, or have the previous problems just reappeared in a larger sample ?

There are many difficulties with identifying a dark galaxy candidate. The resolution of radio telescopes that can detect their gas content is often much lower than optical instruments, which means you see a big blurry smudge on the sky. That makes pinpointing the exact position of the gas difficult, so it's hard to say whether it has an optical counterpart or not. It also makes estimating its total mass tricky : for this you need a precise measure of its size, so without one you can't say how much dark matter it really has. And even if you do have good resolution, you need good optical data as well to say if it's really dark or just very dim (though when you get to sufficiently dim objects the difference is arguably not that important).

An even bigger problem happens when you manage to overcome all this. Even if you have an isolated gas blob with the signatures of stable rotation that would need lots of dark matter to hold it together, and even if you're darn sure it's so optically faint that it might as well be dark... it's damn hard to say if the thing really is stable. You could just be seeing a bit of fluff left over from some interaction or other, which can sometimes mimic the appearance of a dark galaxy. Nevertheless, there have been a few cases where "dark galaxy" at least looks like a very plausible explanation, if never any where we can be certain that's really what's been found.

Both of the papers attempt to do much the same thing though in slightly different ways. Starting with large HI samples (30,000 for ALFALFA and 2,000 for WALLABY), they combine these with optical data sets and trim them down in various ways : quality of the HI signal, confidence in the lack of optical counterpart, isolation, etc. ALFALFA (the Arecibo data) has an enormous area of coverage and huge sample size on its side, while WALLABY (from the ASKAP telescope) has higher sensitivity and resolution.

Since even the final candidate catalogues are, by the standards of dark galaxy research, really quite large, I'd be reluctant to say, "yep, this is definitely the solution, hurrah chaps, we've found them !". But nor would I at all dismiss them out of hand. Rather I would look at both of these papers as being potentially the foundation of interesting research, but it's too soon for any definitive results yet. These are both very solid starts, but we need to examine each and every object here in more detail, or at least a subsample. We need higher resolution data in all cases, deeper optical data... and most importantly, detailed studies of the local environment. We need to find the quintessential case of an isolated object with no plausible other origins, preferably rotating nice and quickly (which would mean fast dissipation if it wasn't bound by dark matter).

All that requires very careful, detailed work. Which of course we can now do, so kudos to them for that. But scientifically I'm neither excited nor dismayed. I am... intrigued.

The first paper finds its dark galaxies pretty much everywhere throughout its fields. There's not really any distance bias, so they occur at all masses – a few reaching really quite respectable values even when compared with optically bright galaxies. Line widths look to be typically around 100 km/s, which is where we'd naively expect rotation – and hence a dark matter component – to be needed for stability. Sadly the resolution isn't good enough for them to attempt dynamical mass estimates, though this seems to me a bit strange – they have the upper size limit from the HI, so they could at least put a broad constraint on it.
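
By a broad constraint I mean the same back-of-the-envelope sum as before, just run from the other direction : the beam at the source's distance sets a maximum radius, which together with the line width caps the dynamical mass. The beam, distance and width below are invented for illustration :

```python
# A broad constraint from unresolved HI : the beam sets a maximum radius at the
# source's distance, and with the line width that caps the dynamical mass.
# Beam size, distance and line width below are made up for illustration.
import numpy as np

G = 4.301e-3                      # pc (km/s)^2 / Msun
beam_arcsec, dist_mpc = 30.0, 50.0
w50 = 100.0                       # velocity width, km/s

r_max_pc  = 0.5 * np.radians(beam_arcsec / 3600) * dist_mpc * 1e6
m_dyn_max = r_max_pc * (w50 / 2)**2 / G
print(f"R_max ~ {r_max_pc / 1e3:.1f} kpc, M_dyn < {m_dyn_max:.1e} Msun")
```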

The other oddity is that they model the optical light profile of all their sources, where detected. This is ideal for quantifying whether any are Ultra Diffuse Galaxies (which are possibly closely related to truly dark galaxies) but they don't seem to do this. Maybe that's for a future paper.

The second paper attempts a lot more science. I have to say it's both strange and refreshing to see a member of the ALFALFA team being at least a little more enthusiastic about dark galaxy candidates; normally they insist on calling them 'almost darks' – including the quotes – which gets very annoying. None of that here ! I should stress, though, that both papers absolutely treat everything with the caution it deserves, so don't mistake the brevity of my summary as evidence that they leap to conclusions. Neither group does that – I'm omitting the caveats just to get to the point.

Which for this second paper is as follows. As per the first, their candidates are everywhere, spanning a wide range of masses and line widths, but generally found in less dense environments than bright galaxies. They have higher gas fractions (relative to their inferred dark matter masses*) than optically bright galaxies of similar masses. And these properties are qualitatively similar to what's found in numerical simulations of galaxy formation that produce dark galaxies.

* Being a bit more gung-ho than the first group, they assume a size of the galaxy based on the scaling relation with respect to HI mass, hence they get a dark mass estimate.
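
Sketched out, that approach looks roughly like this, using the widely quoted HI mass-size relation (approximately log D_HI = 0.51 log M_HI - 3.3, with D_HI in kpc). Whether those are the exact coefficients they adopt is my assumption, and the example numbers are made up :

```python
# Sketch of the footnote's approach : assume a size from the HI mass-size
# relation (roughly log D_HI = 0.51 log M_HI - 3.3, D_HI in kpc), then combine
# with the line width for a dynamical mass. Coefficients and example numbers
# are assumptions, not taken from the paper.
import numpy as np

G    = 4.301e-3   # pc (km/s)^2 / Msun
m_hi = 1e8        # HI mass, Msun (illustrative)
w50  = 100.0      # line width, km/s (illustrative)

d_hi_kpc = 10 ** (0.51 * np.log10(m_hi) - 3.3)
r_pc     = 0.5 * d_hi_kpc * 1e3
m_dyn    = r_pc * (w50 / 2)**2 / G
print(f"D_HI ~ {d_hi_kpc:.1f} kpc, M_dyn ~ {m_dyn:.1e} Msun, "
      f"gas fraction ~ {m_hi / m_dyn:.3f}")
```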

All this is very matter-of-fact, commendably so. It's a huge sign of how much things have changed in the last couple of decades : when I went to my first conference, back in 2007, dark galaxies were viewed by many as... not exactly fringe, but not really mainstream either. Most people agreed that they could at least exist, but were skeptical of their whole raison d'être – that they would be numerous enough to explain why cosmological models were massively overpredicting how many galaxies we would see. Indeed, for the next few years it often felt as if hardly anyone really believed in the standard models of galaxy formation, even if nobody had any better ideas to replace them. Quite frankly, if anyone had suggested they'd found a hundred or more dark galaxy candidates, no matter how cautiously, they'd have been laughed at. It wouldn't have been a career-ending move but it wouldn't have won them any friends either.

All that seems to have largely faded. The original models of galaxy formation, where gas falls into dark matter halos and a bunch of complicated stuff happens, now seem very much more popular, and so dark galaxies no longer seem like an almost dirty subject. What's happened is that we've got a lot better at doing all that complicated stuff and many of the problems which looked horrendous now look, if hardly definitely solved, then at least an awful lot more solvable. 

So, good work people. It's going to be extremely interesting to see how this pans out over the next few years. Watch this space.
