Sister blog of Physicists of the Caribbean. Shorter, more focused posts specialising in astronomy and data visualisation.

Saturday, 11 January 2025

Turns out it really was a death ray after all

Well, maybe.

Today, not a paper but an engineering report. Eh ? This is obviously not my speciality at all, in any way shape or form. In fact reading this only revealed to me even further the tremendous depths of my own ignorance regarding materials science and engineering practices. The former is something I never cared for at undergraduate level and the latter is something about which I know literally nothing. Naturally, I wouldn't normally even glance at a report like this, except that it's about a topic that's personally important to me : why Arecibo collapsed.

There's an okay-but-short press release version here. It's interesting to see the extent of the deconstruction at the site, which was already well advanced in 2021; I couldn't find a more recent photo. Otherwise the Gizmodo version is the 30-second read and not much else. For this post I read most of the full 113 page report, which really is "jaw dropping", at least in parts, as Gizmodo described it. Unsurprisingly there are fairly hefty tracts where my eyes glazed over, but there's still plenty in here that's accessible and understandable to non-engineers like me.

In a nutshell, Arecibo collapsed due to a combination of factors, two of which are predictable enough but the third is something nobody expected. The first two are inadequate maintenance and the impact of Hurricane Maria. But it's important not to oversimplify, as these are intimately bound with the third : the effects of the radar transmitter. This is not quite a case where one can simply say, "if they'd just done their jobs properly then it'd still be standing today", though the report does contain some damning stuff.

Going through this linearly would end up being a shorter version of the report, which wouldn't really help anyone. If you want that level of detail you should go through it yourself; it's thorough to the point of going back to hand-written notes from the earliest days of the telescope. I have to say, though, that it's also highly repetitive in parts and in my view somewhat self-contradictory in places – but as it says, this is a preprint and still subject to editorial revision. Anyway, rather than doing a blow-by-blow breakdown, let me extract some broader lessons here.


Safety is not the same as redundancy

Probably the most general lesson is, I presume, obvious to anyone with an engineering background. But to an outsider like me the distinction between safety and redundancy was interesting just because it makes a lot of intuitive sense but I'd never heard of it before. Safety, apparently, refers to the breaking point of any particular element. For example a cable with a safety factor of two could support twice as much as its current load before it would snap. Redundancy, on the other hand, is about how many elements could fail before the whole structure would come crashing down. Arecibo's three towers, they say, don't provide redundancy because a single failure would inevitably mean a total collapse (compare with the six of FAST).
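To make the distinction concrete, here's a minimal toy sketch in Python. None of this comes from the report : the function names, the numbers, and especially the assumption that the load is shared equally among whatever elements survive are all hypothetical simplifications of mine.

```python
def safety_factor(breaking_load, working_load):
    """Ratio of the load an element could bear to the load it actually carries."""
    return breaking_load / working_load


def failures_tolerated(n_elements, total_load, breaking_load):
    """Count how many identical elements can fail before the survivors snap.

    Toy assumption: the total load is shared equally among the surviving
    elements. Real structures redistribute load according to their geometry.
    """
    tolerated = 0
    for n_failed in range(1, n_elements):
        load_per_survivor = total_load / (n_elements - n_failed)
        if load_per_survivor >= breaking_load:
            break  # the survivors are now at or beyond their breaking point
        tolerated = n_failed
    return tolerated


# Four hypothetical cables sharing a load of 400 units, each breaking at 210.
print(safety_factor(210, 100))            # safety factor of each cable
print(failures_tolerated(4, 400, 210))    # cables that can fail safely

# One hypothetical element with a huge safety factor but no backup at all.
print(safety_factor(1000, 100))
print(failures_tolerated(1, 100, 1000))
```

In this toy model the four modest cables can lose two of their number, while the single very strong element has a safety factor of ten and no redundancy whatsoever. The equal-sharing assumption is also exactly what fails for something like Arecibo's three towers, where the geometry means that losing any one brings down the lot.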

Of course it's very unlikely that even a single tower would ever fail because their safety factor was massive, so redundancy there was unnecessary (at least regarding any failure of the concrete towers themselves). The same can't be said for the metal cables, where the safety factor generally seems to have been about a factor two or a bit less, in accordance with standard design practices – still plenty, but with a need for redundancy just in case. The report stops short of saying that there were any actual design flaws in the telescope, but does note that it obviously would have been better if there had been more towers. Safety factors, they say, were not the issue, although I think I detect some inconsistency here. Where they do issue an outright criticism, for example, is that while the original cable system had redundancy, this was no longer true with the 1997 upgrade that added the 900-tonne Gregorian dome and altered the cable system. Which is a little bit in contradiction to their claims that the telescope didn't suffer from design faults. It's a bit muddled.


Poor maintenance contributed to but did not cause the collapse

At least this is the overall gist I got. There's plenty of criticism levelled here, but it's hard to disentangle how serious the maintenance problems really were. As I read it, a more diligent maintenance program probably could have prevented the collapse, but this is partly with the benefit of hindsight – the failures which occurred were unprecedented (see below) but should have been spotted all the same. Of particular concern is that there wasn't enough knowledge transfer during the telescope's two changes of management (I'll speak from first-hand experience in declaring that management changes should be avoided for a host of other reasons; I went through one such change at Arecibo and God knows what the staff must have felt like when a second took place only a few years later). In addition, and probably worst of all, the post-Hurricane Maria repair efforts were both much too slow (taking months to even get started, and they would have lasted for years) and targeted at a cable which never failed. Major repairs needed to happen far sooner but there was also a need to identify the failures more accurately.

The failures were of the cable sockets rather than the cables themselves. In these "spelter sockets", there's normally some degree of cable pullout after construction is complete and the structure assumes its full load : these sockets are widely used so this is known to be absolutely normal and no cause for concern. But the report is somewhat ambiguous as to whether the extra pullout which happened could have been noticed. Sometimes it sounds quite damning in describing the extra movement as "clear" but elsewhere describes it as "not accessible by visual inspection". The amount of movement we're talking about, until the point of the collapse itself, was small, of the order 1cm or so. It certainly isn't something you'd spot from a casual glance, but you could measure it by hand easily enough with a ruler. Not noticing this, if I understand things correctly, meant that the cables were estimated to still have their original high safety factors whereas in fact they were much lower. They say this "should have raised the highest alarm level, requiring urgent action". Perhaps most damning, they also say that it is "highly unlikely" that this excessive pullout went unnoticed. They also note that there was a lack of good documentation of maintenance records and procedures.

The contribution of the recent hurricanes, especially Maria, was extremely significant in precipitating the collapse. In fact, "absent Maria, the Committee believes the telescope would still be standing today". Pullout from the sockets shortly after installation is entirely normal as the structure takes the weight, but after that, any further movement isn't normal at all. This did in fact happen, and should have been spotted – but even this, as we'll see, apparently wasn't enough to bring down the telescope by itself.

One final point is that Arecibo wasn't well liked by backend management. I often had the impression of a behind-the-scenes mood of delenda est Arecibo, or at the very least, that that was what some staff members sincerely believed was happening even if it wasn't true. The report notes that a 2006 NSF report recommended closing Arecibo by 2011 if other funding sources couldn't be found, which I found truly bizarre. This was less than ten years after a major upgrade and exactly at the point the biggest surveys were just beginning. As to why anyone would think that closing it at that particular moment was a good idea, I'm truly at a loss. Nothing about it, even with some familiarity of the larger-than-life politics behind the place, has ever made a lick of sense to me.

This is not, I hasten to add, any suggestion of deliberately shoddy maintenance; inasmuch as that was inadequate, there is no need to attribute that to anything besides an incompetently low budget. One strikingly simple recommendation in the report is that funding sources for site operations (e.g. science and development) and maintenance be entirely separate, so there is no chance of any conflict of interest or competition for resources which are essential to both.


The failure was unprecedented

The final and most interesting point of the report, the big headline message, is that Arecibo may have failed because of its radar transmitter. The report is emphatic, and repeats almost ad nauseam, that the kind of socket failures seen here have never before occurred in a century of operations of identical sockets used in bridges and other structures around the world. The damage from the hurricanes was significant, but not enough by itself to explain the failure. There is a crucial missing factor here.

The explanation suggested in the report is electroplasticity. In laboratory conditions, material creep (stretching) can be induced by electrical currents, apparently directly because of the energy released by the flow of electrons. As they note, in the lab this has been found under much higher currents operating for much shorter times, but could presumably work at lower currents if sustained for much longer periods. If correct, this would be Arecibo's final first, another effect of its unique nature. Such currents, they hypothesise, would have been induced by the powerful 1MW radar transmitter used for zapping asteroids and other Solar System objects. This would explain why the cables failed while still having apparently high safety factors, and possibly account for why the failures occurred in some of the youngest cables with no evidence of manufacturing defects (and weren't even the ones with the highest load). It would also, of course, explain why no such other socket failures have ever been seen. Hardly anything else has this combination of radar transmitter and spelter sockets, let alone in tropical conditions in an earthquake zone.

The report goes quite deep into the technical details of electroplasticity. Interestingly, it notes that even less powerful sources can induce currents in human skin that can be directly sensed a few hundred feet from the transmitter. The problem is that understanding the effects of these currently requires highly detailed simulations accounting for the complicated structure of Arecibo's cables, the exact path the current would follow, and using data on low, long-term currents that at present doesn't exist. The most obvious deficiency seems to me that they don't estimate just how long the radar was ever transmitting for. Sure, it was up there for decades, but it wasn't used routinely : regularly, to be sure, but not daily. This is something where a crude estimate should be relatively easy by searching the observing records; even the schedule of what was planned (which didn't always match what was actually done, usually because the err, radar broke) would give a rough indication.




If the report is correct, then there's little need for concern about other structures. The report strongly disagrees that Arecibo points to a need to revise safety standards for spelter sockets more generally; unless your bridge is in the path of a 1MW S-band radar transmitter, you can carry on with your morning commute as usual. Well, that's good. Clearly, regardless of electroplasticity, something happened here that was truly exceptional, and not worth worrying about whether it will ever be repeated. Not unless you're an engineer, at any rate.

Whether electroplasticity really was the cause I'm not qualified to judge. Talking to someone older and wiser, the opinion was "they had to come up with something". I don't disagree with that – there just isn't enough data here to say anything for certain. It could be electroplasticity, or it could be something the committee just didn't think of. More analysis of the surviving hardware, along with more studies and simulations, is badly needed.

The broader lesson I would take from all this is that you can run things on a shoestring for a while, but you can't keep trying to do more with less indefinitely. Yes, I'm coloured by my political biases, but austerity to survive a short-term hit is very different to austerity as a way of life : one is manageable, the other isn't. Such a policy does far more harm than good. Yes, you save a little money immediately, but you ultimately lose an awful lot more a little further down the line. So if you're going to fund things, fund things as properly as you can. Incorporate redundancy thinking into managerial practices as well as engineering standards. Have teams large enough to survive the loss of several members. Hire separate observing support staff rather than expecting scientists to do everything.

Finally, don't expect people to work for meagre compensation (and I'm here not thinking just financially but in other benefits, for example high pay is useless with long hours and/or low holiday time) just because they enjoy their job. Not even the most wildly enthusiastic, energy-driven fanatic can operate at 110% for long. Just because someone is uncomfortable doesn't mean they're working extra-hard. Part of America's puritan hangover appears to be in thinking that work = bad => people who are suffering are good workers. In the end this just leads to everyone hating their job and wanting to overthrow the system but having no clue what to replace it with. Far better to reverse the thinking and presume that those who are happy and comfortable are the best workers.

This has taken a rather political turn, but it's not unmotivated by my experiences at Arecibo. One notorious manager was definitely of the ilk who believe that more work good, less work bad. Thankfully this is not a mentality I've encountered much in Europe. And, in my view, understanding this isn't just good for us as people, but actually as scientists in getting the work done we want to do. By all means, take a liberal approach : let those who want to work obsessively, who actively thrive because of it, do so, but don't presume the same conditions produce the same results from different people. They don't. As with good software interface design, in the end, solving these issues is just as important for the science we want to do as the scientific problems themselves. Soft issues produce hard results.

Monday, 5 August 2024

Giants in the deep

Here's a fun little paper about hunting the gassiest galaxies in the Universe.

I have to admit that FAST is delivering some very impressive results. Not only is it finding thousands upon thousands of galaxies – not so long ago, the 30,000 HI detections of ALFALFA were leagues ahead of everything else, but this has already been surpassed – but in terms of data quality too it looks like it's delivering. This paper exploits that to the extreme with the FAST Ultra Deep Survey, FUDS.

Statistically, big, all-sky surveys are undeniably the most useful. With a data set like that, anyone can search the catalogue and see if anything was detected at any point of interest, and at the very least they can get an upper limit of the gas content of whatever they're interested in. Homogeneity has value too. But of course, with any new telescope you can always go deeper, as long as you're prepared to put in the observing time. That can let you find ever-fainter nearby sources, or potentially sources at greater distances. Or indeed both.

It's the distance option being explored in this first FUDS paper. Like previous ultra-deep surveys from other telescopes, FUDS takes a pencil-beam approach : incredibly sensitive but only over very small areas. Specifically it's about 12 times more sensitive than AGES but in an area almost 50 times smaller (or, if you prefer, 44 times more sensitive than ALFALFA but in an area 1,620 times smaller). This paper looks at their first of six 0.72 square degree fields, concentrating on the HI detections at redshifts at around 0.4, or a lookback time of about 4 Gyr. Presumably they have redshift coverage right down to z=0, but they don't say anything about that here.
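For anyone wanting to check the conversion from redshift to lookback time, here's a rough sketch in plain Python. The cosmology is my assumption and not something taken from the paper : a flat universe with H0 = 70 km/s/Mpc and a matter density of 0.3, integrated numerically with Simpson's rule. The function name and default parameters are likewise just illustrative.

```python
import math

KM_PER_MPC = 3.0857e19        # kilometres in one megaparsec
SECONDS_PER_GYR = 3.1557e16   # seconds in one billion years


def lookback_time_gyr(z, h0=70.0, omega_m=0.3, steps=1000):
    """Lookback time to redshift z in Gyr for an assumed flat LCDM cosmology.

    h0 is in km/s/Mpc; steps must be even for Simpson's rule.
    """
    omega_lambda = 1.0 - omega_m                      # flatness assumption
    hubble_time_gyr = KM_PER_MPC / h0 / SECONDS_PER_GYR

    def integrand(zp):
        # 1 / [(1+z) E(z)], with E(z) the dimensionless Hubble parameter
        e_z = math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_lambda)
        return 1.0 / ((1.0 + zp) * e_z)

    # Composite Simpson's rule between 0 and z
    h = z / steps
    total = integrand(0.0) + integrand(z)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * integrand(i * h)
    return hubble_time_gyr * total * h / 3.0


print(round(lookback_time_gyr(0.4), 2))
```

With these assumed parameters the answer comes out a little over 4 Gyr, in line with the figure quoted above. Different but reasonable choices of H0 and matter density shift it by a few tenths of a Gyr at most.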

They certainly knock off a few superlatives though. As well as being arguably the most distant direct detection of HI (excluding lensing) they also have, by a whisker, the most massive HI detection ever recorded – just shy of a hundred billion solar masses. For comparison, anything above ten billion is considered a real whopper.

All this comes at a cost. It took 95 hours of observations in this one tiny field and they only have six detections at this redshift. On the other hand, there's really just no other way to get this data at all (with the VLA it would take a few hundred hours per galaxy). Theoretically one could model how much HI would be expected in galaxies based on their optical properties and do much shorter, targeted observations which would be much more efficient. But this redshift is already high enough that optically the galaxies look pretty pathetic, not because they're especially dim but simply because they're so darn far away. So there just isn't all that much optical data to go on.

As you might expect, these six detections tend to be of extraordinarily gas-rich galaxies, with correspondingly high star formation rates. While they're consistent with scaling relations from local galaxies, their number density is higher than the local distribution of gas-rich galaxies would predict. That's probably their most interesting finding, that we might be seeing the effects of gas evolution (albeit at a broad statistical level) over time. And it makes sense. We expect more distant galaxies to be more gas-rich, but exactly how much has hitherto been rather mysterious : other observations suggest that galaxies have been continuously accreting gas to replenish at least some of what they've consumed. For the first time we have some actual honest-to-goodness data* about how this works.

* Excluding previous results from stacking. These have found galaxies at even higher redshifts, but since they only give you the result in aggregate and not for individual galaxies, they're of limited use.

That said, it's probably worth being a bit cautious as to how well they can identify the optical counterparts of the HI detections. At this distance their beam size is huge, a ginormous 1.3 Mpc across ! That's about the same size as the Local Group and not much smaller than the Virgo Cluster. And they do say that in some cases there may well be multiple galaxies contributing to the detection. 

A particular problem here is the phenomenon of surface brightness dimming. The surface brightness of a galaxy dims as (1+z)^4. For low redshift surveys like AGES, z is at most 0.06, so galaxies appear only about 25% dimmer than they really are. But at z=0.4 this reaches a much more worrying factor of four. And the most HI-rich galaxy known (apart from those in this sample), Malin 1, is itself a notoriously low surface brightness object, so very possibly there's more galaxies contributing to the detections than they've identified here. It would be interesting to know if Malin 1 would be optically detectable at this distance...
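The arithmetic behind those two numbers is simple enough to check directly. This little Python sketch just evaluates the fourth power of (1+z) and converts it to magnitudes per unit area; the function names are mine, and it deliberately ignores everything else that affects detectability (bandpass shifting, k-corrections and so on).

```python
import math


def dimming_factor(z):
    """Cosmological surface brightness dimming: a factor of (1+z) to the fourth power."""
    return (1.0 + z) ** 4


def dimming_magnitudes(z):
    """The same dimming expressed in magnitudes per square arcsecond."""
    return 2.5 * math.log10(dimming_factor(z))


for z in (0.06, 0.4):
    print(z, round(dimming_factor(z), 2), round(dimming_magnitudes(z), 2))
```

At z=0.06 the factor is about 1.26, and at z=0.4 it's about 3.8, which is where the "25%" and "factor of four" above come from.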

On the other hand, one of their sources has the classic double-horn profile typical of ordinary individual galaxies. This is possible but not likely to arise by chance alignment of multiple objects : it would require quite a precise coincidence both in space and velocity. So at least some of their detections are very probably really of individual galaxies, though I think it's going to take a bit more work to figure out exactly which ones.

It's all quite preliminary so far, then. Even so, it's impressive stuff, and promises more to come in the hopefully near future.

Thursday, 1 August 2024

Going through a phase

Still dealing with the fallout from EAS 2024, this paper is one I looked up because someone referenced it in a talk. It caught my attention because the speaker mentioned how some galaxies have molecular gas but not atomic, and vice-versa. But we'll get to that.

I've no idea who the author is but the paper strongly reminds me of my first paper. There I was describing an HI survey of part of the Virgo cluster. This being my first work, I described everything in careful, meticulous detail, being sure to consider all exceptions to the general trends and include absolutely everything of any possible interest to anyone, insofar as I could manage it. Today's paper is an HI survey of part of the Fornax cluster and it is similarly careful and painstaking. If this paper isn't a student's first or early work, if it's actually a senior professor... I'm going to be rather embarrassed.

Anyway, Fornax is another nearby galaxy cluster, a smidgen further than Virgo at 20 compared to 17 Mpc. It's nowhere near as massive, probably a factor ten or so difference, but considerably more compact and dense. It also has less substructure (though not none) : the parlance being "more dynamically evolved" meaning that it's had more time to settle itself out, though it's not quite finished assembling itself yet. Its velocity dispersion of ~400 km/s is quite a bit smaller than the >700 km/s of Virgo, but like Virgo, it too has hot X-ray gas in its intracluster space.

This makes it a natural target for comparison. It should be similar enough that the same basic processes are at work in both clusters : both should have galaxies experiencing significant tidal forces from each other, and galaxies in both clusters should be losing gas through ram pressure stripping. But the strengths of these effects should be quite different, so we should be able to see what difference this makes for galaxy evolution.

The short answer is : not all that much. Gas loss is gas loss, and just as I found in Virgo, the correlations are (by and large) the same regardless of how the gas is removed. I compared the colours of galaxies as a function of their gas fraction; here they use the more accurate parameter of true star formation rate, but the finding is the same. 

The major overall difference appears to be a survival bias. In Virgo there are lots of HI-detected and non-detected galaxies all intermingled. While there is a significant difference in their preferred locations, in Fornax this is much stronger : there are hardly any HI-detected galaxies inside the cluster proper at all. Most of the detections appear to be on the outskirts or even beyond the approximate radius of the cluster entirely. Exact numbers are a bit hard to come by though : the authors give a very thorough review of the state of affairs but don't summarise the final numbers. Which doesn't really matter, because the difference is clear.

What detections they do have, though, are quite similar to those in Virgo. They cover a range of deficiencies, meaning they've lost different amounts of gas. And that correlates with a similar change in star formation rate as seen in Virgo and elsewhere. They also tend to have HI extensions and asymmetrical spectra, showing signs that they're actively in the process of losing gas. Just like streams in Virgo, the total masses in the tails aren't very large, so they still follow the general scaling relations. 

So far, so standard. All well and good, nothing wrong with standard at all. They also quantify that the galaxies with the lowest masses tend to be the most deficient, which is not something I saw in Virgo, and is a bit counter-intuitive : if a galaxy is small, it should more easily lose so much gas as to become completely undetectable, so high deficiencies can only be detected in the most massive galaxies. But in Fornax, where the HI-detections may be more recent arrivals and ram pressure is weaker, this makes sense. They also quantify that the detections are likely infalling into the cluster for the first time in a now-standard phase diagram* which demonstrates this extremely neatly.

* Why they call them this I don't know. They plot velocity relative to the systemic velocity of the cluster as a function of clustercentric distance, and have nothing at all to do with "phases" in the chemical sense of the word.
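The two coordinates of such a diagram are easy enough to compute. Here's a minimal Python sketch that turns a galaxy's sky position and velocity into a projected clustercentric distance and a velocity relative to the cluster. The function names and the example numbers are hypothetical ones of mine, not values from the paper; the only quantity taken from the text above is the 20 Mpc distance to Fornax.

```python
import math


def angular_separation_deg(ra1_deg, dec1_deg, ra2_deg, dec2_deg):
    """Angular separation between two sky positions (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1_deg, dec1_deg, ra2_deg, dec2_deg))
    a = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    # Guard against rounding pushing the argument fractionally above 1
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def phase_space_point(ra_deg, dec_deg, velocity_kms,
                      centre_ra_deg, centre_dec_deg, systemic_kms, distance_mpc):
    """Return (projected distance in Mpc, velocity relative to the cluster in km/s).

    Uses the small-angle approximation, so it is only sensible for galaxies
    within a few degrees of the cluster centre.
    """
    separation_rad = math.radians(
        angular_separation_deg(ra_deg, dec_deg, centre_ra_deg, centre_dec_deg))
    projected_mpc = distance_mpc * separation_rad
    relative_kms = velocity_kms - systemic_kms
    return projected_mpc, relative_kms


# A made-up galaxy one degree from a made-up cluster centre, 20 Mpc away
print(phase_space_point(0.0, 1.0, 1500.0, 0.0, 0.0, 1400.0, 20.0))
```

Plotting the second value against the first for every galaxy gives the diagram. Recent infallers tend to sit at large distances and/or large relative velocities, while long-term cluster members cluster at small values of both.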

The one thing I'd have liked them to try at this point would be stacking the undetected galaxies to increase the sensitivity. In Virgo this emphatically didn't work : it seems that there, galaxies which have lost so much gas as to be below the detection threshold have really lost all their gas entirely. But since Fornax dwarfs are still detectable even at higher deficiencies, then the situation might be different here. Maybe some of them are indeed just below the threshold for detectability, in which case stacking might well find that some still have gas.

Time to move on to the feature presentation : galaxies with different gas phases. Atomic neutral hydrogen, HI, is thought to be the main reservoir of fuel for star formation in a galaxy. The fuel tank analogy is a good one : the petrol in the tank isn't powering the engine itself. For that, you need to allow the gas to cool to form molecular gas, and it's this which is probably the main component for actual star formation.

There are plenty of subtleties to this. First, there's some evidence that HI is also involved directly in star formation : scaling relations which include both components have a smaller scatter and better correlation than ones which only use each phase separately. Second, galaxies also have a hot, low density component extending out to much greater distances. If the molecular gas is the fuel actually in the engine and the HI is what's in the tank, then this corona is what's in the petrol station, or, possibly, the oil still in the ground. And thirdly, cooling rates can be strongly non-linear : left to itself, HI gas will pretty much mind its own business and take absolutely yonks to cool into a molecular state.

Nevertheless this basic model works well enough. And what they find here is that while most galaxies have nice correlations between the two phases – more atomic gas, more molecular gas – some don't. Some have lots of molecular gas but no detectable HI. Some have lots of HI but no detectable molecular gas. What's going on ? Why are there neat relations most of the time but not always ?

Naively, I would think that CO without HI is the harder to explain. The prevailing wisdom is that gas starts off very hot indeed and slowly cools into warm HI (10,000 K or so) before eventually cooling to H2 (perhaps 1,000 K but this can vary considerably, and it can also be much colder). Missing this warm phase would be weird.

And if we were dealing with a pure gas system then the situation would indeed be quite bewildering. But these are galaxies, and galaxies in clusters no less. What's probably going on is something quite mundane : these systems are, suggest the authors, ones which have been in the cluster for a bit longer. There's been time for the ram pressure to strip the HI, which tends to be more extended and less tightly bound, leaving behind the H2 – which hasn't yet had the time to fully transform into stars. So all the usual gas physics is still in play, it's just there's been this extra complication of the environment to deal with.

What of the opposite case – galaxies with HI but no H2 ? How can you consume the H2 without similarly affecting the HI ? There things might be more interesting. They suggest several options, none of which are mutually exclusive. It could be that the HI is only recently acquired, perhaps from a tidal encounter or a merger. The former sounds more promising to me : gas will be preferentially ripped off from the outskirts of a galaxy where it's less tightly bound, and here there's little or no molecular gas. Such atomic gas captured by another galaxy may simply not have had time to cool into the molecular phase, whereas in a merger I would expect there to be some molecular gas throughout the process. 

Tidal encounters could have a couple of other roles, one direct, one indirect. The direct influence is that they might be so disruptive that they keep the gas density low, meaning its cooling rate and hence molecular content remains low (the physics of this would be complicated to explore quantitatively but it works well as a hand-waving explanation). The indirect effect is that gas at a galaxy's edge should be of lower metallicity : that is, purer and less polluted by the products of star formation. The thing is, we don't detect H2 directly but use CO as a tracer molecule. Which means that if the gas has arrived from the outskirts of a galaxy, it may be CO-dark. There could be some molecular gas present, it's just that we can't see it. Of course, to understand which if any of these mechanisms are responsible is a classic (and well justified) case of "more research is needed".

Tuesday, 23 July 2024

EAS 2024 : The Other Highlights

What's this ? A second post on the highlights of the EAS conference ? Yes ! This year I've been unusually diligent in actually watching the online talks I didn't get to see in person. Thankfully these are available for three months after the conference, long enough to actually manage to watch them but also, crucially, short enough to provide an incentive to bother. And I remembered a couple of interesting things from the plenaries that I didn't mention last time but which may be of interest to a wider audience.


Aliens ? There are hardly any talks which dare mention the A-word at astronomical conferences, but one of the plenaries on interstellar asteroids dared to go there. The famous interstellar visitor with the unpronounceable name of ʻOumuamua (which is nearly as bad as that Icelandic volcano that shut down European airspace a few years ago) got a lot of attention because Avi Loeb insists it must be an alien probe. He's wrong, and his claims to have found bits of it under the ocean have been utterly discredited. Still, our first-recorded visitor on a hyperbolic trajectory did do some interesting things. After accounting for the known gravitational forces, its rotation varies in a way that's inconsistent with gravity at the 10-sigma level. The speaker said that the only other asteroids and comets known to do this have experienced obvious collisions or have obvious signs of outgassing, neither of which happened here. He took the "alien" idea quite seriously.

Ho hum. No comment.


Time-travelling explosions. The prize lecture was by best-PhD student Lorenzo Gavassino, who figured out that our equations for hydrodynamics break down at relativistic velocities. Normally I would find this stuff incomprehensible but he really was a very good speaker indeed. And the main result is that they break down in a spectacular way. You might be familiar with simultaneity breaking, where events look different to observers at different speeds. Well, says Lorenzo, this happens to fluids moving at relativistic speeds in dramatic fashion : one observer should see a small amount of heat propagating at faster than the speed of light while another would see some energy travelling backwards through time. The result should be massive (actually, infinite) instabilities and the spontaneous formation of singularities. Accretion discs in simulations ought by rights to explode, and God knows what should happen to neutron stars.

The reason that this doesn't happen appears to be a numerical artifact which effectively smooths over this (admittedly small) amount of leakage. But what we need to do to make the equations rigorously correct, and how that would affect our understanding of these systems, isn't yet known.


Ultra Diffuse Galaxies may be tidal dwarfs in disguise. Another really interesting PhD talk was on how UDGs might form in clusters. When galaxies interact in low density environments, they can tear off enough gas and stars to form so-called tidal dwarfs. The key features of these mini-galaxies are that they don't have any dark matter (which is too diffuse to be captured in an interaction like this) and are short-lived, usually re-merging with one of their parents in, let's say, 1 Gyr or so. But what if the interaction happens near the edge of a cluster ? Well, then the group can disperse and its members separate as they fall in, so the TDG won't merge with anything. Ram pressure will initially increase its star formation, increasing the stellar content in its centre and making it more compact, before eventually quenching it due to simple lack of gas. So there should be a detectable trend in these galaxies, from more compact to more diffuse going outwards from the cluster centre, all lacking in dark matter.

Of course this doesn't explain UDGs in isolated environments, but there's every reason to think that UDGs might be formed by multiple different mechanisms. A bigger concern was that the simulations didn't seem to include the other galaxies in the cluster, so the potentially very destructive effects of the tidal encounters weren't included. But survivorship bias was very much acknowledged : all galaxies, she said, get more compact closer to the centre, but not all survive at all. It's a really intriguing idea and definitely one to watch.


Even more about UDGs ! These were a really hot topic this year and whoever decided to schedule the session to be in one of the smaller rooms was very foolish, because it was overflowing. A few hardy souls stood at the back, but most gave up due to the poor air conditioning. Anyway, a couple of extra points. You might remember that I wasn't impressed by early claims that NGC 1052-DF4, one of the archetypes of galaxies without dark matter, had tidal tails. Well, I was wrong about that. New, deeper data clearly shows that it does have extended features beyond its main stellar disc. Whether that really indicates tidal disruption... well, I'll read the paper on that. And its neighbour DF2 remains stubbornly tail-less.

The other point is a new method for measuring distances to UDGs by looking at the stellar velocity dispersion of their globular clusters. This was the work of a PhD student who found that there's a relationship between this dispersion and the absolute brightness of the parent galaxy. Getting the dispersion of the clusters is still challenging, requiring something like 20 hours on the VLT... but this is a far cry from the 100 HST orbits needed for the dispersion of the main stellar component of the galaxy itself. Apparently this works on the dispersion within individual clusters, so even one would be enough. They tested this on DF2 and DF4 and found a distance of... 16 Mpc, right bang in the middle of the 13 and 20 Mpc claims that have been plagued with so much controversy.

Ho hum. No comment.
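
For what it's worth, once a relation like that hands you an absolute brightness, the last step to a distance is just the standard distance modulus. Here's a minimal sketch of that final step only. The dispersion-brightness relation itself wasn't given in any form I can reproduce, so the magnitudes below are made-up placeholders and not values from the talk.

```python
def distance_mpc(apparent_mag, absolute_mag):
    """Distance in Mpc from the distance modulus m - M = 5*log10(d / 10 pc)."""
    distance_pc = 10 ** ((apparent_mag - absolute_mag) / 5 + 1)
    return distance_pc / 1e6


# Hypothetical inputs (assumptions for illustration only):
# absolute_mag would come from the dispersion-brightness relation,
# apparent_mag from ordinary photometry of the galaxy.
absolute_mag = -15.0
apparent_mag = 16.0

print(f"Distance : {distance_mpc(apparent_mag, absolute_mag):.1f} Mpc")
```

With these placeholder numbers the distance modulus is 31, which works out to a bit under 16 Mpc. The point is only that a small error in the inferred absolute magnitude maps directly onto the distance, which is why the calibration of the relation matters so much.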


Fountains of youth and death. Some galaxies which today are red and dead appear to have halted their star formation very early on, but why ? One answer, presented here quite decisively, was AGN – i.e. material expelled by the enormous energies of a supermassive black hole in the centre of the galaxy. Rather unexpectedly it seems that most of this gas is neutral with only a small fraction being ionised, and detections of these neutral outflows are now common. In fact this may even be the main mechanism for quenching at so-called "cosmic noon" (redshifts of 1-2) when star formation peaked. Well, we'll see.

The other big talking point about fountains of ejected material was how galaxies replenish their gas. Here I learned two things I wish someone had told me years ago because they're very basic and I should probably have known them anyway. First, by comparing star formation rates with the mass of gas, one can estimate the gas depletion time, which is just a crude measure of how long the gas should last. And at low redshift this is suspiciously low, about a billion years. Does this mean we're in the final stages of star formation ? This is still about 10% or so of the lifetime of the Universe so it's never seemed all that suspicious to me.

The problem is that this depletion time has remained low at all redshifts. It's not that galaxies are suspiciously close to the end, it's that they should have already stopped forming stars and run out of gas long ago. Star formation can be estimated in different ways with no real constraint on distance, though gas content is a bit harder – we can't do neutral hydrogen in the distant Universe, but we can absolutely do molecular and ionised gas. Despite the many caveats of detail there's a very strong consensus that galaxies simply must be refuelling from somewhere.
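
The depletion time really is as crude as it sounds : gas mass divided by star formation rate. A minimal sketch, where the gas mass and star formation rate are illustrative round numbers I've picked (not measurements of any particular galaxy) and the age of the Universe is taken as roughly 13.8 Gyr :

```python
AGE_OF_UNIVERSE_GYR = 13.8  # approximate present-day value


def depletion_time_gyr(gas_mass_msun, sfr_msun_per_yr):
    """Time in Gyr to use up the gas at the current star formation rate.

    Assumes a constant rate and no refuelling, which is exactly
    the assumption the observations call into question.
    """
    return gas_mass_msun / sfr_msun_per_yr / 1e9


# Illustrative values only (assumptions, not data)
gas_mass = 3e9  # solar masses
sfr = 3.0       # solar masses per year

t_dep = depletion_time_gyr(gas_mass, sfr)
print(f"Depletion time : {t_dep:.1f} Gyr")
print(f"Fraction of the age of the Universe : {t_dep / AGE_OF_UNIVERSE_GYR:.0%}")
```

A galaxy like this would run dry in about a billion years if nothing topped it up. Seeing that once is unremarkable; seeing it at every redshift is what makes refuelling hard to avoid.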

One of those models has been the so-called galactic fountain. Galaxies expel gas due to stellar winds and supernovae, some of which escapes but most of which falls back to the disc. Now it's obvious how this explains why star formation keeps going in individual, local parts of the disc where the depletion time is too short, but how it explains the galaxy overall has never been clear to me. What might be going on is that the cold clouds of ejected gas (which look like writhing tendrils in the simulations) act as condensation sites as they move through the hot corona and fall back. Here gas in the hot, low density corona of the galaxy can cool, with the simulations saying that this mass of gas can be very significant. So the galaxy tops up its fuel tank from its own wider reservoir. It will of course eventually run out completely, but not anytime soon.

This is a compelling idea but there are two major difficulties, one theoretical and one observational. The theoretical problem is that the details of simulations really matter, especially resolution. If this is too low, clouds might appear to last much longer than they do in reality. One speaker presented simulations showing that this mechanism worked very well indeed while another showed that actually the clouds should tend to evaporate before they ever make it back to the disc, so this wouldn't be a viable mechanism at all. On the other hand, neither used a realistic corona : if it's actually not the smooth and homogeneous structure they assume it to be, this could totally change the results.

The observational difficulty is that these cold gas clouds are just not seen anywhere. This is harder to explain but may depend on the very detailed atomic physics : maybe the clouds are actually warmer and more ionised than the predictions, or maybe colder and molecular. Certainly we know there can be molecular gas which is very hard to detect because it doesn't contain any of the tracer molecules we usually use; H2 is hard to detect directly so we usually use something like CO. 


And with that, I really end my summaries of EAS 2024, and return to regular science.

Thursday, 11 July 2024

ChatGPT Is Not A Source Extractor

When ChatGPT-4o came along I was pretty keen to try out its shiny new features, especially since some of the shine has rubbed off the chatbots of late. Oh, ignore the hype trains completely : those who are saying it's going to cause the apocalypse or usher in the Utopian end of history are equally deluded. I'm talking about actual use cases for LLMs. This situation remains pretty much as it has been since they were first unleashed. That is...
  • Decent enough if you want free-form discussions (especially if you need new ideas and don't care too much about factual accuracy)
  • Genuinely actually very useful indeed for coding (brilliant at doing boiler-plate work, a serious time-saver !)
  • Largely crap if you need facts, and even worse if you need those facts to be reliably accurate
Pretending that those first two are unimportant is in my view quite silly, legitimate concerns about energy expenditure notwithstanding. But that third one... nothing much seems to have shifted on that at all. They're still plagued with frequent hallucinations, since they're not grounded in anything so that they have no internal distinction between a verifiable, observable truth and the CPU-equivalent of a random brain-fart firing of the neurons.

Unfortunately GPT-4o just seems to extend this into a multi-modal world, giving results basically consistent with my earlier tests of chatbots. But I was intrigued by its apparent accuracy when supplying image files. It seemed to be, albeit from limited testing, noticeably more accurate when asked questions about image files than, say, PDFs. So I had a passing thought : could I use ChatGPT-4o to find sources in my data ?

Spoiler : no. It doesn't work.

It's not possible to share the chat itself because it contains images, but basically what I did was this. I uploaded an image of a typical data set I would customarily trawl through looking for galaxies. The very short version is that the HI detections of galaxies typically look like elongated blobs, sometimes appearing saturated and sometimes as mere enhancements in the noise. You can find a much more thorough explanation on my website, but that's the absolute basics. For example, in the image below, there are seven very obvious detections and one which is a bit fainter.

I began by giving ChatGPT a detailed description of the image and the task at hand. This is the kind of thing that takes a few minutes to explain to a new observer; the actual training of data inspection can take a few days, but the explanations need be only very short indeed. And finding the bright sources is trivial : almost anyone can do that almost immediately. The bright galaxies are inherently obvious in the data when presented like this. Even if you have no idea what the axis labels refer to, it's clear that some parts of the image are very different to the others.

I asked ChatGPT to mark the location of the sources or otherwise describe their position. It didn't mark them but instead gave descriptions. Its world coordinates weren't precise enough to verify what it had identified, however, being limited to only values directly readable in the image and not doing any interpolation. I also gave it a broad alpha-numeric grid (A-J along the x-axis and 1-6 along the y-axis), but this was too coarse to properly confirm what it thought it had found. 

Its results were ambiguous at best. Even with this coarse grid it was clear some of its results were simply wrong. So I did what I'd do with new observers. I marked the sources with red outlines and numbers, uploaded the new image and described what I'd done, so it would have some kind of reference image. I also described the sources in more detail, e.g. which ones were bright and which were faint, and whether they extended into adjacent cells.

Next I gave it a new image with a finer grid (A-O and 1-14). This time, two sources (out of the ten or so visible) were reported correctly while the rest were wrong. By mistake, I missed out the "D" cell in the coordinate labels, but ChatGPT reported a source at D4 ! Its revised claims were still wrong though, once again getting only two correct.

This wasn't going well. I decided to dial it back and try something simpler. Maybe ChatGPT was able to "see" the features but was not accurately reading the coordinates, or perhaps hallucinating its answers and so mangling its results. So now I uploaded an image devoid of any coordinates and asked it for a simple count of the number of bright blobs. It got the answer right ! Okay, better... I asked it if it could mark the locations directly on the image, but it said it couldn't edit images. Instead it suggested giving the coordinates of the sources as a percentage of the axis length from the top left. Fair enough, but when comparing its reported coordinates it had again two near-misses and got all the rest simply wrong.
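
Checking positions reported as percentages from the top left is easy enough to do by eye, but it can also be automated. This is a minimal sketch of one way to do it, not what I actually did : the function names, the image size and the matching tolerance are all assumptions made for illustration.

```python
def percent_to_pixel(x_percent, y_percent, width, height):
    """Convert a position given as a percentage of each axis length,
    measured from the top-left corner, into integer pixel coordinates."""
    x = round(x_percent / 100 * (width - 1))
    y = round(y_percent / 100 * (height - 1))
    return x, y


def is_match(reported, true, tolerance_px):
    """True if the reported position lies within tolerance_px of the true one."""
    dx = reported[0] - true[0]
    dy = reported[1] - true[1]
    return (dx * dx + dy * dy) ** 0.5 <= tolerance_px


# Hypothetical example : an 801 x 401 pixel image, one known source
reported = percent_to_pixel(50, 25, width=801, height=401)
known_source = (403, 104)

print(reported, is_match(reported, known_source, tolerance_px=5))
```

The tolerance should be set by the size of the blobs in the image : a "near-miss" is only meaningful relative to how extended the sources actually are.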

Finally I decided to check if at least the reported number count wasn't just a fluke. I uploaded three images in one file (thus circumventing OpenAI's painfully-limited restrictions on the free plan), each labelled with a number, and asked for the number of sources in each. It got one right and the rest wrong. It also gave descriptions of where it thought the sources were (i.e. upper left, middle, that sort of thing) and these were all wrong. Then, rather surprisingly and quite unprompted, it decided that it actually could edit images to mark the positions after all. The result came back :


Well... it's less than stellar. 

The upshot is that nothing much has changed about chatbot use cases at all. Good for discussions,  useless for facts. Whether it is "seeing" the images in some sense I don't know : possibly at some level it does recognise the sources but hallucinates both when trying to mark them and describe their positions, or possibly it's just making stuff up and nothing else. The latter seems rather unlikely though. Too often in other tests it was capable of giving results from figures in PDFs and image files which could not have been obtained from reading any of the text, that required actually "looking" at the images. 

Regardless of what it's actually doing, in terms of using ChatGPT as a source extractor, it's a non-starter. It doesn't matter why it gets things wrong, for practical application it only matters that it does. Maybe there's something capable under there, maybe there isn't. For now it's just an energy-intensive way of getting the wrong answers. Well, I could have done that anyway !

Monday, 8 July 2024

EAS 2024 : The Highlights

For the last major conference I went to, I combined the science and travel reports together. This was easy because Cardiff to me is not an exotic destination so it hardly needs much description. But this year's EAS conference was in Padova, and my explorations of the city itself and neighbouring Venice easily required their own dedicated post over on Physicists of the Caribbean. That, however, is a sideshow. Here I should say something about the main reason I was there, i.e. the SCIENCE !

This will only be brief. I have a number of things I want to look up in detail, but for now, here's what I learned from the conference itself.


Euclid Is Mega Awesome. Like, seriously awesome. I have this chronic bad habit of not paying attention to new telescopes when they're being proposed or even during construction : who knows when they'll launch or whether they'll reach their design spec ? Worst of all, and perhaps a better reason, is that their marketing is often terrible. It's all public outreach about their main target mission. For example as far as I knew Euclid was some specialist thing for probing dark energy presumably by measuring something very specific and niche, which just goes to show how little attention I ever give new instruments.

Actually it's more like Hubble on steroids. It's got comparable resolution (though not quite Hubble level) but with a vastly larger field of view and exquisite optics that give a uniformly high quality image across the whole field. This makes it a fantastic, game-changing instrument for the low surface brightness universe. Want to find faint stellar streams, tiny dwarf galaxies and other exotic phenomena ? Euclid to the rescue ! If they'd sold it as more of a survey instrument and not this dark energy thing... well, quite possibly they did and I just wasn't listening.

Whoops ! Luckily for me, its main survey will be almost all-sky and openly available, so I won't have to do anything except look at the data when it's available.


Cosmology Isn't Dead Yet. There are lots of press releases about the discovery of disc-like and massive galaxies in the early Universe, only a few hundred million years after the Big Bang, which is supposedly not long enough for them to have formed. The picture according to the experts is a bit more subtle than this popular description though. Some of these objects might be a problem - they really might, and not in a "but probably not" way : there's every chance there's something going on here that we don't understand. Whether that's the cosmology itself, i.e. the structure and nature of the universe, or just the detailed physics of the gas and star formation... that's where things get more suspect.

For example, it's not, it turns out, that simulations don't predict such objects. They do. It's just that it appears that they predict fewer than the numbers observed. In the Illustris simulations apparently about 10% of the relevant comparison sample are discy in the early simulated Universe, which isn't a lot but isn't insignificant either. The problem is that it's unclear if this is in conflict with the observations because the numbers are still small, and we aren't sure about the observational biases. For mass the situation is much worse : when deriving the mass using synthetic observational procedures (that is, transforming the simulations into the kind of data observers would process, and then using their methods to guesstimate the stellar mass), results vary by up to three orders of magnitude away from the true value. So claims that there are too many massive galaxies in the early Universe can be safely ignored.

Except, there's a major caveat. Mass is a derived parameter with many uncertainties, but straightforward luminosity (i.e. brightness) is a direct observable. And here there does appear to be a conflict (sorry, tension) with observations.  There are potential solutions here as well though. For example a more "bursty" mode of star formation, rather than the more continuous process generally assumed, as well as accounting for nebular continuum emission and a top-heavy IMF (that is, forming more big stars in proportion to small ones than in the nearby universe, which could happen because their chemistry would be completely different), might be enough to solve all this. As some wise old sage commented, complicated problems often have complex, multi-parameter solutions rather than one big single "change this and all will be well" moment. Which unfortunately means that figuring this out is going to take time.

EDIT : I almost forgot. Back in 2017 there was a paper claiming that galaxies in the early Universe show declining rotation curves, indicating they were far less dark matter dominated than today. I was skeptical of this, as reported here with some follow-up here. And from conference results it seems that these results are heavily dependent on observation time : too short and indeed the result is declining curves, but observed for longer then they flatten considerably. Exactly why this should be I don't know, but it appears the original authors rowed back on their initial claims somewhat in a 2020 paper. It seems that the original results were something of an oversimplification, if not just simply wrong.


There Are No Dark Galaxies. New HI surveys are reaching lower column (or surface) densities, that is, how much gas they detect per unit area, than I was aware. In fact they're comparable to Arecibo but with the added benefit of much higher resolution. The penalty is that this takes enormous amounts of observing time but this is largely compensated for by large fields of view.

The results so far I think are still mainly "watch this space" except for individual objects. But one interesting finding is that there's a distinct lack of optically dark galaxies which have gas but no stars. This is something I've been working on for many years with Arecibo data, but MHONGOOSE's much improved resolution combined with its sensitivity means we would already have expected to see something if they exist in numbers of any significance. Thousands of detections of normal galaxies already (over 6,000 in fact - compare this to the 31,000 from ALFALFA which took many years to achieve !) but no hint of anything dark that isn't explicable by another mechanism... though of course this has some caveats, there appears to be no significant population.


Gas Accretion Definitely Happens. But we still don't know when or how ! Obviously galaxies acquire gas at some point or they'd never form any stars, but while there's much indirect evidence for this, direct observations remain hugely unconvincing. Mergers have been ruled out as a significant source of the gas : there just aren't enough objects to do this. Cold accretion, where cold atomic HI falls into galaxies along streams, remains a distinct possibility but unobserved, even with the newest and most sensitive instruments.

Hot accretion (from the hot gas of the large-scale cosmic web) also remains a possibility but while large-scale bridges of X-ray gas have been detected, everyone was much more circumspect about claiming these as detections of the web itself than they have been in the past. And quite properly too, because any individual detection can always be challenged as an interaction. To claim a detection of the web, we'd really need to see it ubiquitously, with multiple strands connecting multiple clusters. Interestingly, HI intensity mapping has thus far got a statistical detection, but not to the point of being able to do actual imaging yet.


Radio Halos Do Funny Things. Not only does the X-ray gas trace giant, megaparsec-scale structures linking entire galaxy clusters, but so does the much lower-energy radio emission. There are different kinds of radio halos, with Mpc-scale giants to ~100 kpc scale "minihalos". And these aren't the same component, with the density profiles of the two showing a distinct change of slope. Then there are radio relics, the result apparently of shocks in the intracluster medium producing giant arcs of radio emission.

In one particularly noteworthy case, one of these giant radio lobes appears to be interacting with a galaxy. This shows a tail which is distinctly different from most. Where galaxies lose gas by ram pressure, the tails tend to be well-collimated and decrease in brightness at greater distances. This one instead gets both wider and brighter, indicating a different physical process is at work. Like classical ram pressure, however, it seems to have caused an initial increase in the star formation rate of the galaxy. This is interesting to me because I've always assumed these features were of too low a density level to have an impact on anything, being an interesting way to trace the dynamics of an environment but being themselves no more than tracers, not interactors.


Ultra Diffuse Galaxies Are Still A Thing. There seems now to be a quite firm consensus : UDGs are two populations, one of "puffy" dwarves of low total mass that have become somehow extended, and the other of "failed", much more massive galaxies not predicted by any simulations. Several people made that last point and nobody raised any arguments, though exactly how massive they are (and how numerous this population is) wasn't clearly stated. Even so, to my mind if you want a serious challenge to cosmology, forget the attention-hogging results from JWST and look at these much nearer objects !

Likewise, the view seems to be that UDGs are indeed (following results from a few years ago) not actually all that large - but they are flatter : their light profiles are basically constant and then suddenly truncated at their edges, whereas normal galaxies have more complicated and varied profiles. There still seems to be some disagreement, however, as to whether UDGs therefore represent extreme examples of regular dwarf galaxies or are genuine outliers which are qualitatively different from the main galactic population. 

The dynamics of the more massive objects to me suggests the latter, but there's still not a clear answer to this. Their globular cluster populations are also highly diverse, with some having none at all and others having far more than expected. One very interesting set of observations by Pierre-Alain Duc shows stellar tails from globular clusters in the inner regions of the UDGs, likely merging with the main body of the galaxy - that's how good our observations have become ! Considering their whole set of properties, it remains understandably difficult to decide what the hell UDGs actually are.

To me the most interesting individual object presented in the whole conference was a UDG by Pavel Mancera Piña, he of the "UDGs have no dark matter" fame. Regular readers will know I was at first enthusiastic about this result, then became a bit more skeptical, but finally I've settled (?) back to my original stance : the results seem secure enough that they can't be attributed to observational errors or improper corrections. Now Pavel has found an object which is especially weird. Like the others, it's isolated. Its HI map shows very neatly circular contours, and its kinematics are consistent with no dark matter at all. The only way it can have a dark halo is if the concentration is very low - much lower than standard cold dark matter predicts, but explicable with self-interacting dark matter... 

And because it's isolated, explaining it with modified gravity is hard. If gravity rather than matter governs dynamics, then all isolated objects of the same mass and radius should show the same kinematic properties. That they don't might well argue that such notions are simply wrong, even as objects like this one indicate that the standard model of dark matter itself has flaws.


Well, of course much remains to be done in all those categories, especially the last. My own talk I'm pleased to say went down very well, I got lots of questions about the AGES dark clouds and had some nice discussions afterwards. My poster (should be accessible for a few months, I think) may have sunk without trace, but no matter. And the conference slogan "Where Astronomers Meet" may be the most blandest thing that ever did bland, but the sessions themselves were full of interesting stuff.

I'm old enough to remember conferences of a different era of more, shall we say, "robust conversations". I've not seen any actual arguments (except in some much smaller events) in many years now. Disagreements still arise but they're altogether gentler. Sometimes I miss the spectacle of hearing a good row from a safe distance, but perhaps, as long as we still do interesting science and still have fruitful discussions, this new way is better.

Thursday, 27 June 2024

The galaxies that quenched backwards

Today's paper is about contrarian galaxies that don't play by the rules. Most galaxies, left to their own devices, build up a big bulge of stars in the middle as the gas density is initially highest there. But this high star formation rate burns through its fuel very quickly and soon peters out, "quenching" the star formation in the centre. Meanwhile the disc happily ambles along forming stars at a stately, steady pace, unhurried but longer-lasting.

There are some, though, which appear to do just the opposite. In clusters this is easy. Ram pressure preferentially removes the outer gas first, leaving the most stubborn gas remaining in the centre to carry on forming stars while the disc slowly reddens and dies. What's weirder is that there appear to be some galaxies like this in environments where ram pressure can't be playing any role. Ram pressure requires a hot intergalactic medium, which is pretty much only found at any significant levels in massive clusters. Everywhere else it should be far too weak to cause this kind of damage*.

* It probably doesn't have zero role to play outside clusters though. Very small satellite dwarfs can experience significant ram pressure from the hot gas of their parent galaxies, and there's some evidence that larger galaxies can experience at least a little gas-loss due to ram pressure in large-scale filaments. But probably nowhere near enough to account for galaxies like this. What they more likely experience is only starvation, where their outermost, thinnest gas is removed but nothing from within their denser discs. This means they can't replenish their gas as it gets eaten up by star formation.

This paper is about a particular sort of these galaxies, which they give the ugly name of "BreakBRDs" : Break Bulge Red Discs. The "break" refers to a particular spectral line indicating that there was star formation very recently in the bulges. "Blue Bulges Red Discs" might have been easier, but technically the bulges aren't actually blue, so this wouldn't be right. Even so, BBRDs would be a better acronym. Or heck, I'll just call 'em backwards galaxies.

I have to say I both like and dislike this paper. On the one hand it's very careful, thorough, and doesn't draw any overblown conclusions from the limited data. On the other, some of the discussion is long-winded, non-committal, and rather tedious considering that in the end the conclusion is so indecisive. I think there's some really great discussion on each individual scenario proposed to explain the backwards galaxies, but the collective whole becomes at times very confusing. It feels a bit like a paper written by committee. The main problem is nothing to do with the science at all, but the structure : there isn't a good unifying framework to tie all the different scenarios together. 

Here I shall try and simplify and disentangle things a bit. 

They begin with a nice overview of the observational difficulties of establishing what's going on. Some results support this classical picture of outside-in quenching where ram pressure (or some other environmental effect) strips the outer gas first. But others find that this happens more in low density environments, exactly the opposite of what's expected ! Still others show something entirely different, where quenching happens irrespective of position in the galaxy, i.e. the whole disc quenches everywhere, all at once. Even simulations aren't much help, with galaxies similar to their BBRDs being found but with no clear mechanism responsible for what happened.

The particularly interesting thing about BBRDs, i.e. backwards quenchers, is that they exist over a wide range of stellar masses and apparently in all environments. Unfortunately they don't say anything else about the environment(s) of their particular sample, which is my only scientific quibble with the paper. Anyway what they do here is look at a sample of about a hundred or so which have HI gas measurements, using a combination of existing and their own new observations.

Their main result is actually quite simple, but you need to be aware of the colour-magnitude diagram first. Very simply, there are :

  • Galaxies which are blue, have lots of gas, and are forming stars as per usual (the so-called star formation main sequence, or more simply the blue cloud).
  • Galaxies which are red, don't have any gas, and aren't forming any stars : the red sequence.
  • Galaxies which are intermediate between red and blue, have a bit of gas, and are forming stars more slowly : the green valley (or transition region).
Of course, regular readers here will know that it isn't as simple as that, but this'll do for what's needed here.

They show that BBRDs and GV (green valley) galaxies have about the same gas fractions regardless of their stellar mass, both considerably lower than those in the blue cloud. But given their star formation rates, BBRDs would burn through their current gas content very much faster than GV galaxies. That is, for the same amount of gas, BBRDs are consuming their gas far more efficiently than GV galaxies. They're forming stars at rates typical of blue cloud galaxies, even though they've got way less gas. GV galaxies, by contrast, are forming stars even more slowly than those on the main sequence.

How do they do this ? Or, conversely, what suppresses the star formation in GV galaxies ?

This is not at all easy to answer. For the GV galaxies, their gas depletion times seem to be long because of their extremely low star formation rates, which more than compensate for their lower gas contents. For the BBRDs, their rapid gas depletion times are easier to explain, as their star formation rate remains normal despite their low gas contents.

But what's behind it all ? There are three main options.


1) The final stages of quenching

These BBRD galaxies could be experiencing the end stages of gas removal due to some external mechanism, as is usually the case in clusters. Perturbations to the gas could drive inflows into the centres of the galaxies, resulting in an increase in gas density and thus an increase in star formation rate. Unfortunately they don't have the detailed maps of the gas needed to say if this is the case or not, but here it would have been useful for them to say something more about environment.


2) Accretion

It could be that these galaxies were in fact already fully quenched and are experiencing something of a renaissance. Accretion is a somewhat controversial topic because it's hard to ever determine that it's really happening with any certainty, but clearly galaxies have got to get their gas from somewhere. One possibility is so-called cold accretion from the cosmic web, where the hot, diffuse extragalactic background cools and condenses, falling into the potential wells of galaxies along streams.

Reading between the lines I think this is their favoured scenario. There are several reasons to think the gas isn't doing what it's supposed to be doing. For a few galaxies they have well-resolved kinematics of the gas and stars, and here the gas tends to be misaligned with and/or highly distorted compared to the stars. They've only got such data for the innermost regions, but the spectral profiles of the gas (measured over much larger scales) also suggest it's significantly asymmetric. And these galaxies are also systematically offset from the Tully-Fisher relation : that is, the gas has broader line widths than in typical galaxies of the same mass.

All this is what you'd expect if the gas had only recently arrived rather than evolving along with the galaxy throughout its history. What they can't say is whether this indicates a temporary revival of the galaxy or a full resuscitation. They might, potentially, be rejuvenating themselves back onto the star formation main sequence, or they may have just accumulated a little bit of gas and will soon burn out once again. My guess is that the latter is more likely but this would mean galaxies would have extraordinarily complicated histories.


3) Mergers

The most dramatic way a galaxy can accumulate more gas is by gobbling up other galaxies whole. This is their least favoured scenario as there are no signs of the remnants of the encounter you'd expect, e.g. long stellar tails. They can't rule it out though, as it's possible the star formation is persisting long after the tails have dispersed, but resolved gas measurements could help verify this idea.


I've simplified the discussion here considerably but this is roughly what it boils down to; they themselves, in my opinion, are too focused on the details of the different processes rather than painting a clear picture of the main differences. Fortunately, they've got VLA time to resolve the gas in a pilot sample, so this looks like a solvable problem. As they say, "it is important to remember the results of McKay et al. (in preparation)"... well, asking to remember results that don't exist yet is a new one on me, but I'll try.

Friday, 21 June 2024

The ultimate in flattening the curve

It just refuses to go down...

Well, I'd play the innuendo card with this paper, at any rate. 

Galaxy rotation curves are typically described as flat, meaning that as you go further away from their centres, the orbital speeds of the gas and stars don't change. This is the traditional evidence for dark matter. You need something more than the visible matter to accelerate material up to these speeds, especially the further away you go : if you use the observed matter, the prediction is that orbital speeds should steadily decrease a la Kepler. Unseen dark matter is a neat way to resolve this dilemma, along with a host of other observational oddities.
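As a rough illustration of the Keplerian expectation, here's a small sketch that treats all the visible mass as a single point at the centre (a fair approximation only well outside the bulk of the matter). The mass, the radii and the function names are assumptions of mine, chosen purely for illustration.

```python
import math

G = 4.30091e-6  # gravitational constant in kpc (km/s)^2 / Msun


def keplerian_speed(mass_msun, radius_kpc):
    """Circular speed in km/s around a central point mass."""
    return math.sqrt(G * mass_msun / radius_kpc)


def mass_for_flat_curve(speed_kms, radius_kpc):
    """Enclosed mass in Msun needed to keep a given circular speed at this radius."""
    return speed_kms ** 2 * radius_kpc / G


visible_mass = 1e11  # hypothetical visible mass in Msun

for r in (10, 40, 160):
    print(f"r = {r:3d} kpc : Keplerian speed = {keplerian_speed(visible_mass, r):.0f} km/s")

for r in (10, 40, 160):
    print(f"r = {r:3d} kpc : mass needed for a flat 200 km/s = {mass_for_flat_curve(200, r):.2e} Msun")
```

The speed halves every time the radius quadruples, whereas holding the curve flat requires the enclosed mass to keep growing in proportion to the radius.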

This paper claims to have extended rotation curves considerably further than traditional measurements and finds that the damn curves remain flat no matter how far they go. They reach about 1 Mpc, about the size of our whole Local Group, and still don't show any sign of a drop. This is not at all expected, because eventually the curve should drop as you go beyond the bulk of the dark mass. It is, they say, much more in keeping with the prediction of modified gravity theories that do away with dark matter, such as MOND.

I won't pretend I'm in any way expert in their methodology, however. A standard rotation curve directly measures the line of sight speed of gas and/or stars, which is relatively simple to convert into an orbital speed – and for qualitatively determining the shape of the curve, the corrections used hardly matter at all. But here the authors don't use such direct kinematic measurements, but instead use weak gravitational lensing. By looking at small distortions of background galaxies, the amount of gravity associated with a target foreground source can be determined. Unlike strong lensing, where distortions are easily and directly visible in individual sources, this is inferred through statistics of many small sources rather than from singular measurements.

Here they go even more statistical. Much more statistical, in fact. Rather than looking at individual lens galaxies they consider many thousands, dividing their sample into four mass bins and also by morphology (late-type spiral discs and early-type ellipticals). The lensing measurements don't give you orbital speed directly, but acceleration, which they then convert into velocity. 
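I don't know the details of how they do the conversion, but the simplest version would be the circular-orbit relation, v = sqrt(g r). The sketch below assumes exactly that and nothing more; the function names and the example numbers are hypothetical and not from the paper.

```python
import math

KPC_IN_M = 3.0857e19  # metres in one kiloparsec


def circular_speed_kms(accel_m_s2, radius_kpc):
    """Circular speed in km/s implied by an inward acceleration at a given radius."""
    return math.sqrt(accel_m_s2 * radius_kpc * KPC_IN_M) / 1e3


def accel_for_speed(speed_kms, radius_kpc):
    """Inverse relation : acceleration in m/s^2 needed for a given circular speed."""
    return (speed_kms * 1e3) ** 2 / (radius_kpc * KPC_IN_M)


# A flat curve at 200 km/s means the acceleration falls off as 1/r.
for r in (10, 100, 1000):
    g = accel_for_speed(200, r)
    print(f"r = {r:4d} kpc : g = {g:.2e} m/s^2 -> v = {circular_speed_kms(g, r):.0f} km/s")
```

The point of the exercise is just to show how tiny the accelerations become at 1 Mpc, which is part of why I'd want more persuading that the measurement is meaningful out there.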

1 Mpc is really quite a long way in galactic terms, and it wouldn't be at all uncommon to find another similar-sized galaxy within such a distance : in our Local Group, which is not atypical, there are three large spiral galaxies. Measuring the rotation curve out to such distances then becomes intrinsically complicated (even if you had a direct observational tracer like the gas) because it's hard to know which source is contributing to it. 

They say their sample is of isolated galaxies with any neighbours being of stellar mass less than 10% of their targets out to 4 Mpc away, but their isolation criterion uses photometric redshifts*. Here I feel on very much firmer footing in claiming that these are notoriously unreliable. Especially as the "typical" redshift of their lens galaxies is just 0.2, far too low for photometric measurements to be able to tell you very much. Their large sample means they understandably don't show any images, but it would have been nice if they'd said something about a cursory visual inspection, something to give at least some confidence in the isolation.

* These are measurements of the redshift based on the colour of the galaxy, which is extremely inexact. The gold standard is spectroscopic measurement, which can give precisions of a few km/s or even less.
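To get a feel for the scale of the problem, here's a back-of-the-envelope sketch that converts a redshift error into a velocity error and then, via the Hubble law, into a line-of-sight distance error. The photometric error of 0.03, the spectroscopic error of 5 km/s and the Hubble constant of 70 km/s/Mpc are all assumed values for illustration; peculiar velocities are ignored, and v = cz is only a low-redshift approximation.

```python
C_KMS = 299792.458  # speed of light in km/s
H0 = 70.0           # assumed Hubble constant in km/s/Mpc


def redshift_error_to_velocity(dz):
    """Velocity uncertainty in km/s for a redshift uncertainty dz (low-z approximation)."""
    return C_KMS * dz


def velocity_to_distance(dv_kms):
    """Line-of-sight distance uncertainty in Mpc from the Hubble law."""
    return dv_kms / H0


photo_dz = 0.03     # hypothetical photometric redshift error
spectro_dv = 5.0    # hypothetical spectroscopic velocity error in km/s

photo_dv = redshift_error_to_velocity(photo_dz)
print(f"photometric   : {photo_dv:.0f} km/s, {velocity_to_distance(photo_dv):.0f} Mpc")
print(f"spectroscopic : {spectro_dv:.0f} km/s, {velocity_to_distance(spectro_dv):.2f} Mpc")
```

Under these assumptions the photometric uncertainty along the line of sight is of order 100 Mpc, which is hard to square with an isolation criterion defined out to 4 Mpc.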

If we take their results as given, they find that the rotation curves of all galaxies in all mass bins remain flat out to 1 Mpc, the limit of their measurement (although in one particular subset this doesn't look so convincing). They also show that in individual cases where they apparently can get good results from weak lensing, the results compare favourably with the direct kinematics they get from gas data.

As often with results questioning the dark matter paradigm, I'd have to describe the results as "intriguing but overstated". I don't know anywhere near enough about the core method of weak lensing to comment on the main point of the paper. But that this is normally in itself a result of statistical inference, and that here they use a very large sample of galaxies and convert the result from the native acceleration measurement to velocity, and that their isolation criterion seems suspect... I remain unconvinced. I'd need a lot more persuading that the weak lensing data is really giving meaningful results to such large distances from the lens galaxies.

What would have been nice to see is the results from simulations. If they could show that their photometric redshifts were accurate enough in simulated cases to give reliable results, and that the weak lensing should give something similar to this (or not, if the dark haloes in the simulations have a finite extent), then I'd find it all a lot more convincing. As it stands, I don't believe it. Especially given that so many galaxies are now known with significantly lower dark matter contents than expected : these "indefinitely flat" rotation curves seem at odds with galaxies with such low rotation speeds even in their innermost regions. Something very fishy's going on.

Wednesday, 19 June 2024

The shoe's on the other foot

My, how the tables have turned. The hunter has become the hunted. And various other clichés indicating that the normal state of affairs has become reversed.

That is, as well as having to write an observing proposal, I find myself for the first time having to review them. Oh, I've reviewed papers before, but never observing proposals. This came about because ALMA has a distributed proposal review system : everyone who submits their own proposal has to review ten others. And since this year I finally submitted one, I get to experience this process first hand.


The ALMA DPR procedure

When you submit a proposal, you indicate your areas of expertise and any conflicts of interest – collaborators and direct competitors who shouldn't be reviewing your proposal, either because they'd stand to benefit from it being accepted or would love to take you down a peg. It's a double-blind procedure : your proposal can't contain any identifying information and you don't know who the reviewers are. Some automatic checks are also carried out to prevent Co-Is on recent ALMA proposals being assigned as reviewers, and suchlike.

Then your proposal is sent off for initial checks and distributed to ten other would-be observers who also submitted observing proposals in the current cycle. You, in turn, get ten proposals to review yourself. Each document is four pages of science justification (of which normally one or even two pages are taken up with figures, references, and tables) plus an unlimited-length technical section containing the observing parameters for each source plus some brief justification on the specifics (in practice, in most proposals each of these so-called "science goals" is very similar, using the same observing setup on multiple targets). You then write a short review of each one, of a maximum of 4,000 characters but typically more like ~1,000 (or even less) describing both the strengths and weaknesses of each. You also rank them all relative to each other, from 1 (the strongest) to 10 (the weakest).

That's stage one. A few weeks later, in stage two you get to see everyone else's reviews for the same proposals, and can then change your own reviews and/or rankings accordingly, if you want to. So far as I know, each reviewer gets a unique group of ten proposals to review, so no two reviewers review the same set of proposals, meaning you can't see the others' rankings. Exactly how their rankings are then all compared and combined, and ultimately, translated into awarded telescope time, remains a mystery to me. Those details I leave for some other time, and I won't go into the details of anonymity* here either : I seem to recall hearing that this gives a better balance of both experience and gender, but I don't have anything to hand to back this up.

* I will of course continue to respect the anonymity requirements here, and not give any information that could possibly identify me as anyone's reviewer.

Instead I want to give some more general reflections on the process. To be honest I went into this feeling rather biased, having received too many referee comments which were just objectively bollocks. I was quite prepared to believe the whole thing would be essentially little better than random, which is not a position without merit.


First thoughts

And my initial impressions justified this. It seemed clear to me that everyone had chosen interesting targets and would definitely be able to get something interesting out of their data, making this review process a complete waste of time.

But after I let things sink in a bit more, after I read the proposals a bit more carefully and made some notes, I realised this wasn't really the case. I still stand by (with one exception) that all proposals would result in good science, but the more I thought about it, the more I came to the conclusion that I could make a meaningful judgement on each one. I tried not to judge too much whether one would do better science than another, because who am I to say what's better science ? Why should I determine if studies of extrasolar planets are more important than active galactic nuclei ?

These aren't real examples, but you get the idea. Actually the proposals were all aligned very much more closely with my area of expertise. The length of four pages I would say is "about right" : it gave enough background for me to set each proposal in its proper context as well as going into the specific objectives.

Instead, what I tried to assess was whether each project would actually be able to accomplish the science it was trying to do. I looked at how impactful this would be only as a secondary consideration. There isn't really any right or wrong answer as to whether it's better to look at a single unique target versus a statistical study of more typical objects, but I tried to judge how much impact the observations would likely have on the field, how much legacy value they would have for the community. But first and foremost, I considered whether I was persuaded the stated science objectives could actually be carried out if the observations themselves reached their design spec.


Judgement Day

And this I found was something I could definitely judge. Two proposals to me stood out as exemplary, perfectly stating exactly what they wanted to do and why, exactly what they'd be able to achieve with this. It was very clear that they understood the scientific background as well as anyone did. I initially ranked these essentially as a coin-toss as to who got first and who got second place; I couldn't meaningfully choose between them.

At the opposite extreme were two or three which didn't convince me at all. One of the principal objectives of one of them was just not feasible with the data they were trying to obtain, and they themselves presented better data in their proposal that they already had which would have been much more suitable for this. Lacking self-consistency is a black mark in just about any school of thought. Another looked like it would observe a perfectly good set of objects, but contained so many rudimentary scientific errors that there was no way I could believe they'd do what they said they would do.

Again, deciding which one to rank lowest was essentially random, though I confess that one of them just wound me up the wrong way more than the other.

In the middle were a very mixed bunch indeed. Some had outstanding ideas for scientific discovery but were very badly-expressed, saying the same thing over and over again to the nth degree (I would say to these people, there's no obligation to use the full four pages, and we should stipulate this in the guidance to observers and reviewers alike. I tried to ignore the poor writing style of these and rank them highly because of the science). Some oversold the importance of what they'd do, making unwarranted extrapolations from their observations to much more general conclusions. Some had a basically good sample but claimed it was something which it clearly wasn't; others clearly stated what their sample was but the objects themselves were not properly representative of what they were trying to achieve.

This middle group... honestly here, a random lottery would work well. On the other hand, there doesn't seem any obvious reason not to use human judgement here either, because for me at least this felt like a random decision anyway. And if other people's judgements are similar then clearly there are non-random effects which probably should actually be accounted for, whereas if they are truly random then the effects will average out. So there's potentially a benefit in one case and no harm in the other, and in any case there almost certainly is a large degree of randomness at work anyway.


Reviewing the reviews

I went through a similar process of revising my expectations in stage 2, though to a lesser degree. At first glance I didn't think I'd need to change my reviews or rankings, but on carefully checking one of the other reviews, I realised this was not the case. One reviewer out of the ten had managed to spot a deeply problematic technical issue in one of the proposals that I otherwise would have ranked very highly. And on checking I was forced to conclude that they were correct and had to downgrade my ranking significantly. This alone makes the process worth doing : 1 out of 10 is not high, but with ~1,600 proposals in total, this is potentially a significant number overall.

Reading the other reviews turned out to be more interesting than I expected. While some did raise exactly the same issues with some of the proposals that I had mentioned, many didn't. Some said "no weaknesses" to proposals I thought were full of holes. One even said words to the effect that "no-one should doubt the observers will do good science with this", a statement I felt presumptuous, biased, and bordering on an argument-from-authority : it's for us the reviewers to decide this independently; being told what we should think is surely missing the point. 

The reverse of this is that some proposals I thought were strong others thought were weak – very weak, in some cases. Everyone picks up on different things they think are important. There was one strange tendency for reviewers to point out that the ALMA data wouldn't be of the same quality as comparison data. This is fine, except that the ALMA data would usually have been of better quality, and downgrading it to the same standard is trivial ! I sort of wished I'd edited my reviews to point this out. Some also made comments on statistics and uncertainties that I thought were so generic as to be unfair : yes, of course things might be different from expectations, but that's why we need to do observations !

What the DPR doesn't really do is give any chance for discussion. You can read the other reviews but you can't interact with the other reviewers. It might have been nice to have somewhere where we could enter a "comment to other reviewers", directed to the group, or at least have some form of alert system when reviews were altered. Being able to ask the observers questions might have been nice, but I do understand the need to keep things timely as well. On that front, reviews varied considerably in length; mine were on the longer and to be honest perhaps overly-long side (I think my longest was nearly 2,000 characters), while one was consistently and ludicrously short.

All this has given me very mixed feelings about my own proposal. On the one hand, I don't think it's anywhere near the worst, and I stand behind the scientific objectives. On the other, I think I concentrated overmuch on the science and not enough on the observational details. Ranking it myself with hindsight I'd probably have to put it in the lower third. It was always a long shot though, so I'll be neither surprised nor disappointed by the presumed rejection. One can but try with these things.

One thing I will applaud very strongly is the instruction to write both strengths and weaknesses of each proposal. All of them, bar none, had some really good points, but it was helpful to remind myself of this and not get carried away when reviewing the ones I didn't much like. Weaknesses were more of a mixed bag; one can always find something to criticise, although in some cases they aren't significant. Still, I found it very helpful to remember that this wasn't an exercise in pure fault-finding.




How in the world one judges which projects to actually undertake, though... that seems to me like the ultimate test of philosophy of science. Groups of experts of various levels have pronounced disagreements about factual statements; some notice entirely different things from others. There's the issue of not only will the science be significant, but also whether the data can be used in different ways from what's suggested. That to me remains the fundamental problem with the whole system, that one can nearly always expect some interesting results, but predicting what they could be is a fool's game.

Overall, I've found this a positive experience. Reading the full gamut of excellent to poor proposals really gives a clearer idea of what reviewers are looking for, something it's just not possible to get without direct experience. Not for the first time, I wonder a lot about Aumann's Agreement Theorem. If we the reviewers are rational, we ought to be persuaded by each other's arguments. But are we ? This at least could be assessed objectively, with detailed statistics possible on how many reviewers change their ranking when reading other reviews. 

And at the back of my mind is a constant paradoxical tension : a strong feeling that I'm right and others are wrong, coupled with the knowledge that other people are thinking the same thing about me. How do we reconcile this ? For my part, I simply can't. I formulate my judgement and let everyone else do the same, and hope to goodness the whole thing averages out to something that's approximately correct. The paradox is that this in no way makes me feel any the less convinced of my own judgements, even knowing that some fraction of them simply must be wrong.

Other aspects are much more tricky. This is a convergence of different efforts : trying to assess what-is-true (whether the science claimed is factually correct, why experienced experts still disagree on some points), what will likely benefit the community the most, and how we try and account for the inevitably uncertain and unpredictable findings. As I've said before many times, real, coal-face research is extremely messy. If it isn't already, then I would hope that telescope proposals ought to be an incredibly active field of research for philosophers of science.

Thursday, 6 June 2024

The data won't learn from itself

Today I want to briefly mention a couple of papers about AI in astronomy research. These tackle very different questions from the usual sort, which might examine how good LLMs can be at summarising documents or reading figures and the like. These, especially the second, are much more philosophical than that.

The first uses an LLM to construct a knowledge graph for astronomy, attempting to link different concepts together. The idea is to show how, at a very high level, astronomical thinking has shifted over time : what concepts were typically connected and how this has changed. Using distributional semantics, where the meanings of words in relation to other words are encoded as numerical vectors, they construct a very pretty diagram showing how different astronomical topics relate to each other. And it certainly does look very nice – you can even play with it online.
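For anyone unfamiliar with the idea, here's a toy sketch of how "relatedness" falls out of word vectors : concepts whose vectors point in similar directions have a cosine similarity close to 1. The three-component vectors below are invented by hand purely for illustration; real embeddings have hundreds of dimensions and are learned from text, and I make no claim that the paper computes similarity in exactly this way.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors : 1 means same direction, 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors, NOT taken from any real embedding model.
toy_vectors = {
    "galaxy": [0.9, 0.1, 0.3],
    "dark matter": [0.8, 0.2, 0.4],
    "exoplanet": [0.1, 0.9, 0.2],
}

for other in ("dark matter", "exoplanet"):
    sim = cosine_similarity(toy_vectors["galaxy"], toy_vectors[other])
    print(f"galaxy vs {other}: {sim:.2f}")
```

In this toy setup "galaxy" comes out much closer to "dark matter" than to "exoplanet", which is the sort of relationship the knowledge graph then draws as a link.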

It's quite fun to see how different concepts like galaxy and stellar physics relate to each other, how connected they are and how closely (or at least it would be if the damn thing would load faster). It's also interesting to see how different techniques have become more widely-used over time, with machine learning having soared in popularity in the last ten years. But exactly what the point of this is I'm not sure. It's nice to be able to visualise these things for the sake of aesthetics, but does this offer anything truly new ? I get the feeling it's like Hubble's Tuning Fork : nice to show, but nobody actually does anything with it because the graphical version doesn't offer anything that couldn't be conveyed with text.

Perhaps I'm wrong. I'd be more interested to see if such an approach could indicate which fields have benefited from methods that other fields aren't currently using, or more generally, to highlight possible multi-disciplinary approaches that have been thus far overlooked.


The second paper is far more provocative and interesting. It asks, quite bluntly, whether machine learning is a good thing for the natural sciences : this is very general, though astronomy seems to be the main focus. 

They begin by noting that machine learning is good for performance, not understanding. I agree, but once we do understand, then surely performance improvements are what we're after. Machine learning is good for quantification, not qualitative understanding and certainly not for proposing new concepts (LLMs might, and I stress might, be able to help with this). But it's a rather strange thing to examine, and possibly a bit of a straw man, since I've never heard of anyone thinking that ML could do this. And they admit that ML can be obviously beneficial in certain kinds of numerical problems, but this is still a bit strange : what, if any, qualitative problems is ML supposed to ever help with ?

Not that quantitative and qualitative are entirely separable. Sometimes once you obtain a number you can robustly exclude or confirm a particular model, so in that sense the qualitative requires the quantitative. But, as they rightly point out, as I have myself many times, interpretation is a human thing : machines know numbers but nothing else. More interestingly they note :  

The things we care about are almost never directly observable... In physics, for example, not only do the data exist, but so do forces, energies, momenta, charges, spacetime, wave functions, virtual particles, and much more. These entities are judged to exist in part because they are involved in the latent structure of the successful theories; almost none of them are direct observables. 

Well, this is something I've explored a lot on Decoherency (just go there and search for "triangles"). But I have to ask, what is the difference between an observation and a measurement ? For example we can see the effects of electrical charge by measuring, say, the deflection of a hair in the static field of a balloon, but we don't observe charge directly. But we also don't observe radio waves directly, yet we don't think they're less real than optical photons, which we do. Likewise some animals do appear to be able to sense charge and magnetic fields directly. In what sense, then, are these "real" and what sense are they just convenient labels we apply ?

I don't know. The extreme answer is that all we have are perceptions, i.e. labels, and no access to anything "real" at all, but this remains (in some ways) deeply unsatisfactory; again, see innumerable Decoherency posts on this, search for "neutral monism". Perhaps here it doesn't matter so much though. The point is that ML cannot extract any sort of qualitative parameters at all, whereas to humans these matter very much – regardless of their "realness" or otherwise. If you only quantify and never qualify, you aren't doing science, you're just constructing a mathematical model of the world : ultimately you might be able to interpolate perfectly but you'd have no extrapolatory power at all.
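A crude numerical analogy for the interpolation point, and only an analogy : fit a polynomial through a handful of samples of a sine wave without "knowing" that the underlying thing is periodic. The choice of function, the six sample points and the test positions are all arbitrary assumptions of mine.

```python
import math


def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the unique polynomial passing through all the points (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


# Six samples of sin(x) between 0 and pi : the "data" the model gets to see.
xs = [k * math.pi / 5 for k in range(6)]
ys = [math.sin(x) for x in xs]

inside = 1.0            # within the sampled range
outside = 2 * math.pi   # well beyond it

interp_error = abs(lagrange_interpolate(xs, ys, inside) - math.sin(inside))
extrap_error = abs(lagrange_interpolate(xs, ys, outside) - math.sin(outside))

print(f"error inside the data range  : {interp_error:.5f}")
print(f"error outside the data range : {extrap_error:.2f}")
```

Inside the sampled range the fit is excellent; outside it the polynomial wanders off to values a sine wave can never reach, because nothing in the model encodes what a sine wave actually is.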

Tying in with this and perhaps less controversially are their statements regarding why some models are preferred over others :

When the expansion of the Universe was discovered, the discovery was important, but not because it permitted us to predict the values of the redshifts of new galaxies (though it did indeed permit that). The discovery was important because it told us previously unknown things about the age and evolution of the Universe, and it confirmed a prediction of general relativity, which is a theory of the latent structure of space and time. The discovery would not have been seen as important if Hubble and Humason had instead announced that they had trained a deep multilayer perceptron that could predict the Doppler shifts of held-out extragalactic nebulae.

Yes ! Hubble needed the numbers to formulate an interpretation, but the numbers themselves don't interpret anything. A device or mathematical model capable of predicting the redshifts from other data, without saying why the redshifts take the values that they do, without relating it to any other physical quantities at all, would be mathematical magic, and not really science.

For another example, consider the discovery that the paths of the planets are ellipses, with the Sun at one focus. This discovery led to extremely precise predictions for data. It was critical to this discovery that the data be well explained by the theory. But that was not the primary consideration that made the nascent scientific community prefer the Keplerian model. After all, the Ptolemaic model preceding Kepler made equally accurate predictions of held-out data. Kepler’s model was preferred because it fit in with other ideas being developed at the same time, most notably heliocentrism.

A theory or explanation has to do much more than just explain the data in order to be widely accepted as true. In physics for example, a model — which, as we note, is almost always a model of latent structure — is judged to be good or strongly confirmed not only if it explains observed data. It ought to explain data in multiple domains, and it must connect in natural ways to other theories or principles (such as conservation laws and invariances) that are strongly confirmed themselves.  

General relativity was widely accepted by the community not primarily because it explained anomalous data (although it did explain some); it was adopted because, in addition to explaining (a tiny bit of new) data, it also had good structure, it resolved conceptual paradoxes in the pre-existing theory of gravity, and it was consistent with emerging ideas of field theory and geometry.

Which is a nice summary. Some time ago I'd almost finished a draft of a much longer post based on this far more detailed paper which considers the same issues, but then blogger lost it all and I haven't gotten around to re-writing the bloody thing. I may yet try. Anyway the need for self-consistency is important, and doesn't throttle new theories in their infancy as you might expect : there are ways to overturn established findings independent of the models.

The rest of the paper is more-or-less in line with my initial expectations. ML is great, they say, when only quantification is needed : when a correlation is interesting regardless of causation, or when you want to find outliers. So long as the causative factors are well-understood (and sometimes they are !) it can be a powerful tool for rapidly finding trends in the data and points which don't match the rest. 

If the trends are not well-understood ahead of time, it can reinforce biases, in particular confirmation bias by matching what was expected in advance. Similarly, if there are rival explanations possible, ML doesn't help you choose between them if they don't predict anything significantly different. But often, no understanding is necessary. To remove the background variations in a telescope's image it isn't necessary even to know where all the variations come from : it's usually obvious that they are artifacts, and all you need is the mathematical description of them. Or more colourfully, "You do not have to understand your customers to make plenty of revenue off of them."

Wise words. Less wise, perhaps only intended as a joke, are the comments about "the unreasonable effectiveness of ML", that it's remarkable that these industrial-grade mathematical processes are any good in situations for which they were never designed. But I never even got around to blogging Wigner's famous "unreasonable effectiveness" essay because it seemed worryingly silly.

Finally, they note that it might be better if natural sciences were to shift their focus away from theories and more towards the data, and that the degeneracies in the sciences undermine the "realism" of the models. Well, you do you : it's provocative, but on this occasion, I shall allow myself not to be provoked. Shut up and calculate ? Nah. Shut up and contemplate.
