Sister blog of Physicists of the Caribbean. Shorter, more focused posts specialising in astronomy and data visualisation.

Wednesday 25 March 2020

Slowly does it

And now back to our regular service on galaxy dynamics and whether or not MOND is a thing.

Galaxies typically have rotation curves which are approximately flat in their outer regions. They usually rise steeply in the innermost regions, often going a bit crazy at first, but then settle down to something that's basically flat and boring. Dwarf galaxies are an exception, often showing curves which are still (slowly) rising, possibly because the stars and gas don't probe the full dark matter halo. But even a few larger galaxies also show curves which are weakly rising or declining.

Recently we looked at massive galaxies which are rotating more quickly than expected, and whether these challenge alternative theories of gravity. It seems to me that they probably do, but critics are right to point out that it matters a great deal which value you use to define the rotation of a galaxy, i.e. the peak or the flat part of the curve.

This latest paper looks at galaxies with declining rotation curves. They choose their sample so as to avoid the usual causes of this, like interactions or having a strong bar or bulge in the centre (such that the mass of the innermost baryons becomes significant compared to the extended dark matter). They also select them to have well-resolved rotation curves, so it's not a data artifact or anything daft like that. And unlike the previous paper, they show all of their curves rather than coyly hiding them away.

What I cannot for the life of me understand are their quoted ratios of peak to outermost velocity. Looking at their rotation curves, I'd guesstimate the ratios to be no more than a factor of two at the very most, but they give values of 10-40 ! I'm tempted to email them because I can't make sense of that, but the text has been translated from the original Russian so that might make things difficult.

When it comes to the inevitable Tully-Fisher relation (a comparison of rotation speed and luminous mass), they show that their galaxies agree with the standard relation if they use the peak velocity, but are significantly slower than the standard relation predicts if they use the flat velocity. This is exactly the opposite of what Milgrom said when criticising the fast rotators : these authors have done exactly as he suggested, and find the opposite sort of problem ! So the claim of MOND enthusiasts that the Tully-Fisher relation actually has some minuscule scatter if you get the measurements right looks extremely suspicious to me : galaxies actually seem to deviate in all directions, even very massive ones. And furthermore, since the peak velocities here do agree with the standard relation, that makes it rather unlikely that there's been some systematic error in the velocity estimate.

The nice thing about rotation curves, though, is that you're not limited by global relations like Tully-Fisher. You can directly compare a galaxy's observed and predicted rotation throughout its whole disc. For this sample, they find most galaxies agree with MOND's predictions, but not all*. A few require the galaxies to have significantly different stellar mass from the measured value, and/or a different acceleration constant - in the worst case by a factor of six. And it would be a pretty stupid theory indeed if you had to change a fundamental constant for each galaxy.

* And they're interesting curves in their own right : some look flat to me, while others are very clearly and consistently declining, and still others show sudden decreases and then remain flat further out.

There is some scope for a MOND rebuttal here. The galaxies are not all truly isolated (one is even in a cluster). The authors say they have no nearby massive companions capable of creating a significant external field effect, but it could be that they've interacted in the recent past and are still out of equilibrium (I don't know if anyone's modelled how this would affect rotation curves, but I'd be a bit surprised if it made them rotate more slowly). There's enough complexity in the modelling of disc mass and the structure of the curves that the findings could be challenged. Their most deviant galaxy is interesting in its own right : visually it looks disturbed, but also lonely. MOND or not, something interesting is definitely going on here.

Galaxies with Declining Rotation Curves

A sample of 22 spiral galaxies compiled from published data is studied. The galaxy rotation curves pass through a maximum at distances of more than $\sim 1$ kpc from the centre, with a subsequent decrease in the rotation velocity.

Tuesday 24 March 2020

Don't let your space Nazis die of thirst

It's not all technical critiques of MOND over here. Just occasionally, it's important to step back and think of the bigger picture, like how much water you'd need to supply for an interstellar mission populated by Nazi space milfs.

Haven't got a sodding clue what I'm on about ? Then you must be unaware of my small involvement in a project to simulate the voyage of an interstellar, multi-generational spaceship. This began as an estimate of how many people you'd need to avoid extinction due to inbreeding and whether any breeding controls would be necessary (hence the Nazis), and expanded to consider how much food the population would need. But you can read all about that here.

Having established how many crew you need (about 100 to start, sustainable for millennia at a few times this) and how much food they require, this latest paper looks at air and water. Air consumption depends mainly on mass, whereas water is also strongly temperature dependent, so the code now allows you to set the spaceship temperature.

For this paper, the authors decided to move away from the minimum possible number and use a 1,000-strong crew, about the same as the starship Enterprise (D). Why this number, I'm not sure, but we're now well into the range of guaranteed survivability. The crew begins as a gender-balanced bunch of thirty-somethings (with a few older and younger), with different activity levels (which affect food and water consumption) assigned to different age groups. Temperature varies randomly between 18 and 21 C.

The result of this is that the crew need about 300 tonnes of oxygen and 1,000 tonnes of water per year - just for the humans, never mind the plants and animals. Add in nitrogen to the air and the total air mass is not that far off the water.
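As a quick sanity check (my own arithmetic, not anything from the paper), those totals work out to sensible per-person figures - an adult gets through roughly 0.8-0.9 kg of oxygen a day :

```python
# Back-of-envelope check of the quoted totals for a 1,000-person crew.
# The 300 t/yr (oxygen) and 1,000 t/yr (water) figures are from the post;
# the per-person daily rates are derived here for comparison.
CREW = 1000
OXYGEN_T_PER_YR = 300.0   # tonnes of O2 per year, whole crew
WATER_T_PER_YR = 1000.0   # tonnes of water per year, whole crew

KG_PER_TONNE = 1000.0
DAYS_PER_YEAR = 365.25

o2_per_person_day = OXYGEN_T_PER_YR * KG_PER_TONNE / (CREW * DAYS_PER_YEAR)
water_per_person_day = WATER_T_PER_YR * KG_PER_TONNE / (CREW * DAYS_PER_YEAR)

print(f"O2    : {o2_per_person_day:.2f} kg/person/day")    # ~0.82
print(f"Water : {water_per_person_day:.2f} kg/person/day") # ~2.74
```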

It's at this point I have to say I respectfully disagree with the authors on the rest of their approach. Once you establish a minimal level needed per some time period, the next step should be to estimate how this is affected by your recycling capabilities. It's already pretty obvious that our spaceship must be in the range of many millions of tonnes to sustain the crew, but obviously, you want to minimise the mass of water and oxygen as much as possible (especially since we may expect the mass to be totally dominated by the requirements of agriculture). So you could then consider the different cycles in play : how much is lost over different timeframes, e.g. some oxygen is combined with carbon in each breath, plants absorb nitrogen, water is lost rapidly through sweating and breathing but much less frequently through urination. Different recycling procedures will be necessary in all of these, so there will be different recovery timescales and efficiencies at work. Accounting for these would get you a handle on the important number : how much of each you need aboard the ship at any given moment.
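To illustrate the sort of estimate I mean - and this is purely my own sketch, not anything from the paper - suppose some fraction of each resource is recovered per year. Then the stored inventory only needs to cover the unrecovered losses over the mission plus a working buffer :

```python
def stored_mass(annual_use_t, recycling_efficiency, mission_years, buffer_years=1.0):
    """Tonnes that must be carried : the unrecovered losses over the
    mission plus a working buffer of `buffer_years` of gross consumption.
    Deliberately simple - real loss rates and recovery timescales differ
    per process (breathing, sweating, urination...)."""
    losses = annual_use_t * (1.0 - recycling_efficiency) * mission_years
    buffer = annual_use_t * buffer_years
    return losses + buffer

# 1,000 t/yr of water over a 500-year voyage :
print(f"{stored_mass(1000.0, 0.0, 500):,.0f} t")   # no recycling : 501,000 t
print(f"{stored_mass(1000.0, 0.95, 500):,.0f} t")  # 95% recovery : 26,000 t
```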

Somehow I just wasn't able to persuade the authors of this. So instead they look at other techniques of water and oxygen production via chemical reactions. But these are, I have to say, both unnecessary and counterproductive. Water and oxygen can be stored indefinitely with literally zero risk of contamination, because you're in deep space... and, as we all know :


It is very cold in space.

Star Trek II: The Wrath of Khan (1982)

So the only effect of producing water and oxygen chemically is to bring along a significant amount of extra mass. You're better off by far simply storing the entire supply needed for the journey and assuming no recycling at all - even though, as we've established, that's already a bad idea. Now I do like the suggested approach of integrating the different processes, which sometimes share resources and produce outputs the others require, but this is not much developed, and again, it would be better by far to just recycle. Nor do I understand why they object to bringing in water and other supplies from Solar System bodies prior to the mission - the need to do this is absolutely unavoidable, and the mass of bringing in anything other than pure water must be larger than the optimum case of bringing in pure H2O. It doesn't make any sense.

So how much water will the Nazis need ? Dunno. I can tell you they'll need to use a thousand tonnes per year, but how much they'll actually have to bring along could be completely different. No idea at all. Perhaps future papers will look at that.

Water and air consumption aboard interstellar arks

The architecture of a large interstellar spaceship, which is capable of serving as a living environment for a population over many generations, is mainly dictated by the needs of said population in terms of food, water and breathable gases.

Monday 23 March 2020

A field guide to mapping the Milky Way

How do you go about mapping the galaxy you happen to live inside ? There's a hell of a lot of information on GalaxyMap.org, but it's not quite what I'm after. So in this post I'll give a step-by-step guide to constructing a map using 21 cm neutral atomic hydrogen data. If you actually want to try this for yourself, I'm going to be assuming some familiarity with Python (especially numpy and astropy/pyfits). Otherwise this post should describe the theoretical aspect well enough to be of interest by itself.


Working in Galactic coordinates

HI data has two main advantages : first, it's not at all subject to extinction by intervening dust, and second, it gives us an easy way to measure velocities. Using trigonometry and a few reasonable assumptions, we can convert velocity into distance, with some limitations.

But before that we need to define a coordinate system. The convention for all-sky HI data is to use Galactic coordinates. In this system, the centre of the Galaxy is defined to be at longitude and latitude of 0. Galactic longitude l and latitude b are defined as follows :

Optical image of the Milky Way overlaid with all-sky data from LAB, with the Magellanic Stream highlighted in orange.
There are two main all-sky Galactic HI surveys : the Leiden-Argentine-Bonn survey and the HI4PI survey*. Both have been gridded in a nice friendly way, such that the pixel size is fixed in latitude, longitude, and velocity. Thus once you know the pixel size in each dimension and the world coordinates of any given pixel, you can very easily calculate the exact coordinates of every other pixel.

* It's 4π steradians, but I always read it to mean "HI For Principal Investigator".

For reference, both surveys have the origin at the bottom left (l = +180, b = -90). For the LAB survey, the x-pixel (l) range is 720 and the y-pixel (b) range is 360. The pixel size is 0.5 degrees for both axes. For HI4PI, the x and y ranges are 4320 and 2160 respectively, while the pixel size is 5 arcminutes.

And velocity ? For LAB data this spans the velocity range -458.6 km/s (z = 0) to +458.6 km/s (z = 890), with a channel size of 1.03 km/s. For HI4PI the velocity range is -600.0 km/s  (z = 0) to +600 km/s  (z = 945*), with a channel size of 1.288 km/s.

* This value may be slightly off. At the time of writing, I can't access the files I need to check.

Note that these values refer explicitly to the gridded pixel values. These are slightly different from the true resolution values, which are more often quoted in the papers. For the researcher, the real resolution is what matters, but for the data visualiser, it's all about the pixels.
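As a concrete example (my own sketch, using the LAB gridding described above - the sign convention of l decreasing as x increases is the usual one for Galactic maps, but check your file's FITS header before trusting it), converting pixel indices to world coordinates is just a linear offset :

```python
# World coordinates of a LAB pixel, using the gridding described above.
L0, B0, V0 = 180.0, -90.0, -458.6  # world coordinates of pixel (0, 0, 0)
DL, DB, DV = -0.5, 0.5, 1.03       # pixel sizes (deg, deg, km/s)

def lab_pixel_to_world(x, y, z):
    """Longitude (deg), latitude (deg) and velocity (km/s) of pixel (x, y, z)."""
    return L0 + DL * x, B0 + DB * y, V0 + DV * z

print(lab_pixel_to_world(0, 0, 0))        # (180.0, -90.0, -458.6)
print(lab_pixel_to_world(360, 180, 445))  # the Galactic centre at ~0 km/s
```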


Converting to distance

Once we've found the world coordinates of a pixel, and its flux value, we can then convert this to a true 3D position, as follows. First we'll need some assumptions. The Sun is reckoned to be rotating around the centre of the Galaxy with speed V0 = 220.0 km/s, at a distance R0 = 8.5 kpc. The rotation curve of the Milky Way we can approximate to be totally flat, so that the velocity at any point Vpnt is also always 220.0 km/s. Given the velocity (vel) of any point, we can then calculate its distance R from the Galactic centre :
Note that this further assumes that the rotation is independent of galactic latitude. This is reasonable because the disc is quite thin, but causes problems for structures which are outside the disc completely.
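The equation image doesn't reproduce well here, but for a flat rotation curve the standard relation is vel = (Vpnt R0/R - V0) sin(l) cos(b), which rearranges to give R. A minimal Python version (my own sketch of the standard relation, not the author's code) :

```python
import numpy as np

V0, R0 = 220.0, 8.5  # solar rotation speed (km/s) and galactocentric radius (kpc)

def galactocentric_radius(l_deg, b_deg, vel):
    """R (kpc) of gas with line-of-sight velocity vel (km/s) at Galactic
    longitude l and latitude b, assuming a flat rotation curve so that
    vel = (V0 * R0 / R - V0) * sin(l) * cos(b)."""
    slcb = np.sin(np.radians(l_deg)) * np.cos(np.radians(b_deg))
    return R0 * V0 * slcb / (vel + V0 * slcb)

# Gas at l = 90 with zero line-of-sight velocity co-rotates with the Sun :
print(galactocentric_radius(90.0, 0.0, 0.0))  # 8.5
```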

Next we can calculate the distance of the point from the observer :

Where d1,2 refers to the fact that the equation has two solutions. However, it turns out that this is only really true inside the solar circle, so we don't need to do the calculations twice. Rather we should accept that this region of distance ambiguity is inaccessible to us, so we should only do this calculation if R > R0 (we'll see what happens if we disregard this sage advice later on).
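Again the equation doesn't survive the formatting here, but the standard geometry gives d cos(b) = R0 cos(l) ± sqrt(R² - R0² sin²(l)). A quick sketch (my own conventions, not necessarily the author's) showing the ambiguity :

```python
import numpy as np

R0 = 8.5  # kpc

def heliocentric_distances(R, l_deg, b_deg):
    """Both distance solutions d1, d2 (kpc) for gas at galactocentric
    radius R seen at longitude l and latitude b. Outside the solar circle
    only one root is positive ; inside, both are - that's the distance
    ambiguity discussed above."""
    l = np.radians(l_deg)
    root = np.sqrt(R**2 - (R0 * np.sin(l))**2)
    cosb = np.cos(np.radians(b_deg))
    return (R0 * np.cos(l) + root) / cosb, (R0 * np.cos(l) - root) / cosb

print(heliocentric_distances(11.3, 90.0, 0.0))  # one positive, one negative root
print(heliocentric_distances(6.0, 30.0, 0.0))   # both positive : ambiguous
```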

If that's so, we can proceed to calculate Cartesian coordinates of our pixel in PPP (position-position-position) space :
These will be in kpc since those are the units we've been working with. We can now iterate over every pixel in our data set and create a full PPP map from our original PPV (position-position-velocity) cube. This is relatively easy to do, and you can find the Python code to do so for LAB data here (note that some simple extra transform is applied to these final equations, just to ensure the data appears in a sensible position in our PPP cube). We just have to specify the size of the cube we want to make and hardly have to worry about anything else at all. Sounds great ! We'll be using the full information from the original data, so we should get a nice, clean, super detailed map at the end, without even having to specify the pixel resolution or anything even slightly complicated, right ?

Wrong. The problem is that PPV maps to PPP in a very strange way, which is not at all intuitive from the equations (unless you're some sort of trigonometric super freak, I guess). There's no guarantee that every pixel in our PPP cube even corresponds to one in our PPV cube. And not all our PPV pixels will contribute anything, since many of them will lie well outside the Galactic disc where our equations are invalid.

Here's what we get from the LAB data if we do this :

Slice through a PPP cube created from LAB data.
In some regions things are relatively good and we can see some nice astrophysical structures, but other parts are hugely undersampled while others have downright weird artifacts. We can do quite a bit better with HI4PI, which has higher resolution and so more fully samples PPP space, but it's still far from perfect.


Why mapping from PPV to PPP is a bad idea

Let's start with the artifacts. Our observations give us velocity along our line of sight, that is, how fast the gas is moving towards or away from us. In reality, the gas is also moving across the sky, but we can't measure that. We can get these "proper motions" for stars with considerable effort, but we just can't get it for gas.

The first problem is not so much that we'd like to know the proper motion (although that'd be nice), but that the equations assume our line of sight velocity measurements are accurate. But because we're inside the disc of the Galaxy, this is not always true. When we look towards the Galactic centre, or in the opposite direction, the only motion of the gas is across the sky - except for a little bit of random motion (~10 km/s). This means that in those regions of low measured velocity, our data is just too inaccurate for our equations to convert line of sight velocity into a reliable distance. Better instrumentation won't help : it's a fundamental limitation of the structure of the Galaxy and our method.

Velocity vectors relative to the centre in green. Blue and red show the components
towards and away from us, respectively.
The second problem is that the equations have that annoying distance ambiguity within the solar radius, where the equation gives two solutions. Although we might be able to break this degeneracy using other data (e.g. by associating the gas with stars of known distances), by itself there's nothing much we can do to save the HI data. So this region, like the low velocity regions, has to be thrown away.

It might help to visualise how velocity maps to distance. One way of doing this is to plot isovelocity lines : lines of constant velocity.
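For instance (with my own conventions : Galactic centre at the origin, Sun at (0, R0)), the line-of-sight velocity at any point in the plane is easy to evaluate on a grid, and matplotlib's contour() would then draw the isovelocity lines :

```python
import numpy as np

V0, R0 = 220.0, 8.5  # km/s, kpc

def los_velocity(x, y):
    """Line-of-sight velocity (km/s) of gas at in-plane position (x, y) kpc,
    for a flat rotation curve. Galactic centre at the origin, Sun at (0, R0)."""
    R = np.hypot(x, y)
    l = np.arctan2(x, R0 - y)  # Galactic longitude in radians
    return (V0 * R0 / R - V0) * np.sin(l)

# Evaluate over the plane ; plt.contour(xx, yy, vel, levels=...) would
# then plot the isovelocity lines.
xx, yy = np.meshgrid(np.linspace(-20, 20, 401), np.linspace(-20, 20, 401))
with np.errstate(divide="ignore", invalid="ignore"):
    vel = los_velocity(xx, yy)
```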


Being inside the disc has weird consequences for what we can detect and where. Since everything's so darn close, sensitivity is extraordinarily high : we can detect essentially all Galactic gas. Looking through our original data cube, we see tonnes of stuff at very low velocities across the entire sky, because gas at high latitudes is only found when it's close to us and, therefore, moving slowly relative to us, due to the thin nature of the disc. But at the same velocities we can also be detecting material on the far side of the Galaxy !

The bottom line for visualisation is that if we start with the PPV map, we don't necessarily fully sample the PPP cube. This explains the other even more serious problem of the image - all those ugly black lines. How can we fix this ? One answer would be to interpolate extra velocity channels and/or spatial pixels in the PPV cube, so that we'd have more points that map to the PPP data. This does help, but it's inefficient and far from perfect. Even using the enormous HI4PI data set, which has vastly better spatial resolution (though similar velocity resolution) gives only a modest improvement in the sampling.


Alternatively, go directly from PPP to PPV

A much better approach is to work backwards. Beginning with a blank PPP cube, we can calculate the corresponding pixel in the PPV cube and use that to fill in the flux values. This essentially knocks all the problems on the head. By iterating over every pixel in the PPP cube, we guarantee that we'll sample the whole thing. Although our calculated pixel positions in the PPV cube won't be integer values, all we have to do is simple rounding and we effectively interpolate the missing data (there are more sophisticated ways to do this, but they can wait for another time).

How exactly do we go about this ? We define the coordinate system of the PPP cube arbitrarily. Then, knowing our Galactic coordinate system, we can use some basic trig to calculate the longitude and latitude of any given pixel. We need to get R first, but this is easy because we know the position relative to the galactic centre gc :

I work in degrees, hence the +90 for convention. The pixel positions xyz must be in physical units (kpc). The atan2 function is a wonderful programmatic convention that simplifies things enormously. Using the usual arctan returns values within ±90 degrees, since there's a degeneracy in the tan function. Atan2 gets around this by taking two arguments, returning values within ±180 degrees, which is exactly in accordance with Galactic data gridding conventions (we could convert this easily enough to the range 0-360 if we wanted to, but there's absolutely no need).

All we need now is the line of sight velocity. We can get that by rearranging the earlier equation for R :
Since the original PPV data is gridded in a nice friendly way, the hard part's over. Now that we know the longitude, latitude, and velocity of a pixel in the PPP cube, it's trivial to convert this to the pixel in the PPV cube - remember, the original data has pixel size of constant latitude/longitude/velocity.
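Putting the pieces together, a minimal version of the reverse lookup might look like this (my own sketch, using the LAB grid parameters from earlier - the linked script applies some extra offsets, so treat this as illustrative) :

```python
import numpy as np

V0, R0 = 220.0, 8.5                  # km/s, kpc
L0, B0, VEL0 = 180.0, -90.0, -458.6  # LAB world coordinates of pixel (0, 0, 0)
DL, DB, DV = -0.5, 0.5, 1.03         # LAB pixel sizes (deg, deg, km/s)

def ppp_to_ppv_pixel(x, y, z):
    """PPV pixel indices for a point at (x, y, z) kpc, with the Galactic
    centre at the origin and the Sun at (0, R0, 0)."""
    R = np.hypot(x, y)                     # in-plane galactocentric radius
    dproj = np.hypot(x, R0 - y)            # in-plane distance from the Sun
    l = np.degrees(np.arctan2(x, R0 - y))  # atan2 covers the full +/-180 range
    b = np.degrees(np.arctan2(z, dproj))
    vel = (V0 * R0 / R - V0) * np.sin(np.radians(l)) * np.cos(np.radians(b))
    # world -> pixel, using the constant-size gridding of the survey
    return (round((l - L0) / DL), round((b - B0) / DB), round((vel - VEL0) / DV))

print(ppp_to_ppv_pixel(7.5, 8.5, 0.0))  # (180, 180, 392)
```

Simple rounding to the nearest pixel is the nearest-neighbour interpolation mentioned above.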

Voila. We can now extract the corresponding flux value and create a fully sampled PPP cube.


What to do if your data set is feckin' enormous

But wait ! There's one extra complication. If we want to use the LAB data, we can go right ahead and use the final code. Of course, it's always going to be better to use the HI4PI data, but this is difficult to work with because of its gargantuan 35 GB size. The astropy "pyfits" module is not at all good at dealing with large data sets, so we'll need to convert it into a format it can handle. We can do this using the much older miriad software, which was written in an era when a 100 MB file was considered hefty. Consequently it's massively superior in terms of memory management and can process 35 GB files without breaking a sweat, even on a system with 16 GB RAM.

This bit is trivial. First, we convert the FITS file to miriad's own format using the FITS task. Next, we use the same task to extract individual channels, by setting the region parameter to give single-channel slices (unfortunately, this only works when converting from miriad->FITS, not the other way around, which is why we had to convert the file). Annoyingly, miriad insists on producing FITS files of not one but two channels. We can either accept this and have the script only work with the first velocity channel of each cube (it's super simple to slice the data for this), or first process the files and save ourselves from an extra 35 GB of data we don't actually need. Both steps are easy anyway.

Right. We're pretty much there. We've converted our 35 GB single data file into 944 smaller, more manageable files. All we need to do now is modify our PPP code to work with multiple files - and here it is. As a result of all this, we get the following :


Ta-da ! Lovely. Doesn't quite eliminate all of the artifacts, but we can get rid of those in the visualisation stage.

(For the enthusiast : we could in principle go through every pixel in the PPP cube and open the necessary FITS file every time, but this is hugely inefficient - it means opening the files hundreds of millions of times. I estimated it would take the code 4 months to complete, which made me sad. So instead the code precalculates which pixels it needs to extract for each file. It then orders the list, opens each file, extracts all the pixels it needs from that file, and moves on to the next one. This means it only needs a maximum number of 945 file-opening operations and runs in an hour or so. An extra complication is that a list of hundreds of millions of entries starts to cause memory issues, so the code can work in chunks. The user then has to specify the pixel range of the PPP cube they want to search.)
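A stripped-down sketch of that batching idea (hypothetical names - the real code obviously also has to open each file and read the pixels) :

```python
from collections import defaultdict

def group_requests_by_file(requests):
    """Group (channel_index, x, y) pixel requests by channel, so each
    single-channel FITS file is opened only once. A stand-in for the
    real PPP -> PPV lookup list."""
    per_file = defaultdict(list)
    for zfile, x, y in requests:
        per_file[zfile].append((x, y))
    return per_file

reqs = [(3, 10, 20), (0, 1, 1), (3, 11, 20)]
grouped = group_requests_by_file(reqs)
for zfile in sorted(grouped):
    # open file `zfile` once, then extract all of grouped[zfile] in one pass
    print(zfile, grouped[zfile])
```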


Making things look pretty

For visuals, we have two options. We can either make a volume render using FRELLED, or we can try and pick out the major features using isosurfaces. Volume renders look prettier and use all of the data, but isosurfaces can make it easier to reveal the important structures and are small enough to display as interactive on-line sections of a web page.

Volume rendering via FRELLED need not be explained in any detail here. Let's skip straight to the render :

Sun position in yellow and the galactic centre in green.

What about isosurfaces ? These are something I've struggled with for a while. Eventually I stumbled on a couple of different Python modules that specialise in this. The one I'm using is scikit-image, which can produce meshes in a format that Blender can recognise. It's also fast and deals with large, complicated meshes pretty darn well. It's not 100% foolproof -  sometimes meshes have their normal vectors pointing the wrong way, but generally this is easy to fix manually.
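For the curious, the core of the scikit-image approach is marching_cubes, and exporting the resulting mesh as a Wavefront OBJ file (which Blender imports directly) only takes a few lines. This is a minimal sketch with a synthetic sphere, not my actual script :

```python
import numpy as np
from skimage import measure

def export_isosurface(volume, level, path):
    """Extract an isosurface with marching cubes and write it as a
    Wavefront OBJ file that Blender can import. Minimal sketch - no
    normals or materials are written."""
    verts, faces, normals, values = measure.marching_cubes(volume, level=level)
    with open(path, "w") as f:
        for v in verts:
            f.write(f"v {v[0]} {v[1]} {v[2]}\n")
        for face in faces:
            f.write(f"f {face[0] + 1} {face[1] + 1} {face[2] + 1}\n")  # OBJ indices start at 1
    return len(verts), len(faces)

# Synthetic test volume : a sphere of radius 10 voxels
gx, gy, gz = np.mgrid[-16:16, -16:16, -16:16]
vol = np.sqrt(gx**2 + gy**2 + gz**2)
nverts, nfaces = export_isosurface(vol, 10.0, "sphere.obj")
print(nverts, nfaces)
```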

(I tried other solutions - extensively - like using metaballs and point cloud skinning scripts, and I'd strongly advise everyone else not to try this. They just don't work very well, at best giving ugly results and at worst being useless and inaccurate. Use a dedicated module and save your sanity !)

The code to generate isosurfaces is a bit of a hack at this stage - incorporating it into the next generation of FRELLED is definitely happening, but that's still a ways off. But for now, it works. You'll need this .blend file (containing an internal script and some pre-set materials) and this external Python script, plus this readme file. Eventually I'll make something less hacky, but not today.

Last but not least - exporting to the web. I wanted to use Sketchfab, which I've been very impressed with, but then I discovered it has a stupid 50 MB file size limit. So I spent a weekend investigating Blend4Web, which is free and totally awesome. It's one of those nice things that just works. So here's an interactive 3D model of the HI content of the Milky Way. It'll work on mobiles (it even has a VR mode option !) but it's better on PC - the labels tend to get cut off on a phone. Click on the buttons in the top left panel to toggle different components, and on the annotations for a bit more information.

It's a 28 MB file, may take a few minutes to load, and Blogger won't let me
embed it correctly, so click here for the interactive version.
It's far from perfect yet  - the bloom effect is a bit strong and the anti-aliasing is lousy. But this is my first attempt, so I'm pretty pleased with it. Expect updates on this and lots more interactive content.

So that's it : you now know absolutely everything about how to map the hydrogen content of the Milky Way disc. In a future post I'll look at how to do something similar for the surrounding clouds, which are a lot more fun in 3D because they're found across the entire sky.

Back from the grave ?

I'd thought that the controversy over NGC 1052-DF2 and DF4 was at least partly settled by now, but this paper would have you believe otherwise...