
Monday 13 November 2017

Lecture 3/4 : Be Careful What You Wish For

This is the third part of my super-shortened course on galaxy evolution. You can find the complete transcript of the 90-minute lecture here, or you can stay with this post for the nine-minute version (or less).

In galaxy studies we can't control our test subjects or run experiments on them. Instead, we have to rely on limited statistical data and numerical models, so it's crucial to understand the limitations of those models. Essentially this post is about why we interpret the data in the way we do, and why getting the right answer just isn't good enough. And there'll be some galaxy evolution theory thrown in as well, just for good measure.


The missing satellite problem

Simulations have gone from the simple gravity models of the 1940s to the all-singing, all-dancing models of today, from using a few tens of particles to a few billion (or more). Nowadays they can include full hydrodynamics, heating and cooling of the gas affected by radiation, heat conduction and chemistry, magnetic fields, and basically be as sophisticated as hell. Their major limit is that a lot of parameters can't be set from observational measurements - we have to guess them. More on that later.

Example from the Illustris simulation.
It's good scientific practice to KISS : Keep It Simple, Stupid. Don't dive into the really sophisticated models - begin with something much simpler, gradually increasing the complexity so you understand what each new factor is doing. For instance, the Millennium Simulation was a vast, 10 billion particle model of the evolution of the dark matter in the Universe. It contained absolutely nothing except collisionless but gravitationally-interacting dark matter particles. On the large scale this works really well :

The raw particle data is on the left. Based on the observed relation between the dark matter mass of a galaxy and its visible light, the right hand panel is a prediction of what we would detect at optical wavelengths.
Recall from the first lecture that this structure of filaments and voids is just what we see in reality. Great ! Except, no. On the smaller scales of individual galaxies, the model has problems.

As above, raw particle data on the left with predicted visible light (in red and blue) on the right.
Almost every dark matter halo in the simulation contains potentially detectable normal (baryonic) matter. The bottom line is that these kinds of semi-analytic models predict about ten times as many dwarf galaxies around the Milky Way as we actually observe. Since the dark matter is much more massive than the baryons, adding them in shouldn't be able to change the result very much - or at least that's the naive interpretation. First, we need to understand a bit more about the simulations themselves.


Simulations are not magical

I used to think that because you know all the physics at work in a simulation, you automatically understand whatever it does. Yet while you do get full 3D information with complete time evolution, you rarely get a full understanding of what's happening. For a start, simulations have restrictions just as observations do. Their resolution is limited (e.g. by the number of particles and computational power), they don't include all the physical processes at work (because some are hard to simulate, while others are just not fully understood), and what we decide to simulate in the first place is heavily influenced by observations - which have their own problems. So they are, necessarily, simplified. It's important to try and convert our numerical predictions into something we can directly and fairly compare with observations.

Simulation (left) and observation (right) of the Auriga's Wheel galaxies.
The above example simply coloured the simulation particle data so that it looked like the original observations, but much more sophisticated approaches are possible. Creating synthetic observations adds even more complexity : you need, for instance, to model how the gas causes absorption and scattering of the light emitted from the stars, to replace your simulated generic gas particles with multiple gas phases, and a host of other factors besides.
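To give a flavour of what that conversion involves, here's a deliberately minimal sketch in Python (using numpy and scipy ; the particle arrays, field of view and PSF width are all invented for illustration, not taken from any real pipeline). It just bins the particle light onto a pixel grid and blurs it with a Gaussian to mimic a telescope's point spread function - real synthetic observations add dust, multiple gas phases, noise and much more besides.

import numpy as np
from scipy.ndimage import gaussian_filter

def mock_image(x_kpc, y_kpc, luminosity, fov=100.0, npix=512, psf_fwhm_pix=3.0):
    # Bin particle light onto a pixel grid - a crude surface brightness map.
    image, _, _ = np.histogram2d(
        x_kpc, y_kpc, bins=npix,
        range=[[-fov / 2, fov / 2], [-fov / 2, fov / 2]],
        weights=luminosity)
    # Blur with a Gaussian 'PSF' (FWHM = 2.355 sigma for a Gaussian).
    return gaussian_filter(image, sigma=psf_fwhm_pix / 2.355)

# Toy usage : 100,000 particles drawn from an exponential disc.
rng = np.random.default_rng(42)
r = rng.exponential(scale=5.0, size=100_000)
theta = rng.uniform(0.0, 2.0 * np.pi, size=100_000)
img = mock_image(r * np.cos(theta), r * np.sin(theta), np.ones(100_000))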

While stars are generally simulated as n-bodies (point mass particles which have gravity but nothing else), the gas is more complex. There are two main ways of dealing with the hydrodynamic effects :


1) Smoothed particle hydrodynamics

In SPH codes the gas is modelled as a collection of particles. As well as having mass, each particle is deemed to be part of a kernel with its surrounding neighbours, over which the hydrodynamic equations can be solved. This then accounts for the variation in density, temperature, and pressure. In effect the particle data is transformed into something more like a continuous fluid.


With the kernels set to contain a fixed number of particles, the resolution of the simulation is adaptive : there are more computations where there are many interactions and fewer where there are fewer. And you can trace the history of each particle and find out where it originated. SPH suffers where there are sharp boundaries between different fluids though - it has difficulty reproducing observed structures.
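To make the kernel idea a bit more concrete, here's a toy Python sketch of the density estimate at the heart of SPH. The cubic spline kernel and the choice of ~32 neighbours are standard, but the particle positions and masses are invented for illustration - this is nothing like a production code.

import numpy as np
from scipy.spatial import cKDTree

def cubic_spline_W(r, h):
    # Standard 3D cubic spline kernel with support 2h.
    q = r / h
    w = np.zeros_like(q)
    inner = q < 1.0
    outer = (q >= 1.0) & (q < 2.0)
    w[inner] = 1.0 - 1.5 * q[inner]**2 + 0.75 * q[inner]**3
    w[outer] = 0.25 * (2.0 - q[outer])**3
    return w / (np.pi * h**3)

def sph_density(positions, masses, n_neighbours=32):
    # The smoothing length h adapts so that each kernel always encloses
    # roughly n_neighbours particles - more resolution where it's denser.
    tree = cKDTree(positions)
    dists, idx = tree.query(positions, k=n_neighbours)
    h = dists[:, -1] / 2.0                      # kernel support is 2h
    rho = np.empty(len(positions))
    for i in range(len(positions)):
        rho[i] = np.sum(masses[idx[i]] * cubic_spline_W(dists[i], h[i]))
    return rho

# Toy usage : 10,000 equal-mass particles in a unit box.
pos = np.random.default_rng(1).uniform(0.0, 1.0, size=(10_000, 3))
rho = sph_density(pos, np.ones(10_000) / 10_000)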


2) Grid codes

Another approach is to do away with particles completely. Instead, a finite volume of space can be modelled as a grid of cells, each of which contains some fluid with a density, temperature, pressure and velocity. The code then models how gas flows from cell to cell.


Cell sizes can vary so the resolution can be adaptive. Grid codes are much better at modelling hydrodynamic structures, but tend to be computationally expensive and there's no way of knowing where gas in any particular cell originated. So despite knowing all the initial conditions, there are fundamental restrictions on what you can learn from simulations.
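Again, a toy sketch might help show what "gas flowing from cell to cell" means in practice. This is just a first-order scheme pushing density along a one-dimensional row of cells with a constant velocity and periodic boundaries - nothing a real cosmological grid code would settle for, and all the numbers are invented.

import numpy as np

def advect_1d(density, velocity, dx, dt, n_steps):
    # First-order upwind finite-volume update : each step, the mass flowing
    # through a cell face is removed from the upwind cell and added to the
    # downwind one. Assumes velocity > 0 and periodic boundaries.
    rho = density.copy()
    for _ in range(n_steps):
        flux = velocity * rho                       # flux through each face
        rho = rho - dt / dx * (flux - np.roll(flux, 1))
    return rho

# Toy usage : a square pulse of gas drifting to the right.
x = np.arange(200)
rho0 = np.where((x > 50) & (x < 80), 1.0, 0.1)
rho1 = advect_1d(rho0, velocity=1.0, dx=1.0, dt=0.5, n_steps=100)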


Handle razors carefully !

Simulations, and especially the comparisons to observations, are complex beasts. Clearly there's some virtue in keeping things as simple as possible, but even here we have to be careful. There's a popular notion - spread by Jodie Foster in the film Contact - that Occam's Razor says the simplest explanation tends to be the right one. Occam, however, said no such thing. He said something more like, "entities must not be multiplied beyond necessity" - in essence, prefer simple explanations.

There are good reasons for this, but they have nothing to do with any kind of fundamental truth. Indeed, in science we should never presume to know how the world works : start thinking that the simplest explanation is usually correct and you rapidly degenerate into "a wizard did it". The Universe is a bloody complicated place, and sometimes it needs complex explanations.

John von Neumann is reported to have said that with four free parameters he could fit an elephant, and with five he could make him wiggle his trunk. The more complex your explanation, the more you can adjust it to make it fit the observed data. Simpler explanations are much harder to fudge and therefore easier to test. But that absolutely does not mean that you should never add complexity, because it's equally possible to over-simplify and miss some vital physical process.
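You can see the elephant problem in action with a few lines of Python : fit polynomials with ever more free parameters to data that are really just a noisy straight line, and the fit always "improves". All the numbers here are invented.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.3, size=x.size)   # the truth is a line

for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    print(f"{degree + 1} free parameters : residual sum of squares = {rss:.2f}")

# The residuals shrink with every added parameter, but only the two-parameter
# model reflects the underlying 'physics' - the others are just harder to
# falsify, not better.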

In the case of the missing satellite problem, the baryonic physics missing from the pure dark matter simulations may have nothing to do with changing the halo structures at all. Instead, the discrepancy might be an example of a much more subtle selection effect.


Selection effects : correlation doesn't equal causation

We know the mass of the baryons is too small to affect the dark matter in our simulations. But we also know that the baryons are the only thing we can observe directly. So perhaps our simulations are missing some mechanism that restricts the presence of baryons to only certain dark matter halos : maybe the rest do exist, but remain invisible. It's worth a brief digression here to show how important selection effects can be, and why statistical measurements can be woefully misleading.


The above chart comes from the fantastic website Spurious Correlations. This correlation is statistically significant, but physically meaningless. For a start, it's not at all clear which is the independent (controlling) variable : does excessive cheese consumption drive people insane and make them become entangled in their bedsheets, or do people commiserate over bedsheet-based deaths by eating more cheese for some reason ? Both interpretations are equally absurd, and the data says precisely nothing about which way round it goes.

From this paper.
A second example : the pitch angle of spiral arms (a measure of how tightly wound they are) in galaxies correlates with the mass of their supermassive black hole. This is completely unexpected because the central black holes, though massive, are minuscule in comparison to their host galaxies. Local gravity sources (e.g. ordinary stars) ought to dominate at large distances - there's no plausible direct connection between the black hole and something as large as a spiral arm. But there might, the authors suggest, be a connection through a third factor, such as the density profile of the dark matter.


Charts like those on Spurious Correlations are a variety of what's known as p-hacking : plot everything against everything and see what sticks. Surprisingly tight correlations can occur by chance if you plot enough variables together : what you're not being shown are the many variables which have no correlation whatsoever. Simply put, if something has a million-to-one chance of happening, give it a million opportunities and it probably will happen. Other unexpected relations can occur because of common underlying factors, with no direct connection between the two plotted variables.
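It's easy to reproduce this kind of thing for yourself. The sketch below (in Python, with arbitrary made-up numbers) generates a couple of hundred completely independent random "time series" and then goes fishing for the best-correlated pair - a strikingly tight correlation almost always turns up, despite there being no connection at all.

import numpy as np

rng = np.random.default_rng(7)
n_series, n_years = 200, 11                  # e.g. 200 quantities over 11 years
data = rng.normal(size=(n_series, n_years))  # all completely independent

corr = np.corrcoef(data)                     # every pairwise correlation
np.fill_diagonal(corr, 0.0)                  # ignore self-correlation
i, j = np.unravel_index(np.abs(corr).argmax(), corr.shape)
print(f"Best of {n_series * (n_series - 1) // 2} pairs : "
      f"series {i} vs {j}, r = {corr[i, j]:.2f}")
# Typically |r| > 0.8 appears by chance alone.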

Last time I mentioned different procedures for measuring the size of a galaxy, and we saw that despite being objective they gave very different results. As with automatic galaxy-finding algorithms that produce catalogues of low reliability, the point is that an objective procedure is not the same thing as an objectively correct one. We'll see an example of this shortly, and much more in lecture 4.


Unknown unknowns

So the observations of baryons may be severely limiting our view of the Universe. The naive expectation that adding in the baryons can't change the distribution of satellite galaxies predicted in the simulations may be an over-simplification : we might be witnessing a selection effect. Though it must be said that the precise mechanism isn't at all obvious, it is at least conceivable that baryonic physics could limit which halos actually host visible galaxies.

Recently there have been some very interesting discoveries suggesting that this is indeed the case. While galaxies of especially low surface brightness have been known for ages, no-one thought they were numerically significant. That changed in 2015 with the discovery of around 800 so-called ultra-diffuse galaxies (UDGs) in the Coma cluster : galaxies which are about as large as the Milky Way but as much as a thousand times fainter.


UDGs have since been discovered in all kinds of environments, even in isolation. Most appear to be smooth and red but some are blue and structured, resembling standard LTGs but much fainter. Some are even known to have gas. At the moment, because UDGs are hard to identify, we can't say in which environment they're most common. More problematically, we can't quantify their typical dark matter content. If they're low mass, then UDGs at least alleviate (but do not solve) the missing satellite problem. But if they're massive, then they make things worse. It's of course possible that some are massive and some are not, but the important value is their typical mass, and that we just don't know at all.


If it disagrees with experiment, it's wrong... er, annoying

You could be forgiven for thinking that there are enough problems with the standard model that we should just chuck it out and start again. There are indeed problems, but if we let every difficulty count as a falsification then every theory will have to be discarded. The point is that all our models have been over-simplifications, and without the full physics included we actually can't say if they're wrong or not : maybe they have fundamental problems, maybe they don't.

Occam's Razor is useful, but there is such a thing as over-simplifying.
As mentioned, rare events happen by chance if given enough opportunities. We see this particularly in HI spectra, where the non-Gaussian nature of the noise means we sometimes see very convincing signals indeed that turn out to be spurious. My favourite example of all, though, is a simulation of these spectacular interacting galaxies :

From this paper. The northern elliptical galaxy is included in the simulation but is just out of the field of view.
A pretty good match - not perfect, but good. The problem is that this model of the galaxy's formation only included the disturbed galaxy and the elliptical, but subsequent observations found this :


A much larger third galaxy is clearly involved, but that wasn't included in the model. So the model has got the right answer - even in terms of quite fine structural details - by the wrong method ! Getting the right answer is a necessary but not sufficient condition for a good theory. The success of one model does not preclude the success of others.


Don't be hasty

We could turn to our models and say, "hmm, these all have problems, let's chuck them all out and start again", but this would be the wrong lesson to learn. A better lesson would be that if they have problems they need to be modified and improved : we must always be cautious. Only when we find a really deep flaw in the most fundamental nature of a model should we completely reject it.

While the standard dark matter model does have problems, it's also important to remember that it has tremendous successes as well. As well as reproducing the large-scale structure of the Universe, it also works extremely well at explaining colliding galaxy clusters. The Bullet Cluster and other cases show what happens after two clusters collide. Remember, a cluster is a gravitationally bound structure containing its own dark matter, gas, and galaxies. If two clusters pass through each other, we'd expect the galaxies (bound by the collisionless dark matter of their parent clusters) to keep going, but the gas to get stuck in the middle because, unlike the stars and dark matter, it's collisional. That's exactly what happens, and, importantly, gravitational lensing confirms that the dark matter does exactly what it's supposed to. It's very hard to make this work without dark matter.

Overlaid on the optical images are X-ray gas in pink and dark matter (from lensing) in blue.
It's possible that we will eventually find a flaw with the standard model so fundamental that it can't be saved. But it's also possible that there are other, more nuanced aspects of physics we don't understand that could explain the problems without such a drastic rejection. For example, we know the main processes driving galaxy evolution today were very different in the past. Star formation activity peaked around 10 Gyr ago, as did AGN activity, and merger rates have also been decreasing. In the early Universe there were no galaxy clusters, so the processes of tidal encounters and ram pressure stripping would have been very different. There were also population III stars, far more energetic than any we see today, ionising large parts of the Universe ("squelching").

There are two main theories as to how galaxies assembled themselves. One is the idea of monolithic collapse, that huge rotating gas clouds simply collapsed and went thwoooop to form a galaxy. Simulations show that this is very successful. The problem is that there's no evidence or reason to suppose that such monoliths ever existed. Physics instead points to the now-dominant paradigm of hierarchical merging, where galaxies assemble themselves through the cannibalistic merging of smaller galaxies. This has plenty of problems besides the missing satellite issue, but is a far more natural expectation based on our understanding of physics.


Not even wrong ?

I'll finish with some final statistical lessons that we'll need for the concluding post. As mentioned, objective procedures are not necessarily objectively correct. A really wonderful example of this is provided by the datasaurus project. In the gif below, the points at every frame of the animation have the same means and standard deviations in their positions !


What this means is that quantification can sometimes be of limited help. You can't quantify "dinosauriness" by a simple parameter like mean position : you have to look at the data yourself. You can, should, and indeed must make statistical quantifications for your analysis. But you also have to look at the damn data, because while using quantitative parameters is fine, relying on them exclusively is an absolutely dreadful idea.
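If you want to convince yourself of this, a few lines of Python will do it : take two point clouds of completely different shapes and rescale one so that its means and standard deviations exactly match the other's. The shapes below are invented for illustration (I don't have the actual datasaurus coordinates to hand).

import numpy as np

def match_stats(points, target_mean, target_std):
    # Rescale a point cloud so its x and y means and standard deviations
    # exactly match the targets ; the shape itself is untouched.
    standardised = (points - points.mean(axis=0)) / points.std(axis=0)
    return standardised * target_std + target_mean

rng = np.random.default_rng(3)
blob = rng.normal(size=(500, 2))                    # a featureless blob
t = np.linspace(0.0, 2.0 * np.pi, 500)
ring = np.column_stack([np.cos(t), np.sin(t)])      # a hollow ring

ring_matched = match_stats(ring, blob.mean(axis=0), blob.std(axis=0))
print(blob.mean(axis=0), ring_matched.mean(axis=0)) # identical means
print(blob.std(axis=0), ring_matched.std(axis=0))   # identical std devs
# The summary statistics agree to machine precision, yet one cloud is a blob
# and the other a ring : the numbers can't tell them apart, your eyes can.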

This isn't an abstract mathematical idea either. The US Air Force suffered from the flaw of averages when they designed their fighter aircraft around the average dimensions of their pilots, not realising that very few pilots indeed were close to the average in every parameter : everyone really is unique, just like everyone else. This meant a loss of planes and pilots, because in a fighter plane it's really, really important that you can reach the controls when you need to. The solution ? Instead of tailoring each plane to each pilot, they developed adjustable seats so pilots could set things for themselves no matter which plane they flew.
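The same point is easy to demonstrate numerically. The sketch below simulates pilots with ten independent, standardised body measurements - the numbers are invented for illustration, not taken from the original study - and asks how many are close to average on all of them at once.

import numpy as np

rng = np.random.default_rng(11)
n_pilots, n_dimensions = 4000, 10
measurements = rng.normal(size=(n_pilots, n_dimensions))  # standardised units

close = np.abs(measurements) < 0.39     # roughly the middle 30% of each dimension
per_dimension = close.mean(axis=0)      # ~0.3 for every individual measurement
all_at_once = close.all(axis=1).mean()  # fraction 'average' in all ten at once

print(f"Average on any one dimension : ~{per_dimension.mean():.0%}")
print(f"Average on all ten at once   : {all_at_once:.2%}")
# About 30% per dimension, but essentially nobody (0.3^10, or about 6 in a
# million) is average in every dimension simultaneously.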


That's my philosophy of science rant over. The main take-home lessons are :

  • Prefer simple explanations (but don't go thinking that simpler means it's more likely to be true - it's just easier to test)
  • Objective procedures are not necessarily objectively correct - and indeed there are some things you just can't quantify at all
  • Different models can be equally successful - always try and test multiple explanations, because the fact that one model works well doesn't mean that others are disproven
  • The interpretation of what the data means is down to you and you alone. No algorithm can tell you what the data really means. Do not avoid statistical testing, but don't avoid subjective judgements either.

5 comments:

  1. "The difference between objectivity and correctness". A point there!
    Objectivity is a necessary requisite for correctness. Is it a sufficient condition too?
    "Correlation not equaling to causation"
    Well put!

  2. I agree. Probably the most important point about statistical methods is that they are lossy. And the number of variables/information lost might have significance outside of the problem domain under study.

  3. Sushama Karnik See the updated summary above, and the post itself. :)

    I'd say objectivity is certainly not sufficient for correctness. You could create a procedure which measures things objectively but is just plain wrong. Is it necessary ? I lean towards no on that one as well : some things just cannot be measured. And if you can't measure it, you can't really do it objectively.

  4. Jack Martinelli Indeed. The next post will have a detailed case study where I think certain very intelligent people have taken some complicated statistical techniques and used them inappropriately. They have (in my opinion) understood the maths but not at all the underlying methodology of considering the various selection biases.

    I think the best approach is a combination of the subjective, visual inspections working in concert with statistical analysis. If I can see a trend in the data, then I tend to believe statistical measurements. But if those measurements are being used as evidence for something I cannot see, then I become very skeptical.

  5. I appreciate your approach. I am looking forward to your posts. Rhys Taylor

