Sister blog of Physicists of the Caribbean. Shorter, more focused posts specialising in astronomy and data visualisation.

Tuesday 14 January 2020

The Joy Of Stacks

Sensitivity is somewhat of a recurring topic, and I don't mean the kind where people whinge about offensive statements about Nazi-themed beards on twitter. I mean the sort where you look at your data and try and work out what the hell it is you've managed to detect, if anything.

I've already covered that sensitivity can't really be given as a single number when you're dealing with real data. You have to account for how complete your catalogue is (how many of the sources present you've actually detected) and how reliable it is (what fraction of sources you identify correctly). I've also examined at some length how quantifying this can be arguably impossible, which is philosophically interesting even if practically there's bugger all you can do about it.

But I've glossed over sensitivity itself : the capacity to detect things in the first place, even if you don't actually detect anything at all. This is generally easier to grapple with than the statistical problems of completeness and reliability, but there are still subtleties that need to be dealt with. One particular aspect had me tearing my hair out during my PhD, and recent developments have made me realise that it wasn't just me who was thoroughly perplexed. It's one of those features which will seem, I hope, quite obvious when it's stated, but can be completely non-intuitive if no-one tells you what's going on... to some people at least.


Stacking Made Simple

Suppose you target a well-studied region of space with a nice shiny new HI survey. You've already got a lovely optical catalogue, and notwithstanding the problem of completeness, there are plenty of galaxies there you already know the positions and redshifts and all that other gubbins for. You spend ages and ages doing your radio survey, and behold ! Nothing. Nada. Not a single bloody detection. What's an astronomer to do ?

Well, the most obvious answer is to panic and run around the room tearing other people's hair out. Or, slightly more usefully, to extract the radio data at the positions of all the known galaxies, add them together and divide the result by the number of objects. This process of averaging the data is called stacking. The idea is that by combining all the different signals, you're essentially doing the equivalent of taking a longer exposure. You're increasing your sensitivity and should be able to detect fainter emission than you could otherwise. Depending on the quality of your data and the number of galaxies you stack, you might well be able to increase your sensitivity by, oh, say, a factor of ten, which is pretty awesome since you don't need to do any new observations. Woohoo, free sensitivity !

Formally, the noise level of your stacked data scales as n^-0.5, where n is the number of galaxies you stack (at least that's the theoretical best expectation; it will usually be a bit worse). In the ideal case, if there's a signal present that's too weak to detect in any individual object, then averaging those weak values will give you the average value of the signal. In contrast, if you average pure noise values together, they should be random, and the more of them you combine the closer their average gets to zero. So your signal to noise increases, as long as your noise really is purely random or nearly so.
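If you want to convince yourself of that, here's a minimal NumPy sketch (my own toy numbers, not real data) of a weak constant signal buried in Gaussian noise :

```python
# Averaging n noisy measurements shrinks the noise roughly as n**-0.5,
# so a weak constant signal eventually pokes above it.
import numpy as np

rng = np.random.default_rng(42)
sigma_1 = 1.0          # rms noise of a single measurement
true_signal = 0.2      # far below the single-measurement noise

for n in (1, 10, 100, 1000):
    # n independent "observations" of the same weak signal
    measurements = true_signal + rng.normal(0.0, sigma_1, size=n)
    stacked = measurements.mean()
    print(f"n = {n:5d}  stacked value = {stacked:+.3f}  "
          f"expected noise ~ {sigma_1 / np.sqrt(n):.3f}")
```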

It's well-known that the penalty you pay for stacking is that, in the event you manage to detect something, you don't know which objects actually contain the gas you've detected. Some individual objects might have a bit more gas than your final detection, while others might have less. The crucial thing is that the stacked data tells you about the average amount of stuff present, and absolutely nothing about individual galaxies. After all, the process of adding up and dividing by the total number is the very definition of averaging.


Stacking Made Unnecessarily Complicated

What's all too easy to forget is that this means you do not improve your sensitivity to the total amount of gas present. In fact, your sensitivity to the total gets worse : it's only your capability to detect the average mass per object which improves. You have, after all, averaged your signal, so your new measurement must be an average measurement by definition. I'm going to keep repeating that.

Perhaps the simplest way to understand this is to imagine you had two galaxies with clear detections. If you wanted their total mass, you'd simply add their signals together. If you wanted their average mass, you'd add them and divide by two. Exactly the same principle applies when stacking non-detections : the sum gives you the total mass, the stack gives you the average. And the total mass cannot possibly be less than either of the individual components.
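The same thing drops straight out of the uncertainties : a quick numerical check (toy numbers of my own, nothing to do with any real galaxies) that averaging two noisy measurements shrinks the scatter by √2 while summing them inflates it by √2 :

```python
# If each measurement has rms noise sigma, the uncertainty on the *average*
# is sigma/sqrt(2), but on the *total* it's sigma*sqrt(2) -- respectively
# better and worse than for a single object.
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
trials = 100_000
a = rng.normal(0.0, sigma, trials)   # noise on galaxy 1
b = rng.normal(0.0, sigma, trials)   # noise on galaxy 2

print("scatter of the average :", np.std((a + b) / 2))   # ~ sigma / sqrt(2)
print("scatter of the total   :", np.std(a + b))         # ~ sigma * sqrt(2)
```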

When you put it like this it might seem so buggeringly obvious that you'd laugh in the face of anyone who doesn't understand it, which is very mean* considering the massive confusion I got myself into when trying to work it out from first principles (see chapter 7). The thing is, when you do a stack and you calculate how much stuff you're sensitive to, it's awfully tempting to assume this means your sensitivity to the total, because that's how things normally work. Thinking in terms of averages isn't natural.

* Pun NOT intended.

Or we can look at this mathematically. In any survey, the mass sensitivity is governed by the noise of the data :
M ≡ σ_rms
Where σ_rms is the noise level. Note the equivalent sign rather than an equals sign : the precise equation will depend on the exact nature of the data. It doesn't even have to be mass - it can be whatever quantity we're studying - but I'll use this as it's probably the easiest thing to understand.

Let's alter this slightly and let σ_1 be the rms noise level for a single observation. We know that if we stack n galaxies, the new noise level becomes σ_1 n^-0.5, so our new average mass becomes :
M_av ≡ σ_1 n^-0.5
Which is clearly an improvement over what we started with - we can detect smaller masses than we could before, which is good. But remember, this is our average mass. The total mass is simply n*M_av, so :
M_tot ≡ σ_1 n^+0.5
I've kept the plus sign in so it's easier to see the difference compared to the average. But look - this is worse than what we had originally ! The more galaxies we add, the better our average mass sensitivity, but the worse things get for the total mass.

[Plot : averaged noise level as a function of the number of objects stacked.]
[Plot : equivalent noise level for the total number of objects stacked.]
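If you want to regenerate curves like those for yourself, here's a minimal sketch, assuming (as in the equations above) that mass sensitivity is simply proportional to the rms noise :

```python
# Average versus total mass sensitivity as a function of how many objects
# you stack, in arbitrary units.
import numpy as np
import matplotlib.pyplot as plt

sigma_1 = 1.0                      # single-object rms noise (arbitrary units)
n = np.arange(1, 101)

m_av  = sigma_1 * n**-0.5          # sensitivity to the average mass per object
m_tot = sigma_1 * n**+0.5          # equivalent sensitivity to the total mass

plt.plot(n, m_av,  label='average mass sensitivity (n^-0.5)')
plt.plot(n, m_tot, label='total mass sensitivity (n^+0.5)')
plt.xlabel('Number of objects stacked')
plt.ylabel('Relative mass sensitivity')
plt.legend()
plt.show()
```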
A final way to think about it is to consider information. To get better sensitivity to the total mass, we'd need more information per individual object, and obviously we can't do that just by adding in data from other objects.

It's high time we had a visual example. In the data below, I injected a faint source into 100 images. The left panel shows the raw individual images with no further processing. The middle panel shows the cumulative sum of the images, while the right panel shows the averaged result. Exactly the same colour scale is used for all images.
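Here's roughly how such a test can be set up - a sketch with my own made-up numbers rather than the original script, showing just a single frame instead of the full animation :

```python
# A faint Gaussian source injected into 100 noisy images, then displayed
# raw, summed and averaged on the SAME fixed colour scale.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
size, n_images, sigma, peak = 64, 100, 1.0, 0.3      # source peak << noise

y, x = np.mgrid[:size, :size]
source = peak * np.exp(-((x - size/2)**2 + (y - size/2)**2) / (2 * 5.0**2))

images = source + rng.normal(0.0, sigma, (n_images, size, size))
summed   = images.sum(axis=0)
averaged = images.mean(axis=0)

vmin, vmax = -3 * sigma, 3 * sigma                   # fixed colour scale
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, img, title in zip(axes, (images[0], summed, averaged),
                          ('single image', 'sum', 'average')):
    ax.imshow(img, vmin=vmin, vmax=vmax, origin='lower', cmap='viridis')
    ax.set_title(title)
plt.show()
```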


You can see the source is never detected in any of the raw images, but it's quickly visible in the summed and averaged images. But you can also see that the values are much more extreme in the simple summed panel compared to the averaged image. The averaged value is simply the sum divided by the total number of images, so, essentially, the averaged image is simply a scaled version of the summed image. This means that despite appearances, it's really the summation that improves detectability, not the averaging. We can see this if we use an adaptive colour scale that sets the colour range based on the minimum and maximum value in each image :


Then we see that the summed and averaged images are visually identical. The actual values shown are of course different, but how we display them matters a great deal. The noise averages to zero, but it doesn't necessarily sum to zero; the source sums to ever-greater values for each image it's present in, but averages to its true value.
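That last point is easy to verify numerically : since the average is just the sum divided by n, rescaling each image between its own minimum and maximum gives you literally the same picture. A quick check with toy data (weak constant "source" plus noise, values of my own choosing) :

```python
import numpy as np

rng = np.random.default_rng(1)
images = 0.3 + rng.normal(0.0, 1.0, (100, 64, 64))   # weak signal + noise
summed, averaged = images.sum(axis=0), images.mean(axis=0)

def rescale(img):
    # stretch between each image's own minimum and maximum
    return (img - img.min()) / (img.max() - img.min())

print(np.allclose(rescale(summed), rescale(averaged)))   # True : same picture, different numbers
```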


Stacking Made Subtle

To re-iterate : stacking is the same as averaging, so while it improves your sensitivity to the average, it makes things worse for the total. Summing improves your signal strength and thus detectability, but only for data points in which a source is present.

Of course, once we get beyond that, things get more complicated. Normally you wouldn't necessarily do such a simple averaging, because the noise isn't usually the same everywhere. So you'd weight each source by the inverse of its local rms (or variance) and then divide by the sum of the weights rather than the numerical total. This decreases the contribution of the noisiest regions and maximises the benefit from the ones with the lowest noise.
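Something like this, where the inverse-variance weights and the variable names are my own illustration rather than any particular pipeline :

```python
# Noise-weighted stacking : weight each spectrum by 1/rms**2 and divide by
# the sum of the weights, so the noisiest sightlines contribute least.
import numpy as np

def weighted_stack(spectra, local_rms):
    """spectra : (n_objects, n_channels) array ; local_rms : (n_objects,) array."""
    weights = 1.0 / np.asarray(local_rms)**2          # inverse-variance weights
    weighted = spectra * weights[:, None]
    return weighted.sum(axis=0) / weights.sum()

# Example : three spectra of the same weak line with very different noise levels
rng = np.random.default_rng(3)
rms = np.array([1.0, 1.0, 5.0])
line = np.exp(-np.linspace(-3, 3, 50)**2)             # weak Gaussian profile
spectra = 0.3 * line + rng.normal(0.0, rms[:, None], (3, 50))

stack = weighted_stack(spectra, rms)
```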

Another issue is whether you go for sheer noise level or actual physical mass. For the former, you want as many objects as possible. But for the latter, you want to account for the distance of your targets - the further away, the less sensitive to mass you'll be, so you might not want to combine them with closer objects. Remember, the equations had an equivalent sign, not an equals sign. Mass sensitivity does not directly equate with simple rms value, though it may be directly proportional to it.

Finally, it's important to realise that this is a general case and a fundamental property of stacking. If you smooth your data spatially you're doing much the same thing : your sensitivity to the average mass per pixel improves, but at the cost of not knowing quite so well where that mass is precisely located, and your total mass sensitivity gets worse. This is why spatial smoothing isn't guaranteed to detect anything if you go nuts with it and smooth it to hell : sure, your surface brightness sensitivity can be phenomenally good, but yer total mass sensitivity is crappy. You're hoping not only that there's some big diffuse material lurking around, but that that material is also very substantial : sure, it can be thin, but it also has to be present over a larger area.
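A crude way to see the same trade-off numerically, assuming a simple boxcar kernel and uncorrelated noise (which a real beam won't give you) :

```python
# Smoothing with a k x k boxcar drops the per-pixel noise by roughly k,
# but the mass corresponding to that limit spread over k*k pixels grows by
# roughly k -- better surface brightness sensitivity, worse total mass.
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(4)
sigma = 1.0
noise = rng.normal(0.0, sigma, (512, 512))

for k in (1, 4, 16):
    smoothed_rms = uniform_filter(noise, size=k).std()
    print(f"k = {k:3d}  per-pixel rms ~ {smoothed_rms:.3f}  "
          f"equivalent total over kernel ~ {smoothed_rms * k * k:.1f}")
```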

The same goes for stacking in velocity space in radio data, which is what prompted this post. Integrated flux maps actually have worse mass sensitivity than individual channel maps, despite containing information from more channels. Of course if you were to average them by the number of channels you'd see an improvement : again, you increase your sensitivity to the average column density of the gas but at the expense of your sensitivity to total mass. In essence, your average mass sensitivity represents the mass you could detect if it was present in every stacked location.
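The numbers fall out the same way for a toy, noise-only cube (sizes and seed are just my own choices) :

```python
# Summing channels into an integrated map raises the noise by sqrt(n_chan),
# while dividing by the channel count recovers the (better) average sensitivity.
import numpy as np

rng = np.random.default_rng(5)
n_chan, sigma = 64, 1.0
cube = rng.normal(0.0, sigma, (n_chan, 128, 128))     # pure noise cube

integrated = cube.sum(axis=0)                          # moment-0-like sum
averaged   = cube.mean(axis=0)

print("single channel rms :", cube[0].std())           # ~ sigma
print("integrated map rms :", integrated.std())        # ~ sigma * sqrt(n_chan)
print("averaged map rms   :", averaged.std())          # ~ sigma / sqrt(n_chan)
```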


So that's it. Stacking is a useful addition in the arsenal of Tools For Detecting Stuff, but like everything else, it's got limits.
