Fisher projections for CMB-S4, v3

 (Victor Buza)

This posting describes the development of a Fisher forecasting framework (developed by Victor Buza, Colin Bischoff, and John Kovac), specifically targeted towards optimizing constraints on the tensor-to-scalar ratio in the presence of Galactic foregrounds and gravitational lensing of the CMB. I first describe the methodology and then present an example forecast for CMB-S4.


1. Schematic of Fisher Machinery

Figure 1 is a schematic representation of the framework, identifying the user inputs, code modules, and the outputs of those modules. This code overlaps significantly with the code used for the BICEP/Keck likelihood analysis, and our belief in the projections is grounded in that connection to achieved performance / published results. In particular, we emphasize the importance of using map-level signal and noise sims (of the BICEP2/Keck dataset) as a starting point. We know that these sims are a good description of our maps because we pass the jackknife tests derived from them.

Our confidence in these projections is further enhanced by their ability to recover the achieved parameter constraints quoted in the BKP and BK14 papers. BK14 quotes an achieved \(\sigma_r=0.024\), obtained by performing an 8-dimensional ML search (with priors) on a set of 499 Dust + \(\Lambda\)CDM sims and taking the standard deviation of the recovered \(r_{ML}\). Though BKP did not perform a similar exercise, had it done so, it would have quoted \(\sigma_r=0.032\). Our Fisher forecasts for these particular scenarios recover \(\sigma_r=0.033\) (BKP) and \(\sigma_r = 0.024\) (BK14), which are within the sample variance (from the finite number of sims) of the real results. For the particular data draws of BKP and BK14, we can also compare the marginalized posteriors to the Fisher Contours, though this is a weaker comparison. Firstly, we know the Fisher Contours will be Gaussian, while the real contours will likely not be (BKP and BK14 use the H-L likelihood approximation), and secondly, we know that each particular data realization will yield differently shaped contours, so an ideal match of the Fisher Contours to any particular realization should not be expected. Nonetheless, the contours are still quite faithfully recovered: BKP vs Fisher, BK14 vs Fisher.
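
To make the final step of the pipeline concrete, below is a minimal sketch in Python (with hypothetical array names; the real code modules operate on the full multicomponent model and BPCM shown in Figure 1) of how a Fisher matrix and the corresponding \(\sigma_r\) are formed from the derivatives of the bandpower expectation values and the bandpower covariance matrix.

    import numpy as np

    def fisher_sigma_r(dCdp, bpcm, r_index=0):
        # dCdp: (n_params, n_bandpowers) derivatives of the bandpower expectation
        #       values with respect to each model parameter, at the fiducial model.
        # bpcm: (n_bandpowers, n_bandpowers) bandpower covariance matrix.
        # A Gaussian prior of width sigma_p on parameter i would add 1/sigma_p**2
        # to fisher[i, i] before inversion.
        bpcm_inv = np.linalg.inv(bpcm)
        fisher = dCdp @ bpcm_inv @ dCdp.T            # F_ij = dC/dp_i . Cov^-1 . dC/dp_j
        param_cov = np.linalg.inv(fisher)            # marginalized parameter covariance
        return np.sqrt(param_cov[r_index, r_index])  # marginalized sigma(r)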

[Figure 1 schematic: Inputs → Multicomponent Model → BPCM (Bandpower Covariance Matrix) → Fisher Forecast]

Figure 1:
This is a schematic representation of our Fisher Machinery. Grey ovals represent User Inputs, White boxes represent Code Modules, and Yellow ovals represent Code Outputs.


2. Worked-out Example; Experiment Specification

Below, I present an application of this framework to an example motivated by previous BICEP/Keck experience.


3. Parameter Constraints; \(\sigma_r\) performance

In this section, I focus on the 1% BICEP/Keck patch and search for the optimal path for a number of fixed levels of delensing. I also fold delensing in as an extra band in the optimization, thus allowing the algorithm to decide at each step whether the effort is better spent on foreground separation or on reducing the lensing residual. I optimize for two possible levels of tensors, \(r=0\) and \(r=0.01\), and plot the resulting paths, the individual map depths in \(\mu K\)-arcmin, the effective lensing residuals (after delensing), the fraction of effort spent delensing, the \(\sigma_r\)'s, and the logarithmic derivatives of \(\sigma_r\). In addition to the various delensing cases, I also consider a Raw Sensitivity case (conditional on zero foregrounds).
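
Schematically, this optimization can be thought of as a greedy allocation: at each step a fixed increment of det-yrs is assigned to whichever channel (including, in the adaptive case, the delensing effort) yields the lowest \(\sigma_r\), with the Fisher evaluation re-run for each trial. A minimal sketch, where sigma_r_given_effort is a hypothetical stand-in for that full evaluation:

    import numpy as np

    def optimize_path(channels, n_steps, increment, sigma_r_given_effort):
        # channels: e.g. ['95', '150', '220', 'delensing']; in the adaptive case the
        #           delensing effort is just another channel to allocate to.
        # sigma_r_given_effort: callable mapping {channel: det-yrs} -> sigma(r),
        #           a stand-in for the full Fisher evaluation.
        effort = {c: 0.0 for c in channels}
        path = []
        for _ in range(n_steps):
            best_channel, best_sigma = None, np.inf
            for c in channels:
                trial = dict(effort)
                trial[c] += increment                  # try adding the increment here
                s = sigma_r_given_effort(trial)
                if s < best_sigma:
                    best_channel, best_sigma = c, s
            effort[best_channel] += increment          # commit the best step
            path.append((dict(effort), best_sigma))
        return path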

Things to note:

Figure 2:

(Top, Left) Optimal path indicating the total number of det-yrs, and the individual distribution of det-yrs at each point.
(Top, Right) Individual map depths for every channel, in \(\mu K\)-arcmin. Calculated from the accumulated weights in each channel on the BICEP2 patch.
(Middle, Left) Ratio of the total effort that is spent on delensing, as a function of total effort.
(Middle, Right) Effective \(A_L\) as a function of total effort.
(Bottom, Left) Resulting \(\sigma(r)\) constraints for each level of delensing, as well as from a limiting (conditional on no foregrounds) raw-sensitivity case.
(Bottom, Right) The logarithmic derivative of \(\sigma(r)\) with respect to det-yrs, indicating the different regimes of gain.


4. \(\sigma_r\) vs \(f_{sky}\)

Our Fisher formalism also includes an \(f_{sky}\) knob. The effects of \(f_{sky}\) are implemented in two steps: first, for a fixed total effort the noise spectra \(N_l\) are scaled up in proportion to \(f_{sky}\), since the same detector time is spread over a larger area; second, the number of modes entering the bandpower covariance matrix is scaled up in proportion to \(f_{sky}\).

These are two competing effects; while the \(N_l\)'s get larger with \(f_{sky}\), thus hurting our constraints, observing more modes goes in the opposite direction. The interplay between these two effects can be seen in the pager below, where I choose five levels of effort {\(5\times 10^4, 5\times10^5, 10^6, 2.5\times 10^6, 5\times 10^6\) total det-yrs}, and then turn the \(f_{sky}\) knob over the range \([10^{-3}, 1]\), obtaining the optimized number of det-yrs in each of the S4 bands and the resulting \(\sigma_r\) constraint for each \(f_{sky}\). Note: for the adaptive delensing case, I find the optimal path for each \(f_{sky}\); for the fixed delensing cases, as before, the optimization is only done once, for the 1% patch, and the same optimized path is used for all \(f_{sky}\) values. In addition, as mentioned in the caption below, the adaptive delensing case takes into account the amount of effort spent towards delensing, while the fixed delensing cases do not.
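
As a toy illustration of these two scalings (a Knox-type bandpower error, ignoring the foreground structure and the mode-counting details of the real BPCM), for a fixed total effort the noise power grows with sky area while the number of modes per bandpower grows as well:

    import numpy as np

    def toy_bandpower_error(f_sky, ell, delta_ell, nl_ref, cl_signal, fsky_ref=0.01):
        # nl_ref: noise power achieved on the reference patch (fsky_ref) for a fixed
        #         total effort; spreading that effort over a larger area scales the
        #         noise power up as f_sky / fsky_ref.
        # cl_signal: signal power in the band (e.g. residual lensing BB), which sets
        #            the sample-variance floor.
        nl = nl_ref * (f_sky / fsky_ref)                    # noise up with sky area
        n_modes = f_sky * (2.0 * ell + 1.0) * delta_ell     # modes up with sky area
        return np.sqrt(2.0 / n_modes) * (cl_signal + nl)    # Knox-type bandpower error

In the noise-dominated limit this error grows as \(\sqrt{f_{sky}}\), while in the sample-variance-dominated limit it falls as \(1/\sqrt{f_{sky}}\); the full optimization navigates between these regimes, with foregrounds and delensing effort folded in.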

Things to note:

There are a number of effects that penalize large \(f_{sky}\) that are not included in this analysis:

Figure 3:


5. Dust decorrelation

>> Decorrelation parameterization

Below, I expand on the optimization done above by introducing a decorrelation parameter that suppresses the dust model expectation values in the cross-spectra. This parametrization is not necessarily physically motivated, and is meant as a simple example of how decorrelation can be folded into our framework; more complicated parametrizations could take its place if one were so inclined.

The correlation coefficient for a particular cross-spectrum is given by the ratio of the cross-spectrum expectation value to the geometric mean of the auto-spectra (if there is a non-zero decorrelation effect, the cross will register less power than the geometric mean of the autos, \(R<1\)). The decorrelation coefficient is therefore given by (1 - correlation): \[\tilde{R}(217,353) = 1-\frac{\langle 217\times 353\rangle}{\sqrt{\langle 217\times 217\rangle \langle 353 \times 353\rangle}}\] This is a quantity that is less than or equal to one, dependent on the two frequencies involved in the cross, and perhaps different at different scales. We can model this more generally by assuming it has a well-behaved \(l\) and \(\nu\) dependence, reflecting the different degrees of correlation one would get at various scales for different cross-spectra: \[R(\nu_1,\nu_2,l) = 1 - g(\nu_1,\nu_2) f(l)\] where \(g(\nu_1,\nu_2)\) and \(f(l)\) can take different functional forms. The four fairly natural scale dependencies I have explored are \(f(l)=al^0\), \(f(l)=al^1\), \(f(l)=a\log(l)\), and \(f(l)=al^2\), where \(a\) is a normalization coefficient. I am not motivating these physically, but I will argue that they span a generous range of possible scale dependencies.

Now, for the frequency scaling, John and I came up with a better-motivated behaviour, stemming from one of the ways in which decorrelation could be generated. The premise is simple: given a dust map at a pivot frequency, if there is a physical phenomenon that introduces a variation (\(\Delta \beta\)) in the dust spectral index map \(\beta\), then, when extrapolating to other frequencies, that variation will change the dust SED and create decorrelation between the various frequencies. We built this toy model so that we could study the resulting frequency dependence, and arrived at the empirical form (where \(g'\) is a normalization factor): \[g(\nu_1,\nu_2)=g'\,\log^2\left(\frac{\nu_1}{\nu_2}\right)\]
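
One quick way to see where this form comes from (a sketch under the toy assumption that the spectral index has a Gaussian spread \(\sigma_\beta\) about its mean, independent of the dust amplitude): writing the dust map at frequency \(\nu\) as \(m_\nu \propto e^{\beta \ln(\nu/\nu_0)}\) and averaging over the \(\beta\) distribution gives \[\frac{\langle m_{\nu_1} m_{\nu_2}\rangle}{\sqrt{\langle m_{\nu_1}^2\rangle \langle m_{\nu_2}^2\rangle}} = \exp\left[-\frac{\sigma_\beta^2}{2}\ln^2\left(\frac{\nu_1}{\nu_2}\right)\right] \approx 1-\frac{\sigma_\beta^2}{2}\ln^2\left(\frac{\nu_1}{\nu_2}\right),\] so to leading order the decorrelation scales as \(\log^2(\nu_1/\nu_2)\), with the overall amplitude absorbed into \(g'\).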

>> Decorrelation re-mapping and normalization choice

It is easy to see that the product \(g(\nu_1,\nu_2) f(l)\) can grow quite rapidly, making the correlation definition \(R(\nu_1,\nu_2,l)\) physically nonsensical. It is clear that in the limit of large \(g(\nu_1,\nu_2) f(l)\), \(R\) should asymptote to zero. Therefore, I introduce a re-mapping of these large correlation values to something that follows \(1-g(\nu_1,\nu_2) f(l)\) loosely over the \(l\) range we care about (\(l < 200\)), and then asymptotes to zero rather than diverging to \(-\infty\). In particular, my re-mapping is defined as: \[R(\nu_1,\nu_2,l)=\frac{1}{1+g(\nu_1,\nu_2) f(l)}\]

For the analysis below I define my normalization factors \(a\) and \(g'\) such that \(g(217,353)=1\) and \(f(l=80)=a\), making \(R(217,353,80)=1-a\). The reason for this normalization definition is mostly the historic over-representation of \(R(217,353,l)\) in conversations about decorrelation. Of the \(f(l)\) forms listed above, the one that yields the largest decorrelation effects is, unsurprisingly, \(f(l)=a l^2\); therefore, as an example of an extreme case, I pick this \(f(l)\) shape and choose the normalization coefficient \(a=0.03\), which PIPXXX, Appendix E says is \(1\sigma\) away from the mean decorrelation over the studied LR regions. For this particular choice of \(a\) and \(f(l)\), my correlation re-mapping looks like this (for a few of the more important cross-spectra for a BK14-like analysis). Notice that these are generally large values of decorrelation.
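
For concreteness, here is a minimal sketch of this parametrization and normalization choice in Python (frequencies in GHz; the function names and the final comment about suppressing the dust cross-spectrum expectation values are illustrative stand-ins, not the actual module interface):

    import numpy as np

    A_NORM = 0.03                                   # f(l=80) = a, the extreme example above
    GPRIME = 1.0 / np.log(353.0 / 217.0) ** 2       # chosen so that g(217, 353) = 1

    def g_freq(nu1, nu2):
        # Frequency scaling: g(nu1, nu2) = g' * log(nu1/nu2)^2.
        return GPRIME * np.log(nu1 / nu2) ** 2

    def f_ell(ell, a=A_NORM):
        # Scale dependence, the l^2 case, normalized so that f(80) = a.
        return a * (ell / 80.0) ** 2

    def correlation(nu1, nu2, ell):
        # Re-mapped correlation R = 1 / (1 + g*f): close to 1 - g*f when g*f is
        # small, and asymptoting to zero instead of diverging to -infinity.
        # E.g. correlation(217, 353, 80) = 1 / 1.03, i.e. approximately 0.97.
        return 1.0 / (1.0 + g_freq(nu1, nu2) * f_ell(ell))

    # In the model, the decorrelation suppresses the dust expectation values in the
    # cross-spectra, e.g. dust_cross_expectation *= correlation(nu1, nu2, ell).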

>> Effects of decorrelation in our Fisher optimization

For the Fisher Evaluation, I fix the fiducial value of the decorrelation amplitude to be \(R(l=80,\nu_1=217, \nu_2=353)=0.97\), as mentioned above, but leave it as a free parameter with an unbounded flat prior. This is the most pessimistic choice from the options explored, and should therefore place an upper bound on the effects of decorrelation on the recovered \(\sigma_r\) levels.

First, I would like to demonstrate the decorrelation effects for a BK14 evaluation: BK14 Fisher Ellipses with (black) and without (red) Decorrelation. One can notice the degeneracies of the decorrelation parameter with the dust parameters, as well as with \(r\). This also makes it easy to observe the effects of decorrelation on the constraints of these parameters; in particular, we see that it introduces more degeneracy in the \(r\) vs \(A_d\) plane, and weakens the \(r\) and \(A_d\) constraints. It is worth noting that even though the decorrelation parameter was left unbounded in the Fisher calculation, its resulting ellipse is reasonably constrained.

Next, I perform the same optimization as done in Figure 2, except now with the ability to turn this decorrelation parameter on. I only perform the optimization with an adaptive delensing treatment (and do not do it for the fixed delensing levels), meaning that the delensing is introduced as an extra band in the problem.

Figure 4:

(Top, Left) Optimal path indicating the total number of rx-yrs, and the individual distribution of rx-yrs at each point.
(Top, Right) Individual map depths for every channel, in \(\mu K\)-arcmin. Calculated from the accumulated weights in each channel on the BICEP2 patch.
(Middle, Left) Ratio of the total effort that is spent on delensing, as a function of total effort.
(Middle, Right) Effective \(A_L\) as a function of total effort.
(Bottom, Left) Resulting \(\sigma(r)\) constraints for each level of delensing, as well as from a limiting (conditional on no foregrounds) raw-sensitivity case.
(Bottom, Right) The logarithmic derivative of \(\sigma(r)\) with respect to rx-yrs, indicating the different regimes of gain.

Next, I re-do the \(f_{sky}\) optimization, similarly to Figure 3, with the ability to turn decorrelation ON and OFF. As expected, the \(\sigma_r\) constraints degrade, and a portion of the effort that used to be spent on delensing gets reallocated to foreground separation. Additionally, one can see the slope (with \(f_{sky}\)) becoming slightly shallower, though in principle the optimal solution remains largely the same.

Figure 5: