Maximum likelihood search results for Data Challenge 02

2017-09-27 (updated 2017-10-18) (C. Bischoff, V. Buza, J. Willmert, B. Racine)

Summary

This is a long posting, so here are the important points:

A significant part of the bias that we see in this maximum-likelihood search analysis scales with \(A_L\) and seems to just be a mismatch between the lensing \(BB\) spectrum used for the sims and the one used in the analysis. But this also reinforces the idea that we should marginalize over some (small) uncertainty in the lensing spectrum.
Some of the foreground models, especially Hensley/Draine dust and Ghosh dust, yield low values of \(\beta_d\) which is in tension with the Planck-derived prior used in the analysis. It would be a good idea to rerun this analysis without that prior, since CMB-S4 is quite capable of measuring \(\beta_d\).
We should try some Fisher forecasts that assume some level of foreground decorrelation across frequencies. While the existing forecasts do include the decorrelation parameter as part of the model, they don't account for the effect of foreground decorrelation in the bandpower covariance matrix.

Introduction

This posting summarizes results from analysis of CMB-S4 Data Challenge 02 using a BICEP/Keck-style parametrized foreground model. For each realization, we find the set of model parameters that maximizes the likelihood multiplied by priors on the dust and sync spectral index parameters (\(\beta_d\) and \(\beta_s\)). These priors are based on Planck data, so they are quite weak in comparison with CMB-S4 sensitivity.

The model includes the following parameters:

\(r\): tensor-to-scalar ratio
\(A_d\): \(BB\) power spectrum amplitude of dust, in \(\mu K_{CMB}^2\) units at \(\nu\)=353 GHz and \(\ell\)=80
\(\beta_d\): dust emissivity spectral index; Gaussian prior on \(\beta_d\) is centered at 1.6 and has width 0.11
\(A_s\): \(BB\) power spectrum amplitude of synchrotron, in \(\mu K_{CMB}^2\) units at \(\nu\)=23 GHz and \(\ell\)=80
\(\beta_s\): synchrotron emissivity spectral index; Gaussian prior on \(\beta_s\) is centered at -3.1 and has width 0.3
\(\alpha_d\): power law index for dust \(\mathcal{D}_\ell\) scaling in \(\ell\); limited to range [-2,2]
\(\alpha_s\): power law index for synchrotron \(\mathcal{D}_\ell\) scaling in \(\ell\); limited to range [-2,2]
\(\epsilon\): frequency-independent spatial correlation of dust and synchrotron; limited to range [-1,1]
\(\rho_d\): dust correlation between 217 and 353 GHz; not included when “Decorrelation model” is set to “none”

For the decorrelation model, we assume that the cross-spectrum of dust between frequencies \(\nu_1\) and \(\nu_2\) is reduced by factor \(\exp\{ - (1-\rho) \times [\log^2(\nu_1 / \nu_2) / \log^2(217 / 353)] \times f(\ell)\}\). For the \(\ell\) dependence, we try out uniform (flat), linear, or quadratic scaling (pivot scale is \(\ell\)=80).

The Figure 1 pager shows histograms of the recovered maximum likelihood parameters for all sky models, four delensing options (residual \(A_L\) = 1.0, 0.3, 0.1, or 0.03), two choices of input tensor signal (\(r\) = 0 or 0.003), and four options for the dust decorrelation model used in the analysis. Here I will highlight the main results for each sky model.

00: Gaussian foregrounds

The mean values and standard deviations of \(r\) for simulations with simple Gaussian foregrounds are summarized in Table 00. With a 10% lensing residual, we don't quite achieve \(\sigma(r) = 5 \times 10^{-3}\) for sims with \(r = 0\).

We see a large (\(\sim \sigma / 2\)) negative bias for the \(A_L\) = 1 case, but the bias scales proportionally with \(A_L\), which suggests that the problem is due to a slightly different lensing \(BB\) spectrum being used for the sims vs the analysis. I thought that we already checked that these matched, but perhaps didn't check at high enough precision; should be an easy fix.

Turning on dust decorrelation in the model doesn't cause any bias in \(r\) and the recovered \(\rho_d\) values are centered around 1 (i.e. analysis recovers zero decorrelation). Adding this parameter does increase \(\sigma(r)\) somewhat. The hit is largest for flat scaling in \(\ell\) and smallest for quadratic scaling, because steeper scaling in \(\ell\) means that high \(\ell\) bandpowers can constrain the amount of dust decorrelation at \(\ell \sim 80\). This seems fairly artificial, so we shouldn't put too much stock in one version or another of the decorrelation model.

Table 00:

Mean \(r \times 10^3\) and \(\sigma(r) \times 10^3\) from sets of 500 realizations with simple Gaussian foregrounds.

Decorrelation model	\(A_L\) = 1	\(A_L\) = 0.3	\(A_L\) = 0.1	\(A_L\) = 0.03
Input \(r\) = 0
none	-1.437±2.637	-0.409±1.033	-0.136±0.537	-0.049±0.360
flat	-1.544±2.753	-0.443±1.259	-0.145±0.764	-0.069±0.575
linear	-1.433±2.592	-0.350±1.101	-0.068±0.673	-0.011±0.515
quadratic	-1.380±2.596	-0.348±1.023	-0.081±0.552	-0.016±0.389
Input \(r\) = 0.003
none	1.313±2.713	2.406±1.249	2.7377±0.765	2.874±0.574
flat	1.293±3.134	2.465±1.643	2.834±1.083	2.964±0.848
linear	1.395±2.920	2.512±1.499	2.838±1.005	2.931±0.792
quadratic	1.400±2.749	2.484±1.298	2.801±0.822	2.910±0.633

01: PySM a1d1s1f1

As has been previously noted, dust power is much higher in this model (\(A_d \sim 12.5 \mu K^2\)) than for the Gaussian foreground sims (\(A_d = 3.75 \mu K^2\)). The PySM d1 dust model does feature a spatially varying spectral index, but we don't find any detectable decorrelation in this analysis. The PySM s1 synchrotron model yields \(A_s \sim 0.5 \mu K^2\) and there is \(\sim 6\)% correlation between dust and sync.

We still see a (negative) bias on \(r\) that is proportional to the residual lensing power but there is an additional positive bias on \(r\), presumably due to foregrounds. Judging by the \(A_L = 0.03\) results, this foreground bias corresponds to \(r \sim 5 \times 10^{-4}\). This bias perhaps goes away when we include dust decorrelation with flat \(\ell\) scaling, and we see \(\sim 1 \sigma\) evidence for dust decorrelation in that case (\(\rho_d = 0.9995 \pm 0.0005\)).

Table 01:

Mean \(r \times 10^3\) and \(\sigma(r) \times 10^3\) from sets of 500 realizations with PySM a1d1f1s1 foregrounds.

Decorrelation model	\(A_L\) = 1	\(A_L\) = 0.3	\(A_L\) = 0.1	\(A_L\) = 0.03
Input \(r\) = 0
none	-0.515±2.456	0.394±0.979	0.539±0.542	0.5338±0.378
flat	-0.671±2.838	0.089±1.35	0.086±0.852	0.022±0.648
linear	-0.481±2.680	0.440±1.242	0.511±0.780	0.418±0.578
quadratic	-0.426±2.488	0.507±1.028	0.610±0.590	0.546±0.422
Input \(r\) = 0.003
none	2.094±2.465	3.124±1.103	3.372±0.705	3.454±0.554
flat	1.994±2.833	2.979±1.473	3.146±1.007	3.170±0.820
linear	2.214±2.617	3.290±1.306	3.454±0.902	3.413±0.733
quadratic	2.237±2.474	3.296±1.130	3.482±0.743	3.482±0.598

02: PySM a2d4f1s3

The d4 version of PySM dust adds a second dust component (with different blackbody temperature and emissivity power law) based on Meisner & Finkbeiner (2014). I'm not sure what type of \(\beta_d\) spatial variations are included in this model, but I think it is more or less the same as for d1. The s3 synchrotron model adds curvature to the synchrotron spectral index: \(\beta_s \rightarrow \beta_s + C \ln (\nu / \nu_C)\). The a2 AME model uses a 2% polarization fraction for AME, which seems very high, but there is no attempt to model AME in this analysis.

Results for this model show that \(A_d\) is even larger (\(\sim 32.5 \mu K^2\)) than for the d1 dust model. The mean value of \(\beta_d\) decreases from 1.59 (for PySM d1 model) to 1.55, which is probably a sign of the two component dust. The mean value of \(\beta_s\) decreases from -3.05 (for PySM s1 model) to -3.13, which is probably due to synchrotron spectral curvature (and perhaps polarized AME?). Dust–sync correlation is higher, at \(\sim 10\)%, which could be from polarized AME.

The pattern of \(r\) positive biases is basically the same as for the PySM a1d1f1s1 model, including the observation that the bias is minimized for a model that includes decorrelation with flat \(\ell\) scaling. However in this case the significance of \(\rho_d \ne 1\) is down to only \(\sim 0.6 \sigma\).

Table 02:

Mean \(r \times 10^3\) and \(\sigma(r) \times 10^3\) from sets of 500 realizations with PySM a2d4f1s3 foregrounds.

Decorrelation model	\(A_L\) = 1	\(A_L\) = 0.3	\(A_L\) = 0.1	\(A_L\) = 0.03
Input \(r\) = 0
none	-0.988±2.278	0.047±0.925	0.273±0.521	0.323±0.363
flat	-1.317±2.583	-0.209±1.266	-0.008±0.802	0.016±0.604
linear	-1.141±2.401	0.018±1.123	0.245±0.710	0.238±0.527
quadratic	-0.968±2.276	0.109±0.947	0.311±0.550	0.315±0.393
Input \(r\) = 0.003
none	1.949±2.763	2.930±1.182	3.184±0.719	3.275±0.556
flat	1.864±3.055	2.864±1.542	3.084±1.072	3.126±0.897
linear	1.902±2.900	2.965±1.398	3.189±0.937	3.193±0.754
quadratic	1.990±2.777	3.006±1.232	3.224±0.781	3.254±0.617

03: PySM a2d7f1s3

The next PySM version uses the Hensley/Draine dust model, which has additional complexity in the dust SED (perhaps described in arXiv:1709.07897?). The level of dust power is similar to sky model 01 (PySM d1 model), but we find that the emissivity power law is even flatter than the last case, with \(\beta_d \sim 1.44\).

If we focus on the \(A_L = 0.03\) column, we find that the decorrelation model with flat \(\ell\) scaling produces a negative bias in this case and we find the smallest bias for decorrelation with linear \(\ell\) scaling. The significance of \(\rho_d \ne 1\) is a little more than \(1 \sigma\) for both of these options.

Table 03:

Mean \(r \times 10^3\) and \(\sigma(r) \times 10^3\) from sets of 500 realizations with PySM a2d7f1s3 foregrounds.

Decorrelation model	\(A_L\) = 1	\(A_L\) = 0.3	\(A_L\) = 0.1	\(A_L\) = 0.03
Input \(r\) = 0
none	-0.670±2.448	0.324±0.962	0.502±0.540	0.510±0.388
flat	-1.745±2.925	-0.594±1.394	-0.370±0.883	-0.331±0.680
linear	-1.433±2.760	-0.193±1.287	0.113±0.811	0.148±0.605
quadratic	-0.869±2.518	0.251±1.029	0.461±0.595	0.447±0.430
Input \(r\) = 0.003
none	2.154±2.591	3.151±1.144	3.381±0.721	3.447±0.569
flat	1.137±3.294	2.343±1.700	2.643±1.130	2.722±0.895
linear	1.377±3.026	2.644±1.535	3.001±1.020	3.074±0.799
quadratic	1.938±2.682	3.068±1.237	3.324±0.800	3.356±0.635

04: Ghosh dust model

The Ghosh dust model (described here) is based on GASS HI data with a model for the Galactic magnetic field. For these sims, it is combined with the PySM a2, f1, and s3 components (same as the two previous models).

The analysis of this model yields smaller still values of \(\beta_d \sim 1.3-1.4\) and makes me pretty worried about the Planck-derived \(\beta_d\) prior that was used for this analysis—these values are disfavored by the prior at \(\sim 2.5 \sigma\)! While CMB-S4 bandpowers are so sensitive that I think the likelihood should dominate over the \(\beta_d\) prior, it seems very possible that this level of disagreement between sims and prior could push results around (leading to bias on \(r\)). Dust-sync correlation is still present, but smaller (2–3%), which is probably due to the fact that the Ghosh dust sims don't know anything about the PySM synchrotron or AME components. The fact that they are correlated at all probably happens because both models are based on data at larger scales.

Dust decorrelation is small in absolute terms, but detected at high significance. Using a model without dust decorrelation leads to a large positive bias on \(r\) in the range \(4-5 \times 10^{-3}\). Dust decorrelation with linear \(\ell\) scaling produces the smallest biases, but still quite large compared to other sky models. Also note that the recovered \(\beta_d\) parameters shift around depending on the decorrelation \(\ell\) scaling, which doesn't help my confidence in these results.

Table 04:

Mean \(r \times 10^3\) and \(\sigma(r) \times 10^3\) from sets of 500 realizations with Ghosh dust model plus PySM a2f1s3 foregrounds.

Decorrelation model	\(A_L\) = 1	\(A_L\) = 0.3	\(A_L\) = 0.1	\(A_L\) = 0.03
Input \(r\) = 0
none	1.238±2.602	2.762±1.224	3.666±0.824	4.445±0.684
flat	-1.818±3.097	-0.844±1.644	-1.794±1.163	-3.102±0.909
linear	-1.032±2.786	-0.051±1.401	-0.403±0.928	-0.751±0.724
quadratic	0.682±2.602	1.960±1.229	2.072±0.798	2.001±0.622
Input \(r\) = 0.003
none	4.461±2.955	6.037±1.431	7.162±1.009	8.184±0.898
flat	1.113±3.247	2.045±1.836	1.028±1.411	-0.329±1.166
linear	1.947±3.062	2.964±1.599	2.655±1.131	2.304±0.940
quadratic	3.817±2.946	5.109±1.446	5.283±0.996	5.240±0.832

05: Gaussian decorrelated dust model

I only had 100 realizations (50 each for \(r\) = 0, 0.003) to analyze for this Gaussian decorrelated model, so the statistics aren't great. However, this model has extremely large dust decorrelation (15% between 217 and 353 GHz at \(\ell\) = 80) and it exactly follows the assumed functional form of decorrelation with linear \(\ell\) scaling, so we can still draw some useful conclusions.

When we choose decorrelation with linear \(\ell\) scaling to match the sims, then we find no bias on \(r\) and recover \(\rho_d\) = 0.85. If we use the wrong \(\ell\) scaling for decorrelation, then we find positive biases on \(r\) in the range \(1-3 \times 10^{-3}\). I didn't bother including in the table the results with no decorrelation model because they are crazy.

An important point to note from this model is that, even for the unbiased case where the decorrelation is correctly modeled in both \(\nu\) and \(\ell\), we find \(\sigma(r) \sim 1.4\), much larger than the target sensitivity of CMB-S4. This shows that, for extreme levels of foreground decorrelation, we lose the ability to clean foregrounds from the maps because the foreground modes are significantly independent between the various CMB-S4 frequencies. Regardless of whether you are doing map-based cleaning or fitting the power spectra as we do here, the only way to improve sensitivity would be use more observing bands that are more closely spaced. It also makes the point that our Fisher forecasts should assume some non-zero level of decorrelation. Adding decorrelation as a free parameter to a forecast that assumes \(\rho_d = 1\) only captures part of the statistical penalty.

Table 05:

Mean \(r \times 10^3\) and \(\sigma(r) \times 10^3\) from sets of 50 realizations with Gaussian decorrelated dust foreground.

Decorrelation model	\(A_L\) = 1	\(A_L\) = 0.3	\(A_L\) = 0.1	\(A_L\) = 0.03
Input \(r\) = 0
flat	-0.988±3.294	0.277±1.825	0.977±1.467	1.378±1.403
linear	-2.522±3.196	-0.953±1.809	-0.312±1.459	-0.038±1.361
quadratic	1.334±3.115	2.482±1.754	2.766±1.420	2.876±1.336
Input \(r\) = 0
flat	2.537±3.121	3.971±2.083	4.615±1.783	4.958±1.694
linear	1.021±2.810	2.599±1.824	3.145±1.530	3.324±1.416
quadratic	4.915±2.809	5.983±1.825	6.191±1.480	6.238±1.325

06: Flauger MHD foregrounds

My understanding is that this model uses MHD simulations to consistently model polarized dust and synchrotron in the Galactic magnetic field. This makes it quite interesting that this analysis finds negative dust-sync correlation with \(\epsilon \sim -0.34\). The dust power is similar to the Gaussian sims, and \(\beta_d\) matches the Planck value of 1.59. This analysis finds a synchrotron SED power law that is much flatter than usual, \(\beta_s \sim -2.6\), which is inconsistent with the prior at about \(1.5 \sigma\).

This model does not show any significant dust decorrelation and we recover \(r\) values near the sim input level for all choices of decorrelation model. In general, the results for this model look nearly as good as the simple Gaussian foregrounds (sky model 00).

Table 06:

Mean \(r \times 10^3\) and \(\sigma(r) \times 10^3\) from sets of 500 realizations with Flauger MHD foreground model.

Decorrelation model	\(A_L\) = 1	\(A_L\) = 0.3	\(A_L\) = 0.1	\(A_L\) = 0.03
Input \(r\) = 0
none	-1.543±2.345	-0.416±0.952	-0.104±0.537	-0.002±0.373
flat	-1.491±2.788	-0.375±1.320	-0.081±0.816	0.002±0.600
linear	-1.346±2.537	-0.249±1.161	0.039±0.735	0.097±0.552
quadratic	-1.378±2.364	-0.294±0.977	-0.013±0.571	0.057±0.411
Input \(r\) = 0.003
none	1.364±2.775	2.465±1.233	2.796±0.757	2.923±0.576
flat	1.476±3.167	2.580±1.599	2.904±1.085	3.011±0.877
linear	1.599±2.993	2.649±1.472	2.945±0.993	3.023±0.794
quadratic	1.559±2.819	2.596±1.286	2.889±0.816	2.981±0.636

Figure 1:

Histograms of maximum-likelihood parameters recovered for simulations. For most sky models, there are 500 realizations with \(r\)=0 and 500 realizations with \(r\)=0.003; for sky model 05, there are only 50 realizations of each.

Appendix A: Results for CDT report

As can be seen in Table 00, we seem to have a negative bias even in the case of the Gaussian foreground simulations. This is being investigated, by changing the lensing power spectrum, and removing the priors on \(\beta_d\), and \(\beta_s\), as well as extending the allowed range of \(A_d\) for some models. Work is also on-going to see if these biases can be tracked down to specific bandpowers.

For the CDT report, it was decided to remove this "algorithmic bias" to focus on the bias produced by the different dust simulations. We also chose to report results using the linear \(\ell\) dependence for the decorrelation model. See caption of Table 07.

Table 07:

Bias on \(r\), obtained by subtracting the "algorithmic bias" to the mean of \(r\) for each of the 6 complex foreground models, for the case \(A_L\) = 0.1, assuming linear decorrelation for the Gaussian model. The "algorithmic bias" is estimated from the Gaussian simulations (-0.068±0.673) (see Table 00). For this Gaussian foreground case, we report a bias based on the absolute value of the sample variance on the mean for 1000 sims, which acknowledges statistical limitations exist even for closed-loop tests calibrated by MC sims.

Input \(r\) = 0
	\(r\) bias \(\times 10^4\)	\(\sigma(r) \times 10^4\)
02.00	0.2	6.7
02.01	5.8	7.8
02.02	3.1	7.1
02.03	1.8	8.1
02.04	-3.4	9.3
02.05	-2.5	14
02.06	1.1	7.3
Input \(r\) = 0.003
02.00	0.3	10
02.01	6.2	9.0
02.02	3.5	9.4
02.03	1.6	10
02.04	-1.8	11
02.05	3.0	15
02.06	1.1	9.9

Appendix B: DC3-DC2 biases from the ML search (cross check with CDT reported numbers)

We also ran the ML search on the DC3 simulations.

Here again, we only present the results for \(A_L\) = 0.1, but both for the linear decorrelation and no decorrelation case.

The numbers used in the current draft of the CDT report to show the impact on r recovery of introducing systematics into sims (table 9) are based on the method used by Steve (see e.g. here in a post using sky model 0). However, the other results on r from the parametric analyses shown in the report come from ML searches, as in this posting, so we wish to check whether these methods produce consistent numbers.

The results are obtained by subtracting DC2 results from the DC3, for sky model 3. Note that for DC3, the 1000 simulations were divided in subsamples to which specific systematics were added (see experiment definition), except for the first 125 (case "none"), which are thus the same as in DC2.

For instance, for sky model 3, we take all the even simulations within (0125-0249), which have a fiducial \(r\) = 0.003, and run the analysis both for DC3 and DC2. We then take the difference of the means. These results are reported in Table 08. We highlighted what corresponds to CDT's table 9, i.e. \(r\)=0, for linear decorrelation.

Table 08:

Systematic bias on \(r\) for sky model 3, obtained by subtracting the results for DC2 to DC3 for the case \(A_L\) = 0.1, both for the case of "linear" decorrelation and for no decorrelation in the search. the reported \(\sigma(r) \) is obtained from the DC3 simulations.

Decorrelation model: Linear
Systematics	\(r\) bias \(\times 10^4\)	\(\sigma(r) \times 10^4\)
Input \(r\) = 0
none	0	7.9
uncorrelated, white spectrum	-1.89	7.9
uncorrelated, 1/ell spectrum	0.75	7.5
correlated, white spectrum	-0.18	8.5
correlated, 1/ell spectrum	0.78	7.6
uncorrelated, white + 1/ell spectrum	-0.55	8.1
correlated, white + 1/ell spectrum	0.33	7.9
uncorrelated + correlated, white + 1/ell spectrum	-0.01	8.3
Input \(r\) = 0.003
none	0	10.6
uncorrelated, white spectrum	-2.02	11.2
uncorrelated, 1/ell spectrum	0.46	10.7
correlated, white spectrum	0.11	10.1
correlated, 1/ell spectrum	0.82	7.2
uncorrelated, white + 1/ell spectrum	-0.85	10.4
correlated, white + 1/ell spectrum	0.3	10
uncorrelated + correlated, white + 1/ell spectrum	-0.12	10.2
Decorrelation model: None
Input \(r\) = 0
none	0	4.8
uncorrelated, white spectrum	0.33	6
uncorrelated, 1/ell spectrum	0.96	5.3
correlated, white spectrum	0.85	6
correlated, 1/ell spectrum	0.99	5.3
uncorrelated, white + 1/ell spectrum	0.67	5.6
correlated, white + 1/ell spectrum	0.83	5.1
uncorrelated + correlated, white + 1/ell spectrum	0.82	5.6
Input \(r\) = 0.003
none	0	7.6
uncorrelated, white spectrum	0.55	7.2
uncorrelated, 1/ell spectrum	0.85	7.2
correlated, white spectrum	1.13	6.9
correlated, 1/ell spectrum	0.93	7.1
uncorrelated, white + 1/ell spectrum	0.62	7.1
correlated, white + 1/ell spectrum	0.93	6.8
uncorrelated + correlated, white + 1/ell spectrum	1.01	8.5