Fixing a mistake in bandpower binning in the CMB-S4 analysis code

Clem identified an error in how I was binning \(D_ℓ\) spectra into bandpowers in the CMB-S4 analysis code. Here I explain what the mistake was and how to fix it.

Description of problem

Calculating bandpower values in the standard pipeline from Fourier modes \(a_{uv}\) occurs in calcspec(). For standard usage which takes uniform weighting of all modes in the Fourier plane, this translates mathematically to

\begin{align} D_b &= \frac{1}{N_b} \sum_{\llo ≤ ℓ < \lhi} \dlsf \left| a_{uv} \right|^2 \qquad \text{where} \qquad \left\{ \begin{aligned} ℓ &= 2π \sqrt{u^2 + v^2} \\ N_b &= \text{number of modes in the sums} \end{aligned} \right. \end{align}

where a bandpower bin \(b\) is defined by the bin edges \(\llo\) and \(\lhi\). In the context of using s2hat for CMB-S4 analysis, we replace the Fourier transform with spherical harmonic transforms that provide harmonic modes \(a_{ℓm}\). We can then acknowledge that \(ℓ\) and \(m\) are integers and rewrite the expression above as

\begin{align} D_b &= \frac{1}{N_b} \sum_{ℓ=\llo}^{\lhi-1} \dlsf \sum_{m=-ℓ}^ℓ \left| a_{ℓm} \right|^2 \label{eqn:bandpower} \end{align}

In the BK pipeline, maps are transformed and binned directly to bandpowers. For a reason I don't recall (maybe just an arbitrary choice that persisted), in the CMB-S4 analysis, I instead set up all the processing to calculate unbinned power spectra. To feed into the rest of our standard analysis, a post-processing step is required to mainly package up the spectra into standard pipeline data structures and files, and it is at this step that the spectra are binned.

Clem found, though, that a mistake was made in what I had coded. While I did correctly scale \(D_ℓ = ℓ(ℓ+1) C_ℓ / 2π\) before binning, I incorrectly took an average of the \(D_ℓ\) values within each bandpower bin:

\begin{align} D_b ≠ \frac{1}{\lhi-\llo} \sum_{ℓ=\llo}^{\lhi-1} D_ℓ \end{align}

This incorrectly weights low \(ℓ\) spectra more than higher \(ℓ\). If instead you start from Eq. \(\ref{eqn:bandpower}\) and manipulate to an expression in terms of \(D_ℓ\), you find that there is a missing factor of \((2ℓ+1)\) to account for the number of modes in each \(D_ℓ\).

\begin{align} D_b = \frac{1}{N_b} \sum_{ℓ=\llo}^{\lhi-1} (2ℓ + 1) D_ℓ \qquad\text{where}\qquad N_b = \sum_{ℓ=\llo}^{\lhi-1} 2ℓ + 1 = \lhi^2 - \llo^2 \end{align}

In the course of fixing this, I also found a couple more code errors:

There's an error in building an array of \(ℓ\) values (due to forgetting that a list from 0 to \(N\) has length \(N + 1\)). Why this didn't cause a runtime error, though, is still a mystery to me... The logical mask built from inequalities on this vector will have length one greater than the dimension of the matrix being operated upon, but Matlab accepted it without any errors.
```
--- a/cl2bandpowers.m   2018-12-08 14:26:54.525093975 -0500
+++ b/cl2bandpowers.m   2018-12-08 14:53:13.859042052 -0500
@@ -22,7 +22,7 @@
 %

 nbins = length(binedges) - 1;
-l = 0:size(Cl,1);
+l = 0:(size(Cl,1)-1);

 nd = ndims(Cl);
 sz = size(Cl);
```
Figure 2.1 will show that this had absolutely no impact on the computed results.

There was also an off-by-one error wherein the highest \(ℓ\) to be included in the bin was excluded from the sum; the index of the upper bin edge was already decremented, but then a less than sign was also used.


--- a/cl2bandpowers.m   2018-12-08 14:53:13.859042052 -0500
+++ b/cl2bandpowers.m   2018-12-08 14:32:03.560430038 -0500
@@ -36,7 +36,7 @@
   be = binedges(ii+1) - 1;
   bc(ii) = 0.5 * (bs + be);
   % incorrect to use straight mean here - see comments in calcspec.m
-  bp(ii,colons{:}) = mean(Cl(bs<=l&l<be,colons{:}), 1);
+  bp(ii,colons{:}) = mean(Cl(bs<=l&l<=be,colons{:}), 1);
 end

 end

Interestingly, though, I found a similar style of mistake exists in calcspec():


% loop over bins
for i=1:nbin
  ind=find(x>=binedge(i)&x<=binedge(i+1));

One of the two comparisons should be a strict inequality to avoid a mode on the bin edge from being double counted. In practice, this occurs rarely, though, since the \(ℓ\) values from a Fourier transform are generally not integers. An empirical test for our standard map padding shows only 3 modes (out of 65536) are double counted:


>> m = get_map_defn('bicep');
>> map.Tvar = sparse(m.ny, m.nx);
>> p = 2^nextpow2(m.nx);
>> [m,map] = pad_map(m, map, p);
>> ad = calc_ad2([m.xdos,m.ydos], [m.nx,m.ny]);
>> bins = get_bins('bicep_norm');
>> ad.u_l = 2*pi * ad.u_r;
>>
>> idx = find(cvec(bsxfun(@eq, cvec(ad.u_l), rvec(bins))));
>> [l,b] = ind2sub([numel(ad.u_l), numel(bins)], idx);
>> [vi,ui] = ind2sub(size(ad.u_l), l);
>>
>> tmp.l  = ad.u_l(l);
>> tmp.lx = 2*pi * ad.u_val{1}(ui)';
>> tmp.ly = 2*pi * ad.u_val{2}(vi)';
>> tmp.edge = b;
>> struct2table(tmp)

ans =

  3x4 table

    l     lx    ly     edge
    __    __    ___    ____

     0    0       0     1
    90    0     -90     4
    90    0      90     4

By chance these all end up being in the poly/gs trench, so this fact ends up having no impact.

Finally, we fix the weighting factors used to bin the spectra.


--- a/cl2bandpowers.m   2018-12-08 14:32:03.560430038 -0500
+++ b/cl2bandpowers.m   2018-12-08 17:50:11.497393603 -0500
@@ -31,12 +31,17 @@
 bc = zeros([nbins 1]);
 bp = zeros([nbins sz(2:end)]);

+% Weight each Cl by number of alm modes
+w = cvec(2*l + 1);
+Cl = bsxfun(@times, Cl, w);
+
 for ii=1:nbins
   bs = binedges(ii);
   be = binedges(ii+1) - 1;
+  mask = bs <= l & l <= be;
   bc(ii) = 0.5 * (bs + be);
-  % incorrect to use straight mean here - see comments in calcspec.m
-  bp(ii,colons{:}) = mean(Cl(bs<=l&l<=be,colons{:}), 1);
+  bp(ii,colons{:}) = sum(Cl(mask,colons{:}), 1) ./ sum(w(mask));
+  assert(sum(w(mask)) == (be+1)^2 - bs^2);
 end

 end

Fixing a mistake in bandpower binning in the CMB-S4 analysis code

Description of problem

Comparison of BPWFs

Linked previous postings