Practical CRLB Frequency Estimation

Keeping it simple

Often times in signal processing, estimating the center frequency of a digital complex baseband signal is a crucial ingredient in topics spanning engineering, physics, and math. There are a lot of beautiful approaches to measuring center frequency that deserve their own discussion another time.

The write-up I’m presenting here is a list of cheap and easy favorites that I use to estimate center frequency of a single clean complex baseband tone. All of these techniques have a time and a place. Some of them are convenient and achieve expected results, some of them have a couple more steps and achieve the highest quality result: Cramér Rao Lower Bound (CRLB).

A phenomenal source of my own continuing education and excellent base for estimation theory can be found at Professor Fowler’s Estimation theory course website [1]. Another resource I like

Framing the problem

We assume we are looking at a baseband collected complex sinusoid of the following model which declares our measured data \(x[n]\) and our unknown model parameters \(A, f, \phi\).

\[\begin{align} x[n] &= A \exp{j \left(2 \pi f n + \phi\right)} + w[n], n \in [0,1,2,...,N-1]\\ A & \in \mathbb{R} > 0 \\ f & \in [-1/2, 1/2] \\ \phi & \in [-\pi, \pi] \\ w[n] & \sim \mathcal{CN}\left(0, \sigma \right) \end{align}\]

We note that our estimates will have to extract information in the presence of complex Gaussian noise. The strength of this noise will be defined by measured by signal to noise ratio \(A^2 / \sigma^2\). In these problems, unknown variables often depend on each other and can contribute in difficulty to determining any one of them. In this particular problem in my experience however, estimation of \(A,\phi\) is very often trivial in comparison to determining \(f\) and are basically determined for free once \(f\) has been solved for. With that, we examine a few approaches I like to use for solving for \(f\). For convenience later on we will frequently roll \(2\pi f\) into a single term \(\omega\), the angular frequency. To differentiate, I will typically refer to \(f\) as the cycle frequency, and \(\omega\) as the angular frequency. For all problems, we use the complex baseband assumptions that the units of time are simply samples, and the units of cyclic frequency are cycles per sample.

Mean-Squared Error

Mean-squared error (MSE) is basically the variance of error, and will be our primary metric for comparing the quality of recovery. The lower the mean-squared error, the better we’re doing. Consider an estimator operating over many experiments (independent parameters held equal) that is designed to measure the value \(X_i\) given some experiment index \(i\). The estimated value \(\hat{X_i}\) a is produced during each experiment and the quality of the estimator is measured with the MSE:

\[\begin{align} MSE_{\hat{X}} &= \frac{1}{K}\sum_{n=0}^{K-1}\left( X_i - \hat{X_i} \right)^2 \\ i & \in [0,1,...,K-1] \end{align}\]

This can also naturally extend into another useful metric, the Root-Mean-Squared Error (RMSE) which is represents the standard deviation of error of an estimator. We will occasionally also use the Median-Squared Error as a useful metric when looking at phase transition plots where error suddenly jumps. The median can sometimes reveal boundaries better than mean since over many trials the median is less swayed by outliers.

Numerical Test Parameters

Each of these estimators will have their MSE confirmed numerically across 50 trials over a range of signal lengths \(N\) and \(SNR\). We will take the MSE over the 50 trials and return results as an image over \(N\) and \(SNR\). We will also typically provide a comparison to some analytic function that predicts MSE in the form of a histogram of differences between numerical results and theoretical results (measured in dB).

Basic estimation: FFT

This one is really simple. Given a set of measurements \(x[n], n \in [0,1,...,N-1]\) we take the FFT and look for the peak absolute value and map that to frequency. In Python, the code that does this is very easy:

1
2
3
4
def simple_peak(data):
    F = np.fft.fft(data)
    freq = np.fft.fftfreq(len(data))
    return freq[np.argmax(np.abs(F))]

We can actually work out what the MSE of this technique should be before we even get started. The value of cycle frequency \(f\) is assumed to uniformly random: \(\mathcal{U}\left(-.5,.5\right)\). The FFT transforms \(N\) samples of signal uniformly spaced in time into \(N\) samples of uniformly spaced cyclic frequency taking values \(\left[0, 1/N, 2/N,...,(N/2-1)/N,-(N/2)/N, (-N/2 + 1)/N,...,-2/N,-1/N \right]\). Note I am using the even FFT bin labelings. Since we are selecting the peak frequency from a finite set of \(N\) bins spaced at \(1/N\), the error we face in estimating \(f\) is a uniform distribution of \(\mathcal{U}[-.5/N,.5/N]\) frequency bins. The variance of this distribution is \(\frac{1}{12 N^2}\) and should be the MSE of the FFT center frequency estimation. The results of the numerical test are shown here:

FFT MSE

Note the color range of the plot may seem off, but is meant to scale and compare to other estimators that will achieve higher precision. An important feature to notice about this plot is the section to the left of the blue line in the left-hand figure, a cliff where error is suddenly quite high compared to the other side of the line. This section breaks assumptions about the peak value in the frequency domain. To the left side of this line, we should not expect frequency estimators built on FFT to function because noise power in the FFT will begin to take values higher than the concentrated peak signal value. We analyze this phenomena deeper in the next section.

When do FFT methods succeed or fail?

This question will have a dominant effect on our processing throughout the rest of this article, this cliff from the previous example. This error cliff is caused by the sudden failure to recover the true frequency from an FFT by taking the maximum value, meaning that noise has produced peaks in the FFT larger than the peak created by tone. Since we are unaware of the center frequency we are searching for, and these frequencies are not necessarily on the grid of the FFT frequency values, the height of the tone’s peak will also follow a random distribution. We will understand when FFT methods fail when we understand when amplitude values of the noise distribution reliably grow above the values of the tone energy distribution.

The noise is the more straight-forward of these two problems, noise generated in these simulations are drawn from a complex Gaussian, \(\mathcal{CN}(0,1)\). Upon drawing \(N\) values from this distribution, their FFT will take on a Rayleigh magnitude distribution with CDF given bellow (\(\sigma\) here is standard deviation for each complex axis after FFT, \(\sigma^2 = N/(2SNR)\)):

\[\begin{align} F_w(x) = P(|w| \leq x) &= 1 - \exp(-x^2/(2\sigma^2))\\ &= 1 - \exp(-x^2 SNR/N)\\ \end{align}\]

The tone spectral distribution is a bit more nuanced. Since we’re going to be looking at magnitude (not mag-squared for reasons that will be discussed later), the frequency domain peak created by a tone will be the magnitude envelope of the sinc function. Since we care about the distribution of peak amplitudes, we consider that for any uniform randomly chosen center frequency \(f \in [-.5,.5]\). This means that the discrete values of the frequency domain will sample the magnitude sinc function with one sample being within a half sample of the peak. The nearest sample to the peak then will take on a magnitude value as a function of a random variable: \(Y = A N sinc(X), X \sim \mathcal{U}(0,.5)\). Technically, the uniform distribution could be over \([-.5,.5]\) but the distribution is symmetric and halving the domain of \(X\) doesn’t change the distribution of \(Y\). One thing it does do is make the function over \(X\) monotonic and therefore easier to calculate the PDF of \(Y\), which we will need for analyzing when the signal distribution is greater than the noise distribution. We note that since \(SNR\) is already a function of \(A\), we can set \(A=1\) without loss of generality and make some expressions more legible for this section. The PDF of a monotonic function of a random variable can be written as [2]:

\[\begin{align} f_Y(y) &= f_X(s(y)) \left| \frac{\partial s(y)}{\partial y} \right| \\ y &= r(x) \\ x &= r^{-1}(y) = s(y) \\ y & \in [r(x) \forall x \in X] \end{align}\]

In our case, \(y = r(x), r = A N sinc\), and the inverse function over our interval is \(x = sinc^{-1}(y / (A N))\). This leaves us with the PDF, (\(A==1\)):

\[\begin{align} f_Y(y) &= sinc^{-1}(y / N) \left| \frac{\partial sinc^{-1}(y/N}{\partial y} \right| \\ y & \in [ N sinc(.5), N sinc(0)] \end{align}\]

This function, while defined poorly from an analytical sense is at least well behaved numerically except for the derivative near \(y = 1\). To get around this complication, we compute the PDF numerically with histogram accumulation of \(y = sinc(x)\) where \(x\) is a large number of samples drawn from \(x \sim \mathcal{U}(0,.5)\).

Armed now with the CDF of our noise distribution and the PDF of our signal distribution, we can integrate the CDF over the PDF over the range of the PDF to determine the probability that the signal energy is greater than the noise energy. We recall that our CDF is for a single draw of noise energy in the frequency domain, so we will need to draw \(N\) times. Since our noise energy is independent, this effectively becomes raising the CDF for noise to the \(N\) power. For independent variables \(N,SNR\), our expression for probability becomes:

\[\begin{align} P(Y > X) &= \int_{-\infty}^{\infty} F_X(y) f_Y(y) dy \\ &= \int_{N sinc(.5)}^{N sinc(1)} \left(1 - \exp(-y^2 SNR/N) \right)^N sinc^{-1}(y / N) \left| \frac{\partial sinc^{-1}(y/N)}{\partial y} \right| dy \end{align}\]

I compute this integral numerically over the same values used in experiments in the last section and throughout the rest of the article in the image bellow. We note that we primarily conduct numerical tests with 50 trials for each \(SNR, N\), and a single missed detection skews the MSE in log scale tremendously. Therefore I show the image of probability thresholded to \((50-1)/50\) to compare with the other recovery trials throughout. This thresholded image recreates the frequency recovery failure mode when noise has produced a frequency sample with energy greater than the signal.

Probability that signal energy concentrated in one bin is greater than all noise frequency bins given \(SNR, N\)

Perhaps more useful generically, we can also generate a sampled image much larger than the range of \(SNR, N\) sampled in the other experiments here. This lets us come up with a few rules of thumb for when we should expect to be able to recover frequency in general. Here we test several line candidates:

Several lines use to model the lower bound on \(N\) needed to recover a tone via FFT at a given \(SNR\)

These lines fit decently, but are only a rule of thumb and are not rigorously derived. The line \(N > 16/SNR^{1.08}\) seems to provide the tighter lowest bound, but provides a function that’s pretty intolerable. The line \(N > 64/SNR\) is not as tight but is fairly trivial to remember.

The free improvement, Parabolic fitting

When taking an FFT of a tone, it’s not strictly correct to say that all the signal energy collects into a single frequency bin. The frequency peak is actually spread over a sinc function that centers on the true continuous frequency value \(f\). When \(f\) is perfectly on a frequency bin center of the FFT, then all the signal energy is contained in a single bin, but any deviation from this and energy is scattered across many bins following a sinc function. This means that multiple bins contain information about the frequency peak and we can use that to our advantage. We also don’t really have to pay for this information, since we’re already taking the FFT, this scattered energy is always available to use.

Fitting a sinc function to data sounds like the strongest path forward, but it has diminishing returns. As noise energy increases, the values of the sinc function far away from the peak will be completely covered, and attempting to fit to theses contaminated points may be counter-productive. But for the points immediately left and right of the peak, there’s use in attempting to fit a parabola to find a more accurate frequency peak. The following code first estimates the peak with a max, then finds the values left and right of the peak (subject to circular wrap) and solves for the point where the parabola that fits these points would have a slope of zero.

1
2
3
4
5
6
7
8
9
10
11
def parabolic(data):
    F = np.abs(np.fft.fft(data))
    freq = np.fft.fftfreq(len(data))
    middle = np.argmax(F)
    left  = (middle-1) % len(data)
    right = (middle+1) % len(data)
    L = F[left]
    M = F[middle]
    R = F[right]
    shift = .5 * (L - R) / (L - 2*M + R)
    return freq[middle] + shift/len(data)

When tested over \(SNR,N\), the MSE surface has the same shape as the surface generated by basic estimation as shown by the consistent ratio of the surface to \(1/(12N^2)\). The difference here is that the parabolic fit has a smaller variance than the basic FFT estimation by approximately 4.58 dB.

Parabolic MSE.

The cliff of error regarding the selection of initial FFT peak is still present as is to be expected. Using additional points to solve for the frequency peak complicates how the method degrades with \(SNR\), this complication is not explored here, but empirically, we see the performance improve with a negligible cost.

Note here that the parabola is fit to the modulus (absolute value) of the FFT NOT the modulus squared. This is because the sinc function between \([-.5,.5]\) samples is much better modeled by a parabola than a squared sinc function. Bellow, we compare a sinc function and a squared sinc function to the parabolas that are created by fitting the left, right, and center points. At both a half sample and a full sample away, the parabola most closely fits the sinc, not the squared sinc.

Three-point parabolas compared to their generating curves for sinc and squared sinc at half sample radius.

Three-point parabolas compared to their generating curves for sinc and squared sinc at a whole sample radius.

When we gradually increase the sample radius and fit to sinc and squared sinc, the MSE to the generating function from the three-point fit parabola is shown bellow.

Three-point parabolas compared to their generating curves (MSE) for sinc and squared sinc for various radii

Clearly, sinc is better suited to approximation than a squared sinc.

Sinc Interpolation

This technique is a dead-simple way to improve the frequency resolution of an FFT estimation: sinc interpolation. The idea here is to pad the original signal with zeros before taking an FFT. The resolution of the frequency domain is increased using a sinc function as the interpolation kernel. This is of course at the expense of computing larger FFT. Code that performs this simple operation is shown bellow.

1
2
3
4
5
def sinc_interp(data,interp = 2):
    wrap = np.hstack([data, np.zeros(len(data) * (interp-1))])
    F = np.fft.fft(wrap)
    freq = np.fft.fftfreq(len(wrap))
    return freq[np.argmax(np.abs(F))]

While not practical for large scale or real-time applications, this method is a favorite for quick analysis. Bellow we observe the improvement of frequency estimate variance as we increase the degree of interpolation (variable “interp” in code).

FFT frequency estimation using zeros to increase the length of the signal to two times the original signal length

FFT frequency estimation using zeros to increase the length of the signal to four times the original signal length

FFT frequency estimation using zeros to increase the length of the signal to eight times the original signal length

FFT frequency estimation using zeros to increase the length of the signal to sixteen times the original signal length

We observe that a majority of the MSE improves linearly with the increase to FFT length. We can approximate the MSE of FFT sinc interpolation with the expression \(\frac{1}{(kN)^2}\), where \(k\) is the variable “interp” in the code implementation.

We can see something very fascinating however as we increase the degree of interpolation. If we were able to endlessly increase resolution by increasing interpolation, the distribution that compares experimental and theoretical MSE limit would remain tight. However we see that the distribution actually begins to smear as we increase interpolation; some experiments don’t improve as much as we would expect them to. We can actually see this on the map of MSE near 0dB \(SNR\) for small values of \(N\). The MSE there does not appear to behave like the \(\frac{1}{N^2}\), it appears increase in MSE as we get nearer to the FFT error cliff. We have hit a sort of MSE bedrock here. To understand why, we need to understand the estimation lower-bound: The Cramér-Rao Lower Bound.

Cramér-Rao Lower Bound

This topic definitely deserves its own set of articles, but to have an idea of what it means to achieve a high-precision result, we will derive here the CRLB for the frequency estimation problem. For the purposes of this discussion the CRLB represents the best that any unbiased estimator could hope to do on average. An estimator may get lucky now and again and achieve an estimate closer to the ``correct’’ answer, but given noise and a function dependent on unknown parameters, the CRLB is the lowest MSE we can hope to achieve for any estimator used to determine these parameters.

[1]M. Fowler, “EECE 522 Estimation Theory,” 2020, Accessed: 4 AD. [Online]. Available at: https://ws.binghamton.edu/fowler/fowler%20personal%20page/EE522.htm
[2]L. Wasserman, All of Statistics: A Concise Course in Statistical Inference. New York: Springer Science+Business Media, Inc, 2004.