October 26, 2023
This document summarizes an exploration into the idea that the distribution of prime numbers might be understood through the lens of signal processing, specifically as an "undersampling problem." Using the explicit formula relating the prime measure $\Lambda(n)$ to the zeros of the Riemann zeta function $\zeta(s)$, we develop a mathematical framework. We define an analytic signal $L(x)$ derived from the explicit formula and its sampled, truncated version $s_T(n)$. The total error $\Lambda(n) - s_T(n)$ is decomposed into two components: $E_{\text{trunc}}(n)$, representing structured noise from neglected high-frequency zeta zeros (the "undersampling noise"), and $E_{\text{mismatch}}(n)$, representing a fundamental gap between the discrete prime measure and the smooth analytic density. We characterize the statistical properties of $E_{\text{trunc}}(n)$, linking its moments (variance, skewness, kurtosis) to the conjectured Gaussian Unitary Ensemble (GUE) statistics of zeta zero spacings. The $E_{\text{mismatch}}(n)$ term is linked to prime pair correlation statistics (e.g., Hardy-Littlewood conjectures). A probabilistic model is proposed to estimate primality likelihood using $s_T(n)$ and the statistics of $E_{\text{trunc}}(n)$. The framework provides a rigorous articulation of the undersampling metaphor but highlights the inescapable reliance on zeta zeros and the persistent challenge of the $E_{\text{mismatch}}$ gap within this analytic approach. This framework may inform probabilistic primality testing or statistical models of prime distributions, bridging perspectives from analytic number theory and signal processing. We explicitly assume the Riemann Hypothesis (RH) for simplicity in many derivations.
The distribution of prime numbers exhibits both global regularity, described by the Prime Number Theorem (PNT) [1], and local irregularity, making prediction difficult. This exploration began with the provocative idea: "What if prime numbers are just an undersampling problem?" This metaphor draws from signal processing, where sampling a signal at a rate below the Nyquist frequency leads to aliasing---high-frequency components are misperceived as lower frequencies, distorting the signal. Similarly, the irregular appearance of primes might reflect an incomplete sampling or observation of an underlying analytic structure governed by the Riemann zeta function $\zeta(s)$.
This document aims to investigate this metaphor rigorously using the tools of analytic number theory. Our primary tool is the explicit formula, a cornerstone result connecting the distribution of primes (via the von Mangoldt function $\Lambda(n)$) to the non-trivial zeros of $\zeta(s)$ [2]. The goal is to develop a mathematical framework that: (i) defines an analytic signal $L(x)$ and its truncated, sampled version $s_T(n)$; (ii) decomposes the approximation error $\Lambda(n) - s_T(n)$ into a truncation component and a mismatch component; (iii) characterizes the statistics of these errors; and (iv) explores a probabilistic model for estimating primality likelihood.
Throughout this document, for simplicity in presenting the core ideas, we often assume the validity of the Riemann Hypothesis (RH), which states that all non-trivial zeros of $\zeta(s)$ lie on the critical line $\operatorname{Re}(s) = 1/2$. We will note where this assumption is made. The framework developed bridges perspectives from analytic number theory and signal processing, offering a novel lens on prime distributions.
We focus on the von Mangoldt function $\Lambda(n)$, which effectively measures the "prime power contribution" at integer $n$:
$$ \Lambda(n) = \begin{cases} \ln p & \text{if } n = p^k \text{ for some prime } p, k \ge 1 \\ 0 & \text{otherwise} \end{cases} \tag{1} \label{eq:lambda_def} $$Its summatory function, the Chebyshev function $\psi(x) = \sum_{n \le x} \Lambda(n)$, approximates the number of primes up to $x$ (related to PNT). The explicit formula connects $\psi(x)$ to the non-trivial zeros $\rho$ of $\zeta(s)$. A common (smoothed) version is [3]:
$$ \psi_0(x) = x - \sum_{\rho} \frac{x^\rho}{\rho} - \ln(2\pi) - \frac{1}{2} \ln(1 - x^{-2}) \tag{2} \label{eq:explicit_formula} $$where $\psi_0(x) = \lim_{\epsilon\to 0} \frac{\psi(x-\epsilon)+\psi(x+\epsilon)}{2}$ handles discontinuities at prime powers where $\Lambda(x) \neq 0$, the sum is over non-trivial zeros $\rho$, $\ln(2\pi)$ arises from $\zeta'(0)/\zeta(0)$, and the term involving $\ln(1-x^{-2})$ relates to the trivial zeros of $\zeta(s)$ at negative even integers.
Assuming RH, the non-trivial zeros are $\rho = 1/2 + i\gamma$ for $\gamma \in \mathbb{R}$.
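As a numerical illustration of equation (2), the following sketch (a simple illustration, not part of the original derivation) compares the truncated explicit formula with a direct computation of $\psi(x)$, using the first ten zero ordinates $\gamma$, whose values are taken from standard tables:

```python
import math

# Imaginary parts of the first ten non-trivial zeta zeros (standard tables).
ZEROS = [14.134725, 21.022040, 25.010858, 30.424876, 32.935062,
         37.586178, 40.918719, 43.327073, 48.005151, 49.773832]

def vonmangoldt(n):
    """Lambda(n) = ln p if n = p^k for a prime p, else 0 (trial-division sketch)."""
    if n < 2:
        return 0.0
    for p in range(2, n + 1):
        if n % p == 0:          # p is the smallest prime factor of n
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return 0.0

def psi_direct(x):
    """Chebyshev psi(x) = sum of Lambda(n) for n <= x, computed directly."""
    return sum(vonmangoldt(n) for n in range(2, int(x) + 1))

def psi_explicit(x, zeros=ZEROS):
    """Truncated explicit formula (2); conjugate pairs give 2*Re(x^rho / rho)."""
    total = x - math.log(2 * math.pi) - 0.5 * math.log(1 - x ** -2)
    for g in zeros:
        rho = complex(0.5, g)
        total -= 2 * ((x ** rho) / rho).real
    return total

x = 20.0
print(psi_direct(x), psi_explicit(x))
```

Even with only ten zeros, the explicit-formula value tracks the staircase $\psi(x)$ reasonably well away from prime-power discontinuities; adding more zeros sharpens the jumps.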
To represent the local density of prime powers, we define an analytic signal $L(x)$ by formally differentiating the main terms of the explicit formula for $\psi(x)$. Assuming term-by-term differentiation is valid (which requires careful justification not detailed here):
$$ L(x) = \frac{d}{dx} \left( x - \sum_{\rho} \frac{x^\rho}{\rho} \right) = 1 - \sum_{\rho} x^{\rho-1} \tag{3} \label{eq:L_def_general} $$Assuming RH ($\rho = 1/2 + i\gamma$) and noting zeros come in conjugate pairs $\rho = 1/2 \pm i\gamma$ (for $\gamma \neq 0$), $L(x)$ simplifies to:
$$ L(x) = 1 - \sum_{\gamma} x^{-1/2 + i\gamma} = 1 - 2 \sum_{\gamma > 0} \operatorname{Re}(x^{-1/2 + i\gamma}) = 1 - 2 \sum_{\gamma > 0} x^{-1/2} \cos(\gamma \ln x) \tag{4} \label{eq:L_def_RH} $$(Note: Without RH, zeros $\rho = \beta + i\gamma$ with $\beta \neq 1/2$ would introduce terms $x^{\beta-1} \cos(\gamma \ln x)$, complicating the decay rate.)
In practice, we can only compute the sum using zeros with $|\gamma| < T$ for some truncation height $T$. This yields the truncated analytic signal:
$$ L_T(x) = 1 - 2 \sum_{0 < \gamma < T} x^{-1/2} \cos(\gamma \ln x) \tag{5} \label{eq:LT_def} $$Since $\Lambda(n)$ is defined on integers, we evaluate $L_T(x)$ at $x=n$ to obtain our approximation for the prime measure:
$$ s_T(n) = L_T(n) \tag{6} \label{eq:sT_def} $$The total error in approximating $\Lambda(n)$ by $s_T(n)$ is $E_{\text{total}}(n) = \Lambda(n) - s_T(n)$. This error arises from two fundamentally different sources, which we can separate by adding and subtracting the full analytic signal $L(n)$:
$$ E_{\text{total}}(n) = (\Lambda(n) - L(n)) + (L(n) - L_T(n)) $$We define these components as:
$$ E_{\text{total}}(n) = \underbrace{(\Lambda(n) - L(n))}_{E_{\text{mismatch}}(n)} + \underbrace{(L(n) - L_T(n))}_{E_{\text{trunc}}(n)} \tag{7} \label{eq:error_decomp} $$Here, $E_{\text{mismatch}}(n)$ is the irreducible gap between the discrete prime measure and the smooth analytic density, while $E_{\text{trunc}}(n)$ is the structured noise contributed by the neglected zeros with $\gamma > T$.
This term captures the contribution from the neglected high-frequency components:
$$ E_{\text{trunc}}(n) = L(n) - L_T(n) = -2 \sum_{\gamma > T} n^{-1/2} \cos(\gamma \ln n) \tag{8} \label{eq:Etrunc_def} $$This component embodies the "undersampling noise." Its properties are not those of simple white noise but are conjectured to be governed by the statistics of zeta zero spacings. These statistics are modeled by the Gaussian Unitary Ensemble (GUE) from Random Matrix Theory [4, 5].
Key statistical properties (under RH and GUE conjectures): the mean of $E_{\text{trunc}}(n)$ over $n$ is approximately zero, as the cosine terms oscillate with no preferred phase; its variance $\sigma^2(n, T)$ is governed by the density of zeros above $T$, with an off-diagonal correction $f(n, T)$ arising from GUE pair correlations (see the derivation of $f(n, T)$ below); and its higher moments (skewness, kurtosis) are conjectured to measure departures from Gaussianity tied to the spacing statistics of the zeros.
The GUE structure implies that the information lost by truncating at $T$ is patterned, reflecting deep correlations in the zeta spectrum.
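To make the truncated signal concrete, the following sketch (again with the first ten zero ordinates from standard tables, so $T \approx 50$) evaluates $s_T(n)$ of equation (6) in two equivalent ways, via the real cosine form (5) and via the complex conjugate-pair form restricted to $\gamma < T$, and prints it next to $\Lambda(n)$ so the residual $\Lambda(n) - s_T(n)$ can be inspected:

```python
import math

# First ten zero ordinates gamma, values from standard tables (T ~ 50).
ZEROS = [14.134725, 21.022040, 25.010858, 30.424876, 32.935062,
         37.586178, 40.918719, 43.327073, 48.005151, 49.773832]

def vonmangoldt(n):
    """Lambda(n) = ln p if n = p^k, else 0 (trial-division sketch)."""
    for p in range(2, n + 1):
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return 0.0

def s_T(n, zeros=ZEROS):
    """Equation (5)/(6): s_T(n) = 1 - 2 * sum_{0<gamma<T} n^(-1/2) cos(gamma ln n)."""
    return 1 - 2 * sum(n ** -0.5 * math.cos(g * math.log(n)) for g in zeros)

def s_T_complex(n, zeros=ZEROS):
    """Same value via conjugate zero pairs n^(rho-1), rho = 1/2 +/- i*gamma."""
    total = 1 + 0j
    for g in zeros:
        total -= n ** complex(-0.5, g) + n ** complex(-0.5, -g)
    return total.real

for n in range(2, 13):
    print(n, round(vonmangoldt(n), 3), round(s_T(n), 3))  # Lambda(n) vs s_T(n)
```

The two evaluations agree to machine precision, since $n^{-1/2+i\gamma} + n^{-1/2-i\gamma} = 2 n^{-1/2} \cos(\gamma \ln n)$; with so few zeros the residual is still large, which is exactly the truncation noise discussed above.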
This term represents the fundamental difference between the discrete reality of primes and the smooth analytic approximation, even if all zeros were known:
$$ E_{\text{mismatch}}(n) = \Lambda(n) - L(n) = \Lambda(n) - \left( 1 - 2 \sum_{\gamma > 0} n^{-1/2} \cos(\gamma \ln n) \right) \tag{9} \label{eq:Emismatch_def} $$Key properties: $E_{\text{mismatch}}(n)$ is sharply peaked at prime powers, where $\Lambda(n) > 0$, and nearly cancels the smooth density elsewhere; its average over $n$ tends to zero; and, heuristically treating $L(n) \approx 1$ on average, its pair correlation is governed by prime pair statistics,
$$ \lim_{X\to\infty} \frac{1}{X} \sum_{n \le X} E_{\text{mismatch}}(n) \, E_{\text{mismatch}}(n+\tau) = C(\tau) - 1 \tag{10} \label{eq:Emismatch_corr} $$
where $C(\tau) = \lim_{X\to\infty} \frac{1}{X} \sum_{n\le X} \Lambda(n)\Lambda(n+\tau)$ is the conjectured average density of prime power pairs with difference $\tau$. $C(\tau)$ involves intricate products over primes related to the gap $\tau$ (e.g., $C(2)$ relates to twin primes).
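The correlation $C(\tau)$ can be probed numerically. The sketch below estimates $\frac{1}{X}\sum_{n \le X} \Lambda(n)\Lambda(n+\tau)$ by sieving $\Lambda$ up to a modest cutoff (the cutoff and the Hardy-Littlewood constant $2C_2 \approx 1.32$ for $\tau = 2$ are illustrative reference points, not claims of convergence at this scale); odd shifts such as $\tau = 1$ are starved of prime-power pairs:

```python
import math

def lambda_sieve(limit):
    """Von Mangoldt Lambda(n) for 0 <= n <= limit via a smallest-prime-factor sieve."""
    spf = list(range(limit + 1))          # spf[n] = smallest prime factor of n
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:                   # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0.0] * (limit + 1)
    for n in range(2, limit + 1):
        p, m = spf[n], n
        while m % p == 0:
            m //= p
        if m == 1:                        # n is a pure power of p
            lam[n] = math.log(p)
    return lam

def pair_correlation(tau, X, lam):
    """Empirical C(tau) ~ (1/X) * sum_{n <= X} Lambda(n) * Lambda(n + tau)."""
    return sum(lam[n] * lam[n + tau] for n in range(1, X + 1)) / X

X = 20000
lam = lambda_sieve(X + 2)
c1 = pair_correlation(1, X, lam)
c2 = pair_correlation(2, X, lam)
print(c1, c2)   # tau = 1 is tiny; tau = 2 sits near the twin-prime prediction
```

The contrast between $\tau = 1$ and $\tau = 2$ illustrates how $C(\tau)$ encodes the arithmetic of the gap $\tau$ through the singular-series products mentioned above.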
This gap appears inherent to approximations based on the standard explicit formula and the resulting smooth density $L(n)$.
The intuition that a deep symmetry might "lock" the prime distribution finds mathematical grounding in the properties of the Riemann zeta function and its zeros.
While these symmetries profoundly shape the analytic landscape described by $L(x)$ and $E_{\text{trunc}}(n)$, they operate at the level of the density and its fluctuations. They do not eliminate the $E_{\text{mismatch}}$ gap (the density-vs-event divide) nor do they remove the dependence on knowing the specific locations of the zeros $\gamma$, which act as the fundamental frequencies in this framework.
Given that $s_T(n)$ cannot exactly equal $\Lambda(n)$ due to both error terms, we pivot to a probabilistic interpretation. We use the analytic approximation $s_T(n)$ and the statistical understanding of the truncation error $E_{\text{trunc}}(n)$ to estimate the likelihood that $n$ is a prime power.
Let $A_n$ be the event that $n$ is a prime power ($\Lambda(n) > 0$). We aim to model $P(A_n | s_T(n))$. The core idea is to integrate the likelihood of $A_n$ given the true signal value $L$, weighted by the probability density of $L$ given our computed approximation $s_T(n)$:
$$ P(A_n | s_T(n)) = \int_{-\infty}^{\infty} P(A_n | L) \, p(L | s_T(n)) \, dL \tag{11} \label{eq:prob_model_integral} $$Here, $P(A_n | L)$ models how likely a given value $L$ of the underlying signal is to correspond to a prime power, and $p(L | s_T(n))$ is modeled as approximately Gaussian with mean $s_T(n)$ and variance $\sigma^2(n, T)$, the conjectured variance of the truncation error $E_{\text{trunc}}(n)$.
This probabilistic model leverages the analytic structure ($s_T(n)$) and the statistical understanding of the truncation error ($\sigma^2(n, T)$) to provide a likelihood estimate for primality, effectively sidestepping the unbridgeable $E_{\text{mismatch}}$ gap by changing the goal from exact prediction to probabilistic estimation. However, it remains dependent on knowing zeta zeros up to height $T$.
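A minimal numerical sketch of equation (11) follows, with assumed ingredients: a hypothetical logistic choice for $P(A_n \mid L)$ (for illustration only; the framework does not prescribe this link), a Gaussian $p(L \mid s_T(n))$ with an assumed standard deviation $\sigma$, and a simple midpoint rule for the integral:

```python
import math

def likelihood_prime_power(s_T, sigma=0.5, steepness=4.0):
    """Equation (11): numerically integrate P(A_n | L) * p(L | s_T) dL.

    P(A_n | L) is a hypothetical logistic link (rising in L, centered at 0.5);
    p(L | s_T) is Gaussian with mean s_T and an assumed std dev sigma.
    """
    lo, hi, steps = s_T - 8 * sigma, s_T + 8 * sigma, 2000
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        L = lo + (i + 0.5) * h                       # midpoint rule
        p_a_given_l = 1.0 / (1.0 + math.exp(-steepness * (L - 0.5)))
        p_l = math.exp(-0.5 * ((L - s_T) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        total += p_a_given_l * p_l * h
    return total

print(likelihood_prime_power(0.1), likelihood_prime_power(1.2))
```

By construction the output lies in $[0, 1]$ and increases with $s_T$: a computed signal value well above the background level 1 raises the estimated likelihood that $n$ is a prime power, while the width $\sigma$ (standing in for $\sigma(n, T)$) controls how much the truncation noise dilutes that evidence.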
The core findings of this exploration are summarized by the following interconnected set of equations, forming a unified descriptive model of the prime-zero relationship within this framework:
$$ \Lambda(n) = s_T(n) + E_{\text{trunc}}(n) + E_{\text{mismatch}}(n), \qquad s_T(n) = 1 - 2 \sum_{0 < \gamma < T} n^{-1/2} \cos(\gamma \ln n), $$
$$ E_{\text{trunc}}(n) = -2 \sum_{\gamma > T} n^{-1/2} \cos(\gamma \ln n), \qquad E_{\text{mismatch}}(n) = \Lambda(n) - L(n), $$
$$ P(A_n | s_T(n)) = \int_{-\infty}^{\infty} P(A_n | L) \, p(L | s_T(n)) \, dL. $$
This set of equations provides a comprehensive description of how the explicit formula approximates the prime measure, the nature and origin of the errors involved, and a potential probabilistic application.
While this framework provides significant insight into the structure underlying prime distribution via the explicit formula, it does not yield a predictive prime formula free of deep dependencies and inherent limitations: the signal $s_T(n)$ requires explicit knowledge of the zeta zeros up to height $T$; the mismatch error $E_{\text{mismatch}}(n)$ persists even if all zeros were known; and the statistical characterizations rest on unproven conjectures (RH, GUE zero statistics, Hardy-Littlewood pair correlations).
Bypassing the zero dependency would necessitate a fundamentally new analytic approach to prime numbers, potentially equivalent to resolving the Riemann Hypothesis or discovering an elementary prime-generating function---both monumental unsolved problems.
The exploration of "primes as an undersampling problem," when rigorously pursued through the lens of the explicit formula, yields a rich descriptive framework. The undersampling metaphor finds its mathematical counterpart in the truncation error term, $E_{\text{trunc}}(n)$, which represents structured noise arising from neglected high-frequency zeta terms. Its statistical properties are deeply connected to the conjectured GUE symmetries of the zeta spectrum.
However, the analysis also clearly delineates the fundamental mismatch error, $E_{\text{mismatch}}(n)$, inherent in approximating the discrete prime measure $\Lambda(n)$ with the smooth analytic density $L(n)$ derived from the zeta function. This gap remains a central challenge. The Riemann Hypothesis, assumed throughout, acts as a crucial symmetry constraint, locking the analytic form of the fluctuations, but it does not eliminate the need for the specific zero locations.
The unified descriptive model presented herein (Section 6) encapsulates the relationship between $\Lambda(n)$, its analytic approximation $s_T(n)$, and the structured errors $E_{\text{trunc}}(n)$ and $E_{\text{mismatch}}(n)$. The probabilistic interpretation offers a pragmatic way to leverage this structure for estimating primality likelihood.
Ultimately, this framework provides a detailed map of the intricate prime-zero connection revealed by the explicit formula. It illuminates the power of analytic methods while clearly defining their limitations, particularly the inescapable dependence on the Riemann zeta zeros. It underscores that understanding the "noise"---the structured deviations from the average---is tantamount to understanding the primes themselves at this level of analysis. Further progress towards a predictive prime equation likely requires breakthroughs beyond this specific analytic paradigm. This work highlights the potential value of interdisciplinary perspectives, bridging concepts from signal processing and analytic number theory to gain new insights into classical problems.
Potential avenues for future work stemming from this framework include: numerical evaluation of the variance correction $f(n, T)$ against computed zero data; calibration of the probabilistic primality model on known primes; extension of the moment analysis of $E_{\text{trunc}}(n)$ beyond the variance; and sharper study of the link between $E_{\text{mismatch}}(n)$ and prime pair correlation conjectures.
Content by Research Identifier: 7B7545EB2B5B22A28204066BD292A0365D4989260318CDF4A7A0407C272E9AFB
* This document synthesizes a detailed exploration of the core idea.
The term $f(n, T)$ arises from the off-diagonal terms ($i \neq j$) in the variance calculation of $E_{\text{trunc}}(n)$. It quantifies the effect of correlations between zeta zeros $\gamma_i, \gamma_j$ on the variance.
$$ f(n, T) = \frac{1}{N(T, \infty)} \sum_{\gamma_i > T} \sum_{\gamma_j > T, j \neq i} E[\cos((\gamma_i - \gamma_j) \ln n)] $$Using the density of zeros $\rho(\gamma) \approx \frac{1}{2\pi} \ln(\frac{\gamma}{2\pi})$ and the GUE pair correlation function $R_2(s) = 1 - (\sin(\pi s)/(\pi s))^2$, where $s = \delta \frac{\ln(\gamma/2\pi)}{2\pi}$ is the normalized spacing for a difference $\delta = \gamma_i - \gamma_j$ near height $\gamma$, we can approximate the sum as an integral:
$$ f(n, T) \approx \frac{1}{N(T, \infty)} \int_T^\infty \rho(\gamma) d\gamma \int_{-\infty}^\infty \rho(\gamma+\delta) R_2\left( \delta \frac{\ln(\gamma/2\pi)}{2\pi} \right) \cos(\delta \ln n) d\delta $$Changing the inner integration variable to the normalized spacing $s$, and approximating $\rho(\gamma+\delta) \approx \rho(\gamma)$ for the relevant range of $\delta$:
$$ f(n, T) \approx \frac{1}{N(T, \infty)} \int_T^\infty \rho(\gamma)^2 d\gamma \int_{-\infty}^\infty R_2(s) \cos\left( s \frac{2\pi \ln n}{\ln(\gamma/2\pi)} \right) \frac{2\pi}{\ln(\gamma/2\pi)} ds $$The inner integral is related to the Fourier transform of $R_2(s)$. The Fourier transform of $1 - R_2(s) = (\sin(\pi s)/(\pi s))^2$ is the triangular function $\frac{1}{2\pi} \max(0, 2\pi - |k|)$. Let $k = \frac{2\pi \ln n}{\ln(\gamma/2\pi)}$. The integral of $R_2(s) \cos(ks)$ is related to $1$ minus the Fourier transform of $1-R_2(s)$.
$$ \int_{-\infty}^\infty R_2(s) \cos(ks) ds = \int \cos(ks) ds - \int (\sin(\pi s)/(\pi s))^2 \cos(ks) ds = 2\pi \delta(k) - \max\left(0, 1 - \frac{|k|}{2\pi}\right) $$Ignoring the Dirac delta term (as $k$ is generally non-zero), the integral of $R_2(s)\cos(ks)$ equals $-\max(0, 1 - |k|/(2\pi))$. Substituting this back:
$$ f(n, T) \approx -\frac{1}{N(T,\infty)} \int_T^\infty \rho(\gamma)^2 \frac{2\pi}{\ln(\gamma/2\pi)} \max\left(0, 1 - \frac{|\ln n|}{|\ln(\gamma/2\pi)|}\right) d\gamma $$This confirms $f(n, T)$ is typically negative (due to the leading minus sign and the max term being non-negative) and depends on $n, T$ and the GUE structure via $R_2$'s Fourier transform. The exact evaluation requires careful handling of the integrals and densities.
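Both the triangular-transform identity and the final expression can be probed numerically. The sketch below (a) integrates $(\sin \pi s / \pi s)^2 \cos(ks)$ over a large window, which should reproduce $\max(0, 1 - |k|/(2\pi))$, and (b) evaluates $f(n, T)$ with the zero density $\rho(\gamma) \approx \ln(\gamma/2\pi)/(2\pi)$, using an assumed finite cutoff $\Gamma_{\max} = 10^6$ in place of the divergent upper limit (an illustrative choice, not part of the derivation):

```python
import math

def sinc2_cos_integral(k, half_width=400.0, steps=400000):
    """Midpoint-rule integral of (sin(pi s)/(pi s))^2 * cos(k s) over [-W, W]."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        s = -half_width + (i + 0.5) * h
        sinc = math.sin(math.pi * s) / (math.pi * s) if s != 0.0 else 1.0
        total += sinc * sinc * math.cos(k * s) * h
    return total

def f_correction(n, T, gamma_max=1e6, steps=20000):
    """Evaluate the final approximation for f(n, T), with an assumed cutoff
    gamma_max standing in for the divergent upper limit (illustration only)."""
    ln_n = math.log(n)
    h = (gamma_max - T) / steps
    num = 0.0     # integral weighted by the GUE triangular factor
    norm = 0.0    # N(T, gamma_max) = integral of the zero density rho
    for i in range(steps):
        g = T + (i + 0.5) * h
        L = math.log(g / (2 * math.pi))          # rho(g) = L / (2 pi)
        rho = L / (2 * math.pi)
        num += rho ** 2 * (2 * math.pi / L) * max(0.0, 1 - ln_n / L) * h
        norm += rho * h
    return -num / norm

# (a) transform check: should match max(0, 1 - |k|/(2 pi)), i.e. 0.5 and 0 here
print(sinc2_cos_integral(math.pi), sinc2_cos_integral(3 * math.pi))
# (b) f(n, T) is negative and shrinks in magnitude as n grows
print(f_correction(10, 100.0), f_correction(1000, 100.0))
```

The numerics agree with the qualitative claims above: $f(n, T)$ comes out negative and bounded below by $-1$ (since the triangular factor never exceeds $1$), and its magnitude decreases as $\ln n$ grows, though the values themselves depend on the arbitrary cutoff.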