# Where Fourier Analysis Belongs (Draft)


## 1. Fourier Series; Heat Equation, Pt. 1

“Herein we have dealt with a single case only of a more general problem, which consists in developing any function whatever in an infinite series of sines or cosines of multiple arcs,” wrote Joseph Fourier in his 1822 memoir, *The Analytical Theory of Heat*. With a single sentence, Fourier ignited a centuries-long study of functions, infinite series, and integration, and provided a foundation for myriad subjects: signal processing, cryptography, probability theory, differential equations — the list goes on.

In the original context of thermal physics, Fourier's claim amounts to an arbitrary function $f(x)$ admitting an infinite-series representation

$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos nx + b_n \sin nx \right)$$

that converges to $f(x)$. For technical simplicity, the above infinite series is often written as

$$f(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx}$$

via Euler's formula $e^{i\theta} = \cos \theta + i \sin \theta$.

Fourier was interested in deriving a general solution to the heat equation

$$\frac{\partial u}{\partial t} = \Delta u,$$

which relates the distribution of heat to the passage of time. Let us consider a simple case, the steady-state heat equation

$$\Delta u = 0,$$

describing the state of equilibrium in which there is no more heat exchange. We consider the heat equation on the unit disk

$$\{(r, \theta) : 0 \leq r \leq 1,\ 0 \leq \theta < 2\pi\},$$

where the distribution of the heat on the boundary satisfies an arbitrary function $f(\theta)$, i.e., $u(1,\theta) = f(\theta)$. To tackle the equation, we take the infinite-series representation

$$f(\theta) = \sum_{n=-\infty}^{\infty} c_n e^{in\theta}$$

and propagate it into the disk as follows:

$$u(r,\theta) = \sum_{n=-\infty}^{\infty} c_n e^{in\theta} r^{\vert n \vert}.$$

Computing the Laplacian in polar coordinates

$$\Delta = \frac{\partial^2}{\partial r^2} + \frac{1}{r} \frac{\partial}{\partial r} + \frac{1}{r^2} \frac{\partial^2}{\partial \theta^2}$$

of each term of the infinite series $u(r,\theta)$, we see that $\Delta u = 0$, as was to be shown.

Here, we have reduced the problem of finding a solution that satisfies an arbitrary boundary condition to that of finding a solution with a single frequency $c_n e^{in \theta}$. With this reduction, it is not difficult to see that $c_n e^{in\theta}r^{\vert n \vert}$ satisfies the steady-state heat equation and conforms to the boundary condition $c_n e^{in\theta}$. Since the heat equation is linear, we can piece together the solution for each frequency to obtain the solution to the steady-state heat equation with an arbitrary boundary condition.
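
As a quick sanity check, here is a minimal numerical sketch of this solution (assuming NumPy; the boundary function below is an arbitrary choice for illustration). We compute the coefficients $c_n$ of $f$ with the FFT, form the truncated series $\sum_n c_n r^{\vert n \vert} e^{in\theta}$, and confirm that it matches $f$ on the boundary and averages $f$ at the center.

```python
import numpy as np

# Boundary data f(theta) sampled on a uniform grid.
N = 256
theta = 2 * np.pi * np.arange(N) / N
f = np.exp(np.cos(theta)) * np.sin(3 * theta)   # arbitrary smooth boundary function

c = np.fft.fft(f) / N                 # c[n] ~ Fourier coefficients of f
n = np.fft.fftfreq(N, d=1.0 / N)      # integer frequencies 0, 1, ..., -2, -1

def u(r, th):
    """Truncated series sum_n c_n r^|n| e^{i n th}."""
    return np.real(np.sum(c * r ** np.abs(n) * np.exp(1j * n * th)))

print(abs(u(1.0, theta[7]) - f[7]))   # ~ 0: the boundary condition holds
print(u(0.0, 0.0), np.mean(f))        # the value at the center is the mean of f
```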

The method of analysis presented above lends itself to a convenient yet far-reaching approach to tackling quantitative problems: decompose a complex problem into a collection of simpler problems, solve the simpler problems, and, from the solutions, extract insights about the original problem.

## 2. Fourier Transform; Heat Equation, Pt. 2

The infinite-series representation

$$f(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx}$$

is typically referred to as the Fourier series of $f$. How do we compute the Fourier coefficients $c_n$ corresponding to a function $f$? Observe that

$$\frac{1}{2\pi} \int_0^{2\pi} e^{inx} e^{-imx} \, dx = \begin{cases} 1 & \text{if } n = m, \\ 0 & \text{if } n \neq m. \end{cases}$$

Therefore, multiplying the Fourier series of $f$ by $e^{-imx}$ and integrating term by term, we obtain, for each integer $m$,

$$c_m = \frac{1}{2\pi} \int_0^{2\pi} f(x) e^{-imx} \, dx.$$

For this reason, we define the function $\hat{f}:\mathbb{Z} \to \mathbb{C}$ by setting

$$\hat{f}(n) = \frac{1}{2\pi} \int_0^{2\pi} f(x) e^{-inx} \, dx,$$

so that

$$f(x) = \sum_{n=-\infty}^{\infty} \hat{f}(n) e^{inx}.$$
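
To make the coefficient formula concrete, here is a small numerical sketch (assuming NumPy; the test function is an arbitrary choice). We approximate $\hat{f}(n)$ by a Riemann sum and watch the partial sums $\sum_{\vert n \vert \leq N} \hat{f}(n) e^{inx}$ converge to $f$; since the test function contains only frequencies 1, 2, and 3, the error vanishes once $N = 3$.

```python
import numpy as np

M = 2048
x = 2 * np.pi * np.arange(M) / M
f = np.cos(x) ** 3 + 0.5 * np.sin(2 * x)   # frequencies 1, 2, 3 only

def fhat(n):
    """hat{f}(n) = (1/2pi) int_0^{2pi} f(x) e^{-inx} dx, as a Riemann sum."""
    return np.mean(f * np.exp(-1j * n * x))

def partial_sum(N, t):
    """sum over |n| <= N of hat{f}(n) e^{int}."""
    return sum(fhat(n) * np.exp(1j * n * t) for n in range(-N, N + 1)).real

for N in (1, 2, 3):
    errs = [abs(partial_sum(N, t) - np.cos(t) ** 3 - 0.5 * np.sin(2 * t))
            for t in x[::256]]
    print(N, max(errs))   # the N = 3 error is at machine precision
```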

We have assumed implicitly that $f$ is $2\pi$-periodic, i.e., $f(x+2\pi) = f(x)$ for all $x \in \mathbb{R}$. To work with $L$-periodic functions, we can rescale the Fourier series as

$$f(x) = \sum_{n=-\infty}^{\infty} \hat{f}(n) \, e^{2\pi i n x / L},$$

where

$$\hat{f}(n) = \frac{1}{L} \int_0^L f(x) \, e^{-2\pi i n x / L} \, dx.$$

A change of variables yields

$$\hat{f}(n) = \frac{1}{L} \int_{-L/2}^{L/2} f(x) \, e^{-2\pi i n x / L} \, dx,$$

from which we can obtain the limiting case of nonperiodic functions by sending $L \to \infty$:

$$\hat{f}(n) = \int_{-\infty}^{\infty} f(x) \, e^{-2\pi i n x} \, dx.$$

The formula is now valid for non-integer parameters as well, and so we replace the integer parameter $n$ with a real parameter $\xi$:

$$\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x) \, e^{-2\pi i x \xi} \, dx.$$

The result is a function on the real line $\mathbb{R}$, called the Fourier transform of $f$. We note that $f$ can be recovered from $\hat{f}$ via the Fourier inversion formula

$$f(x) = \int_{-\infty}^{\infty} \hat{f}(\xi) \, e^{2\pi i x \xi} \, d\xi.$$

With the Fourier transform, we can solve the time-dependent heat equation on the real line

$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}$$

with initial condition $u(x,0) = f(x)$. To this end, we need to know the Fourier transform of the normalized Gaussian

$$G(x) = e^{-\pi x^2}.$$

Observe first that

$$\int_{-\infty}^{\infty} e^{-\pi x^2} \, dx = 1.$$

We now note that the Fourier transform turns derivatives into multiplications by polynomials, i.e.,

$$\mathscr{F}(h')(\xi) = 2\pi i \xi \, \mathscr{F}(h)(\xi) \quad \text{and} \quad \frac{d}{d\xi} \mathscr{F}(h)(\xi) = \mathscr{F}(-2\pi i x \, h(x))(\xi),$$

where $\mathscr{F}(h)$ denotes the Fourier transform of $h$. Since $G'(x) = -2 \pi x G(x)$, we see that

$$\hat{G}'(\xi) = \mathscr{F}(-2\pi i x \, G(x))(\xi) = i \, \mathscr{F}(G')(\xi) = i (2\pi i \xi) \, \hat{G}(\xi) = -2\pi \xi \, \hat{G}(\xi).$$

Therefore,

$$\frac{d}{d\xi} \left( \hat{G}(\xi) \, e^{\pi \xi^2} \right) = \hat{G}'(\xi) \, e^{\pi \xi^2} + 2\pi \xi \, \hat{G}(\xi) \, e^{\pi \xi^2} = 0,$$

and so $\hat{G}(\xi) e^{\pi \xi^2}$ is constant. Since

$$\hat{G}(0) = \int_{-\infty}^{\infty} e^{-\pi x^2} \, dx = 1,$$

we conclude that $\hat{G}(\xi) = e^{-\pi \xi^2}$.
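
A quick numerical check of this fact is easy to run (assuming NumPy; the grid and sample frequencies are arbitrary choices). We approximate $\hat{G}(\xi)$ by a Riemann sum and compare it with $e^{-\pi \xi^2}$:

```python
import numpy as np

# Check that G(x) = exp(-pi x^2) is its own Fourier transform under the
# convention hat{f}(xi) = int f(x) e^{-2 pi i x xi} dx.
x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]
G = np.exp(-np.pi * x ** 2)

for xi in (0.0, 0.5, 1.0, 2.0):
    Ghat = np.sum(G * np.exp(-2j * np.pi * x * xi)) * dx   # Riemann sum for the integral
    print(xi, Ghat.real, np.exp(-np.pi * xi ** 2))          # the two values agree
```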

We now define the heat kernel

$$\mathcal{H}_t(x) = \frac{1}{\sqrt{4\pi t}} \, e^{-x^2/4t},$$

so that

$$\hat{\mathcal{H}}_t(\xi) = e^{-4\pi^2 t \xi^2}.$$

Since the Fourier transform converts derivatives to multiplications by polynomials, taking the Fourier transform of the heat equation in the $x$ variable yields

$$\frac{\partial \hat{u}}{\partial t}(\xi, t) = -4\pi^2 \xi^2 \, \hat{u}(\xi, t).$$

Observe that, for each fixed $\xi$, the above identity is an ordinary differential equation in the $t$ variable, with initial condition $\hat{u}(\xi,0) = \hat{f}(\xi)$. It follows that

$$\hat{u}(\xi, t) = \hat{f}(\xi) \, e^{-4\pi^2 \xi^2 t}.$$

To finish off, we introduce the convolution of two functions $g$ and $h$:

$$(g * h)(x) = \int_{-\infty}^{\infty} g(x-y) \, h(y) \, dy.$$

For the task at hand, the relevant property of the convolution is that the Fourier transform turns convolution into pointwise multiplication:

$$\widehat{g * h}(\xi) = \hat{g}(\xi) \, \hat{h}(\xi).$$

This, in particular, implies that

$$\hat{u}(\xi, t) = \hat{f}(\xi) \, \hat{\mathcal{H}}_t(\xi) = \widehat{f * \mathcal{H}_t}(\xi),$$

whence it follows from the Fourier inversion formula that

$$u(x,t) = (f * \mathcal{H}_t)(x)$$

is the solution of the time-dependent heat equation on the real line

$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}$$

with initial condition $u(x,0) = f(x)$.
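
Here is a minimal numerical sketch of this solution formula (assuming NumPy; the Gaussian initial condition is my choice, because the exact answer is then available in closed form through the semigroup identity $\mathcal{H}_s * \mathcal{H}_t = \mathcal{H}_{s+t}$):

```python
import numpy as np

x = np.linspace(-20, 20, 4001)
dx = x[1] - x[0]
s, t = 1.0, 0.7

f = np.exp(-x ** 2 / (4 * s))                            # initial data u(x, 0)
H = np.exp(-x ** 2 / (4 * t)) / np.sqrt(4 * np.pi * t)   # heat kernel H_t on the same grid

u = np.convolve(f, H, mode="same") * dx                  # u(., t) = (f * H_t)(.)
exact = np.sqrt(s / (s + t)) * np.exp(-x ** 2 / (4 * (s + t)))

print(np.max(np.abs(u - exact)))                         # ~ 0 up to discretization error
```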

## 3. Central Limit Theorem

But enough about the heat equation. What else have we got here?

Observe that the graph of the normalized Gaussian $G(x) = e^{-\pi x^2}$ is in the shape of the famous bell curve.

The conventional wisdom is that a reasonable distribution of events conforms to the bell curve. This is the folklore version of a foundational result in probability and statistics, the central limit theorem: the sum of a nice sequence of random variables converges to a Gaussian, when suitably normalized.

Let’s try to make sense of this result.

### 3.1. Terminology

Given a countable set $\Omega$, we define a probability measure on $\Omega$ to be a set function $\mathbb{P}:\mathcal{P}(\Omega) \to [0,1]$ on the power set $\mathcal{P}(\Omega)$ of $\Omega$ such that $\mathbb{P}[\varnothing]=0$, $\mathbb{P}[\Omega] = 1$, and

$$\mathbb{P}\left[ \bigcup_{n=1}^{\infty} E_n \right] = \sum_{n=1}^{\infty} \mathbb{P}[E_n]$$

whenever $\{E_n\}_{n=1}^\infty$ is a pairwise disjoint collection of subsets of $\Omega$. A subset of $\Omega$ is called an event, and the ordered triple $(\Omega, \mathcal{P}(\Omega), \mathbb{P})$ is called a probability space.

A random variable on a probability space $(\Omega, \mathcal{P}(\Omega), \mathbb{P})$ is a function $X: \Omega \to \mathbb{R}$. The expectation, or the mean, of a random variable $X$ is the sum

$$\mathbb{E}[X] = \sum_{\omega \in \Omega} X(\omega) \, \mathbb{P}[\{\omega\}],$$

and the probability distribution of $X$ is the real function

$$F_X(\alpha) = \mathbb{P}[X \leq \alpha] = \mathbb{P}[\{\omega \in \Omega : X(\omega) \leq \alpha\}].$$

Since $\mathbb{P}[\varnothing] = 0$ and $\mathbb{P}[\Omega] = 1$, we must have

$$\lim_{\alpha \to -\infty} F_X(\alpha) = 0 \quad \text{and} \quad \lim_{\alpha \to \infty} F_X(\alpha) = 1.$$

Related to the probability distribution of $X$ is the probability mass function

$$p_X(\alpha) = \mathbb{P}[X = \alpha],$$

which is especially useful when $X$ takes finitely many values.

A function $f_X$ that satisfies the identity

$$F_X(\alpha) = \int_{-\infty}^{\alpha} f_X(x) \, dx$$

is called a probability density function of $X$. A density function, whenever it exists, is unique up to modification on a set of measure zero. If a random variable $X$ admits a density function $f_X$, then we can write the mean of $X$ as follows:

$$\mathbb{E}[X] = \int_{-\infty}^{\infty} x \, f_X(x) \, dx.$$

Generalizing the above formula, we have that

$$\mathbb{E}[X^n] = \int_{-\infty}^{\infty} x^n \, f_X(x) \, dx$$

for each $n \in \mathbb{N}$, provided that $X$ admits a density function. The quantity $\mathbb{E}[X^n]$ is called the $n$th moment of $X$. Of particular interest is the $n=2$ case: the variance of $X$ is defined to be the quantity

$$\operatorname{Var}(X) = \mathbb{E}\left[(X - \mathbb{E}[X])^2\right] = \mathbb{E}[X^2] - \mathbb{E}[X]^2.$$

If $X$ admits a density function, then

$$\operatorname{Var}(X) = \int_{-\infty}^{\infty} (x - \mathbb{E}[X])^2 \, f_X(x) \, dx.$$
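
As an illustration (the exponential density below is my choice of test case, not from the text), the mean, moments, and variance can be computed from a density by numerical integration:

```python
import numpy as np

lam = 2.0
x = np.linspace(0, 50, 200_001)
dx = x[1] - x[0]
f = lam * np.exp(-lam * x)        # exponential density with rate lam

mean = np.sum(x * f) * dx         # E[X]   = 1/lam   = 0.5
second = np.sum(x ** 2 * f) * dx  # E[X^2] = 2/lam^2 = 0.5
print(mean, second - mean ** 2)   # variance = E[X^2] - E[X]^2 = 1/lam^2 = 0.25
```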

### 3.2. Coin Tossing

An archetypal probabilistic scenario is coin tossing. If we are to toss a coin $n$ times, the result can be represented as an ordered $n$-tuple

$$(r_1, r_2, \ldots, r_n),$$

where each $r_i$ is either $H$ (head) or $T$ (tail). In light of this, we model the tossing of a fair coin $n$ times by $(\{H,T\}^n, \mathcal{P}(\{H,T\}^n), \mathbb{P})$, where

$$\mathbb{P}[E] = \frac{\vert E \vert}{2^n}$$

for each $E \in \mathcal{P}(\{H,T\}^n)$.

The number of heads in a coin toss can be modeled by the random variable

$$X(x_1, \ldots, x_n) = \sum_{i=1}^{n} \boldsymbol{1}_H(x_i),$$

where $\boldsymbol{1}_H(x_i)$ is 1 if $x_i = H$ and 0 otherwise. In this case, the probability mass function $p_X$ yields the number of possible coin-tossing outcomes with $k$ many heads, divided by $2^n$. It is not hard to see that

$$\sum_{k=0}^{n} \vert \{X = k\} \vert \, x^k = (1+x)^n,$$

whence it follows from the binomial theorem that

$$p_X(k) = \binom{n}{k} \frac{1}{2^n}$$

whenever $k \in \{0,1,\ldots,n-1,n\}$.

More generally, if the probability of heads is $\theta$, then the above polynomial formulation yields

$$p_X(k) = \binom{n}{k} \theta^k (1-\theta)^{n-k}$$

whenever $k \in \{0,1,\ldots,n-1,n\}.$ The distribution is given by the sum

$$F_X(\alpha) = \sum_{0 \leq k \leq \alpha} \binom{n}{k} \theta^k (1-\theta)^{n-k}.$$

We say that $X$ has a binomial distribution, and write $\operatorname{Bin}(k \mid n, \theta)$ to denote $p_X(k)$. If $n= 1$, then we say that $X$ has a Bernoulli distribution and write $\operatorname{Ber}(x \mid \theta)$ to denote $p_X(x)$.
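
A short simulation makes the formula tangible (assuming NumPy; the parameters and seed are arbitrary choices). We compare $\operatorname{Bin}(k \mid n, \theta)$ against the empirical frequencies of head counts over many simulated experiments:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)
n, theta, trials = 10, 0.3, 200_000

heads = rng.binomial(n, theta, size=trials)                  # simulated head counts
for k in range(n + 1):
    pmf = comb(n, k) * theta ** k * (1 - theta) ** (n - k)   # Bin(k | n, theta)
    print(k, round(pmf, 4), round(float(np.mean(heads == k)), 4))
```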

We remark that an experiment consisting of repeated independent trials, each with only two outcomes, is called a Bernoulli trial. Our coin-tossing example is the archetypal example of a Bernoulli trial. Many other scenarios can be modeled as Bernoulli trials, so long as success and failure can be clearly defined.

### 3.3. De Moivre–Laplace Theorem

We now show that Bernoulli trials with $\theta = 1/2$ conform to the bell curve distribution when performed a large number of times. To this end, we let $n = 2\nu$ and define

$$a_k = \binom{2\nu}{\nu + k} 2^{-2\nu}$$

for each $-\nu \leq k \leq \nu$, so that the terms of the binomial distribution $\operatorname{Bin}(\cdot \mid n, \frac{1}{2})$ are represented by the sequence

$$a_{-\nu},\ a_{-\nu+1},\ \ldots,\ a_{\nu-1},\ a_{\nu}.$$

Recall Stirling's formula

$$n! \sim \sqrt{2\pi n} \left( \frac{n}{e} \right)^n,$$

where $a_n \sim b_n$ means $a_n/b_n \to 1$ as $n \to \infty$. For each fixed $k \geq 0$, we have that

$$a_k \sim \frac{1}{\sqrt{\pi \nu}} \left( 1 + \frac{k}{\nu} \right)^{-(\nu+k)} \left( 1 - \frac{k}{\nu} \right)^{-(\nu-k)}.$$
Since $a_k = a_{-k}$ for all $k$, the above formula characterizes all terms of $\operatorname{Bin}(\cdot \mid n, \frac{1}{2})$.

Now, Taylor's theorem implies that

$$\log\left( 1 \pm \frac{k}{\nu} \right) = \pm \frac{k}{\nu} - \frac{k^2}{2\nu^2} + O\left( \frac{k^3}{\nu^3} \right).$$

Therefore, for large enough $\nu$,

$$a_k \approx \frac{e^{-1/2\nu}}{\sqrt{\pi\nu}} \, e^{-k^2/\nu}.$$

Since $e^{-1/2\nu} \to 1$ as $\nu \to \infty$, we conclude that

$$a_k \approx \frac{1}{\sqrt{\pi\nu}} \, e^{-k^2/\nu}$$

for large enough $\nu$. Setting $x = 2k / \sqrt{2\nu}$, we see that

$$a_k \approx \frac{1}{\sqrt{\pi\nu}} \, e^{-x^2/2}$$

for all sufficiently large $\nu$. It follows that

$$\sqrt{2\nu} \, a_k \to \sqrt{\frac{2}{\pi}} \, e^{-x^2/2}$$

provided that $0 \leq k \leq \nu$ and $2k / \sqrt{2\nu} \to x$. By symmetry,

$$\sqrt{2\nu} \, a_{-k} \to \sqrt{\frac{2}{\pi}} \, e^{-x^2/2}$$

provided that $0 \leq k \leq \nu$ and $2k/\sqrt{2\nu} \to x$.

Let us leverage the above approximation result to furnish a limit theorem, viz., a concrete statement about convergence. For our purposes, it is convenient to work with random variables of mean zero. To this end, we consider the random variable

$$Y_n(x_1, \ldots, x_n) = \sum_{i=1}^{n} \widetilde{x_i}$$

on the probability space $(\{H,T\}^n, \mathcal{P}(\{H,T\}^n), \mathbb{P})$ of $n$-fold coin tosses, where $\widetilde{x_i}$ is 1 if $x_i = H$ and $-1$ otherwise. The above result shows that

$$\sqrt{n} \; \mathbb{P}[Y_n = 2k] \to \sqrt{\frac{2}{\pi}} \, e^{-x^2}$$

provided that $2k/\sqrt{2n} \to x$.

What can we say about the distribution of $Y_n$? Observe that

$$\mathbb{P}\left[ a \leq \frac{Y_n}{\sqrt{2n}} \leq b \right] = \sum_m \mathbb{P}[Y_n = m],$$

where the sum ranges over all even integers $m$ in $[a\sqrt{2n},b\sqrt{2n}]$. We let $x = m/\sqrt{2n}$ and invoke the above result to conclude that

$$\mathbb{P}\left[ a \leq \frac{Y_n}{\sqrt{2n}} \leq b \right] \approx \frac{1}{\sqrt{\pi}} \sum_x \sqrt{\frac{2}{n}} \, e^{-x^2},$$

where the sum ranges over all $x$ of the form $m/\sqrt{2n}$ in $[a,b]$. Since $m$ must be even, the distance between two adjacent $x$ is $\sqrt{\frac{2}{n}}$. In light of this, we can approximate, for sufficiently large $n$, the above sum by the following integral:

$$\frac{1}{\sqrt{\pi}} \int_a^b e^{-x^2} \, dx.$$

It follows that, for large even $n$,

$$\mathbb{P}\left[ a \leq \frac{Y_n}{\sqrt{2n}} \leq b \right] \approx \frac{1}{\sqrt{\pi}} \int_a^b e^{-x^2} \, dx,$$

whence

$$\lim_{\substack{n \to \infty \\ n \text{ even}}} \mathbb{P}\left[ a \leq \frac{Y_n}{\sqrt{2n}} \leq b \right] = \frac{1}{\sqrt{\pi}} \int_a^b e^{-x^2} \, dx.$$

How do we extend the result to $Y_n$ with odd $n$? Given the same first $n$ tosses, $Y_n$ and $Y_{n+1}$ differ by 1, which is immaterial when $n$ is sufficiently large. We thus conclude that

$$\lim_{n \to \infty} \mathbb{P}\left[ a \leq \frac{Y_n}{\sqrt{2n}} \leq b \right] = \frac{1}{\sqrt{\pi}} \int_a^b e^{-x^2} \, dx,$$

which, in particular, implies that

$$\lim_{n \to \infty} \mathbb{P}\left[ \frac{Y_n}{\sqrt{2n}} \leq b \right] = \frac{1}{\sqrt{\pi}} \int_{-\infty}^b e^{-x^2} \, dx.$$

This is the De Moivre–Laplace theorem.
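
A simulation sketch of the theorem in the normalization used above (assuming NumPy; the parameters and seed are arbitrary choices). Note that $\frac{1}{\sqrt{\pi}} \int_a^b e^{-x^2} \, dx = \frac{1}{2}(\operatorname{erf}(b) - \operatorname{erf}(a))$:

```python
import numpy as np
from math import erf

rng = np.random.default_rng(1)
n, trials = 10_000, 200_000
a, b = -0.5, 1.0

heads = rng.binomial(n, 0.5, size=trials)   # number of heads in each experiment
Yn = (2 * heads - n) / np.sqrt(2 * n)       # Y_n / sqrt(2n), since Y_n = #H - #T

empirical = np.mean((a <= Yn) & (Yn <= b))
exact = (erf(b) - erf(a)) / 2               # (1/sqrt(pi)) int_a^b e^{-x^2} dx
print(empirical, exact)                     # the two values agree closely
```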

### 3.4. Normal Distribution

The De Moivre–Laplace theorem implies that the coin toss model, normalized by $n^{-1/2}$, can be approximated by a random variable with a Gaussian density. More generally, let us examine probability distributions

$$F_X(\alpha) = \int_{-\infty}^{\alpha} \mathcal{G}(x) \, dx,$$

where $\mathcal{G}(x) = ae^{-b(x-c)^2}$ is a Gaussian. Observe that

$$\int_{-\infty}^{\infty} a e^{-b(x-c)^2} \, dx = a \sqrt{\frac{\pi}{b}}.$$

Since a probability distribution $F_X(\alpha)$ must tend to 1 as $\alpha \to \infty$, we see that $a$ must equal $\sqrt{b/\pi}$ in order for $\mathcal{G}(x)$ to be a density function.

We shall show that a random variable $X$ with a Gaussian density function

$$f_X(x) = \sqrt{\frac{b}{\pi}} \, e^{-b(x-c)^2}$$

has mean $c$ and variance $\frac{1}{2b}$.

To compute the mean $\mathbb{E}[X]$, we take the coordinate transform $\sqrt{b}(\alpha-c) = u$:

$$\mathbb{E}[X] = \sqrt{\frac{b}{\pi}} \int_{-\infty}^{\infty} \alpha \, e^{-b(\alpha-c)^2} \, d\alpha = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} \left( \frac{u}{\sqrt{b}} + c \right) e^{-u^2} \, du = \frac{1}{\sqrt{\pi b}} \int_{-\infty}^{\infty} u \, e^{-u^2} \, du + c.$$

Another coordinate transform, $u^2 = t$, yields

$$\int_{-\infty}^{\infty} u \, e^{-u^2} \, du = -\frac{1}{2} \int_0^{\infty} e^{-t} \, dt + \frac{1}{2} \int_0^{\infty} e^{-t} \, dt = 0,$$

and so $\mathbb{E}[X] = c$.

As for the variance, we begin, once again, by taking the coordinate transform $\sqrt{b}(\alpha - c) = u$:

$$\operatorname{Var}(X) = \sqrt{\frac{b}{\pi}} \int_{-\infty}^{\infty} (\alpha - c)^2 \, e^{-b(\alpha-c)^2} \, d\alpha = \frac{1}{b\sqrt{\pi}} \int_{-\infty}^{\infty} u^2 e^{-u^2} \, du.$$

Now, we observe that

$$\int_{-\infty}^{\infty} e^{-v u^2} \, du = \sqrt{\frac{\pi}{v}}$$

for all $v > 0$, and so we can differentiate the integral with respect to $v$ to obtain the following:

$$-\int_{-\infty}^{\infty} u^2 e^{-v u^2} \, du = -\frac{\sqrt{\pi}}{2} \, v^{-3/2}.$$

It now follows that

$$\int_{-\infty}^{\infty} u^2 e^{-u^2} \, du = \frac{\sqrt{\pi}}{2},$$

whence

$$\operatorname{Var}(X) = \frac{1}{b\sqrt{\pi}} \cdot \frac{\sqrt{\pi}}{2} = \frac{1}{2b}.$$

We have thus shown that a random variable with a Gaussian density function

$$f_X(x) = \sqrt{\frac{b}{\pi}} \, e^{-b(x-c)^2}$$

has mean $c$ and variance $\frac{1}{2b}$. By defining the normal distribution with parameters $(\mu, \sigma^2)$ to be the integral

$$F_X(\alpha) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\alpha} e^{-(x-\mu)^2/2\sigma^2} \, dx,$$

we obtain a probability distribution with mean $\mu$ and variance $\sigma^2$.
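
These identities are easy to confirm numerically (assuming NumPy; the parameters $b = 3$, $c = -1.5$ are arbitrary choices). We check total mass, mean, and variance of the density by quadrature:

```python
import numpy as np

b, c = 3.0, -1.5
x = np.linspace(c - 20, c + 20, 400_001)
dx = x[1] - x[0]
f = np.sqrt(b / np.pi) * np.exp(-b * (x - c) ** 2)   # Gaussian density with a = sqrt(b/pi)

mass = np.sum(f) * dx                     # ~ 1
mean = np.sum(x * f) * dx                 # ~ c = -1.5
var = np.sum((x - mean) ** 2 * f) * dx    # ~ 1 / (2b) = 1/6
print(mass, mean, var)
```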

### 3.5. Independence

We have already observed, via the De Moivre–Laplace theorem, that Bernoulli trials in large numbers can be approximated by a normal distribution. What can we say about general random variables?

We note a crucial property of Bernoulli trials: each coin toss does not affect the outcome of any other coin tosses. In such cases, events $E_1,\ldots,E_n$ satisfy the identity

$$\mathbb{P}[E_1 \cap E_2 \cap \cdots \cap E_n] = \mathbb{P}[E_1] \, \mathbb{P}[E_2] \cdots \mathbb{P}[E_n],$$

which, together with the analogous identity for every subcollection of $E_1,\ldots,E_n$, we take to be the definition of the independence of events. A sequence of random variables $X_1,\ldots,X_n$ is said to be independent if the events

$$\{X_1 \leq \alpha_1\},\ \{X_2 \leq \alpha_2\},\ \ldots,\ \{X_n \leq \alpha_n\}$$

are independent for all choices of $\alpha_1,\ldots,\alpha_n \in \mathbb{R}$.

Of particular importance are sequences of independent and identically distributed (iid) random variables, which constitute a natural generalization of the coin toss model. We say that a sequence $(X_n)_{n=1}^\infty$ of random variables is iid if all of the terms have the same distribution and every finite subcollection of them is independent. We note that the terms of an iid sequence have the same moments.

To see that iid sequences are indeed a generalization of the coin toss model, we define $X_i$ to be the random variable that outputs 1 if the $i$th coin toss is heads and 0 otherwise. We show that $(X_i)_{i=1}^n$ is a sequence of iid random variables. Obviously, $X_1,\ldots,X_n$ share the distribution

$$F_{X_i}(\alpha) = \begin{cases} 0 & \text{if } \alpha < 0, \\ 1/2 & \text{if } 0 \leq \alpha < 1, \\ 1 & \text{if } \alpha \geq 1. \end{cases}$$

Observe that

$$\mathbb{P}[X_i = \alpha] = \frac{2^{n-1}}{2^n} = \frac{1}{2}$$

whenever $i \in \{1,2,\ldots,n\}$ and $\alpha \in \{0,1\}$, and so

$$\mathbb{P}[X_i = \alpha_i, \, X_j = \alpha_j] = \frac{2^{n-2}}{2^n} = \frac{1}{4} = \mathbb{P}[X_i = \alpha_i] \, \mathbb{P}[X_j = \alpha_j]$$

for all distinct $i,j \in \{1,2,\ldots,n\}$ and $\alpha_i,\alpha_j \in \{0,1\}$. Moreover, whenever $i_1,\ldots,i_k \in \{1,\ldots,n\}$ are distinct indices and $a_{i_1},\ldots,a_{i_k} \in \{0,1\}$,

$$\mathbb{P}[X_{i_1} = a_{i_1}, \ldots, X_{i_k} = a_{i_k}] = \frac{2^{n-k}}{2^n} = \frac{1}{2^k} = \prod_{j=1}^{k} \mathbb{P}[X_{i_j} = a_{i_j}],$$

and so $X_1,\ldots,X_n$ are independent.
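
The product rule is also visible empirically (a minimal sketch, assuming NumPy; the indices and seed are arbitrary choices). For simulated fair coin tosses, the joint frequency of $(X_i = 1, X_j = 1)$ factors into the product of the marginals:

```python
import numpy as np

rng = np.random.default_rng(2)
tosses = rng.integers(0, 2, size=(500_000, 5))   # each row: five fair coin tosses (1 = heads)

i, j = 0, 3
p_i = np.mean(tosses[:, i] == 1)                 # P[X_i = 1] ~ 1/2
p_j = np.mean(tosses[:, j] == 1)                 # P[X_j = 1] ~ 1/2
p_ij = np.mean((tosses[:, i] == 1) & (tosses[:, j] == 1))
print(p_ij, p_i * p_j)                           # both ~ 1/4, as independence predicts
```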

### 3.6. Central Limit Theorem

Recall that the De Moivre–Laplace theorem is the limit theorem

$$\lim_{n \to \infty} \mathbb{P}\left[ a \leq \frac{Y_n}{\sqrt{2n}} \leq b \right] = \frac{1}{\sqrt{\pi}} \int_a^b e^{-x^2} \, dx,$$

where $Y_n$ is the $n$-fold coin toss random variable. We have introduced, in Section 3.5, the notion of independent, identically distributed random variables as a generalization of the coin toss model. Let us work out a generalized version of the above limit theorem.

Given that each random variable $X_n$ in an iid sequence $(X_n)_{n=1}^\infty$ is a generalization of one coin toss, the $n$-fold coin toss random variable $Y_n$ should be considered as the sum of the first $n$ coin toss random variables. Therefore, it makes sense to attempt to establish a limit theorem of the form

$$\lim_{n \to \infty} \mathbb{P}\left[ a \leq \frac{S_n - \mathbb{E}[S_n]}{\sqrt{2n} \, \sigma} \leq b \right] = \frac{1}{\sqrt{\pi}} \int_a^b e^{-x^2} \, dx,$$

where $S_n = X_1 + \cdots + X_n$ and $\sigma^2$ is the common variance of the $X_n$.

To tackle this problem, we introduce the Fourier transform

$$\xi \mapsto \mathbb{E}\left[ e^{-2\pi i \xi X} \right]$$

of a random variable $X$, called the characteristic function of $X$. If $X$ admits a density function $f_X$, then this is precisely the Fourier transform $\hat{f}_X(\xi) = \int_{-\infty}^{\infty} e^{-2\pi i x \xi} f_X(x) \, dx$ of the density.
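
As a closing sketch (assuming NumPy; the Gaussian-density example is my choice), the characteristic function in this convention can be estimated from samples. A normal random variable with mean 0 and variance $1/(2\pi)$ has density $G(x) = e^{-\pi x^2}$, so its characteristic function should match $\hat{G}(\xi) = e^{-\pi \xi^2}$ computed earlier:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(0.0, np.sqrt(1 / (2 * np.pi)), size=1_000_000)   # density G(x) = exp(-pi x^2)

for xi in (0.0, 0.5, 1.0):
    est = np.mean(np.exp(-2j * np.pi * xi * X))    # Monte Carlo estimate of E[e^{-2 pi i xi X}]
    print(xi, est.real, np.exp(-np.pi * xi ** 2))  # estimate vs hat{G}(xi); imaginary part ~ 0
```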
