# Where Fourier Analysis Belongs (Draft)

## 1. Fourier Series; Heat Equation, Pt. 1

“Herein we have dealt with a single case only of a more general problem, which consists in developing any function whatever in an infinite series of sines or cosines of multiple arcs,” wrote Joseph Fourier in his 1822 treatise, *The Analytical Theory of Heat*. With a single sentence, Fourier ignited a centuries-long study of functions, infinite series, and integration, and laid a foundation for myriad subjects: signal processing, cryptography, probability theory, differential equations — the list goes on.

In the original context of thermal physics, Fourier’s claim amounts to an arbitrary function $f$ admitting an infinite-series representation

$$\frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos n\theta + b_n \sin n\theta \right)$$

that converges to $f$. For technical simplicity, the above infinite series is often written as

$$\sum_{n=-\infty}^{\infty} c_n e^{in\theta}$$

via Euler’s formula $e^{i\theta} = \cos\theta + i\sin\theta$.

Fourier was interested in deriving a general solution to the heat equation

$$\frac{\partial u}{\partial t} = \Delta u,$$

which relates the distribution of heat to the passage of time. Let us consider a simple case, the **steady-state heat equation**

$$\Delta u = 0,$$

describing the state of equilibrium in which there is no more heat exchange. We consider the heat equation on the unit disk

$$\mathbb{D} = \{(r, \theta) : 0 \leq r \leq 1\},$$

where the distribution of the heat on the boundary satisfies an arbitrary function $f$. To tackle the equation, we take the infinite-series representation

$$f(\theta) = \sum_{n=-\infty}^{\infty} c_n e^{in\theta}$$

and propagate it into the disk as follows:

$$u(r, \theta) = \sum_{n=-\infty}^{\infty} c_n r^{|n|} e^{in\theta}.$$

Computing the **Laplacian**

$$\Delta = \frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial^2}{\partial \theta^2}$$

of each term of the infinite series, we see that

$$\Delta\left(c_n r^{|n|} e^{in\theta}\right) = c_n \left( |n|(|n|-1) + |n| - n^2 \right) r^{|n|-2} e^{in\theta} = 0,$$

so that $\Delta u = 0$, as was to be shown.

Here, we have reduced the problem of finding a solution that satisfies an arbitrary boundary condition to that of finding a solution with a single frequency $e^{in\theta}$. With this reduction, it is not difficult to see that $u_n(r, \theta) = r^{|n|} e^{in\theta}$ satisfies the steady-state heat equation and conforms to the boundary condition $u_n(1, \theta) = e^{in\theta}$. Since the heat equation is linear, we can piece together the solution for each frequency to obtain the solution to the steady-state heat equation with an arbitrary boundary condition.

The method of analysis presented above lends itself to a convenient yet far-reaching approach to tackling quantitative problems: decompose a complex problem into a collection of simpler problems, solve the simpler problems, and, from the solutions, extract insights about the original problem.
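This decompose-and-reassemble strategy is easy to test numerically. Below is a short Python sketch (our own illustration, not from the text; the function names are ad hoc) checking that the single-frequency solution $u = r^3 \cos 3\theta$, written in Cartesian coordinates, is harmonic inside the disk and matches the boundary datum $\cos 3\theta$.

```python
import math

def u(x, y, n=3):
    # Single-frequency steady-state solution, written in Cartesian
    # coordinates: Re((x + iy)^n) = r^n cos(n*theta).
    return (complex(x, y) ** n).real

def laplacian(f, x, y, h=1e-4):
    # Five-point finite-difference approximation of the Laplacian.
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h)
            - 4.0 * f(x, y)) / (h * h)

# At an interior point of the disk, the Laplacian should vanish.
lap = laplacian(u, 0.3, 0.4)

# On the boundary r = 1, the solution should match the datum cos(3*theta).
t = 0.7
boundary_error = abs(u(math.cos(t), math.sin(t)) - math.cos(3 * t))
```

The finite-difference Laplacian is only an approximation, so the interior check is accurate to roughly the discretization error, while the boundary match is exact up to rounding.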

## 2. Fourier Transform; Heat Equation, Pt. 2

The infinite-series representation

$$f(\theta) = \sum_{n=-\infty}^{\infty} c_n e^{in\theta}$$

is typically referred to as the **Fourier series** of $f$. How do we compute the **Fourier coefficients** $c_n$ corresponding to a function $f$? Observe that

$$\int_0^{2\pi} e^{im\theta} e^{-in\theta} \, d\theta = \begin{cases} 2\pi & \text{if } m = n; \\ 0 & \text{otherwise.} \end{cases}$$

Therefore, for each integer $n$,

$$\int_0^{2\pi} f(\theta) e^{-in\theta} \, d\theta = \sum_{m=-\infty}^{\infty} c_m \int_0^{2\pi} e^{im\theta} e^{-in\theta} \, d\theta = 2\pi c_n.$$

For this reason, we define the function $\hat{f}$ by setting

$$\hat{f}(n) = \frac{1}{2\pi} \int_0^{2\pi} f(\theta) e^{-in\theta} \, d\theta,$$

so that

$$f(\theta) = \sum_{n=-\infty}^{\infty} \hat{f}(n) e^{in\theta}.$$
We have assumed implicitly that $f$ is $2\pi$-periodic, i.e., $f(\theta + 2\pi) = f(\theta)$ for all $\theta$. To work with $L$-periodic functions, we can rescale the Fourier series as

$$f(x) = \sum_{n=-\infty}^{\infty} a_n e^{2\pi i n x / L},$$

where

$$a_n = \frac{1}{L} \int_{-L/2}^{L/2} f(x) e^{-2\pi i n x / L} \, dx.$$

A change of variables $\xi_n = n/L$ yields

$$f(x) = \frac{1}{L} \sum_{n=-\infty}^{\infty} \left( \int_{-L/2}^{L/2} f(y) e^{-2\pi i y \xi_n} \, dy \right) e^{2\pi i x \xi_n},$$

from which we can obtain the limiting case of nonperiodic functions by sending $L \to \infty$:

$$f(x) = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} f(y) e^{-2\pi i y \xi} \, dy \right) e^{2\pi i x \xi} \, d\xi.$$

The inner integral is now valid for non-integer parameters as well, and so we replace the integer parameter $n$ with a real parameter $\xi$:

$$\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x) e^{-2\pi i x \xi} \, dx.$$

The result is a function on the real line $\mathbb{R}$, called the **Fourier transform** of $f$. We note that $f$ can be recovered from $\hat{f}$ via the **Fourier inversion formula**

$$f(x) = \int_{-\infty}^{\infty} \hat{f}(\xi) e^{2\pi i x \xi} \, d\xi.$$
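The transform and its inversion can be sanity-checked numerically. The Python sketch below (our own; the grid and truncation parameters are ad hoc choices suited to rapidly decaying functions) approximates $\hat{f}$ by a Riemann sum for $f(x) = e^{-x^2}$ and then applies the inversion formula, recovering $f$ at a sample point.

```python
import math, cmath

def fourier_transform(f, xi, L=6.0, dx=0.005):
    # Riemann-sum approximation of ∫ f(x) e^{-2πi x ξ} dx over [-L, L];
    # adequate when f decays rapidly.
    n = int(2 * L / dx)
    return dx * sum(f(-L + k * dx) * cmath.exp(-2j * math.pi * (-L + k * dx) * xi)
                    for k in range(n))

def f(x):
    return math.exp(-x * x)  # a rapidly decaying test function

# Sample the transform on a grid, then apply the inversion formula
# f(x) = ∫ f^(ξ) e^{2πi x ξ} dξ as another Riemann sum.
d_xi, M = 0.02, 4.0
grid = [-M + k * d_xi for k in range(int(2 * M / d_xi))]
fhat = [fourier_transform(f, xi) for xi in grid]

x0 = 0.7
recovered = d_xi * sum(fh * cmath.exp(2j * math.pi * x0 * xi)
                       for fh, xi in zip(fhat, grid))
```

The recovered value should agree with $f(0.7) = e^{-0.49}$ to high accuracy, since both integrands decay fast enough that truncation and discretization errors are negligible.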

With the Fourier transform, we can solve the time-dependent heat equation on the real line

$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2},$$

with initial condition $u(x, 0) = f(x)$. To this end, we need to know the Fourier transform of the normalized **Gaussian**

$$g(x) = e^{-\pi x^2}.$$

Observe first that

$$\int_{-\infty}^{\infty} e^{-\pi x^2} \, dx = 1.$$

We now note that the Fourier transform turns derivatives into multiplications by polynomials, i.e.,

$$\widehat{f'}(\xi) = 2\pi i \xi \hat{f}(\xi),$$

where $\hat{f}$ denotes the Fourier transform of $f$. Since $g'(x) = -2\pi x g(x)$, we see that

$$\hat{g}'(\xi) = -2\pi \xi \hat{g}(\xi).$$

Therefore,

$$\frac{d}{d\xi} \left( \hat{g}(\xi) e^{\pi \xi^2} \right) = \left( \hat{g}'(\xi) + 2\pi \xi \hat{g}(\xi) \right) e^{\pi \xi^2} = 0,$$

and so $\hat{g}(\xi) e^{\pi \xi^2}$ is constant. Since

$$\hat{g}(0) = \int_{-\infty}^{\infty} g(x) \, dx = 1,$$

we conclude that $\hat{g} = g$.
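The self-reproducing property of the normalized Gaussian can be corroborated numerically. The sketch below (our own, with ad hoc truncation parameters) approximates $\hat{g}$ by a Riemann sum and compares it against $g$ at several frequencies.

```python
import math, cmath

def g(x):
    return math.exp(-math.pi * x * x)  # the normalized Gaussian

def g_hat(xi, L=6.0, dx=0.005):
    # Riemann-sum approximation of ∫ g(x) e^{-2πi x ξ} dx over [-L, L]
    n = int(2 * L / dx)
    return dx * sum(g(-L + k * dx) * cmath.exp(-2j * math.pi * (-L + k * dx) * xi)
                    for k in range(n))

# g^(0) = ∫ g = 1, and g should agree with its own transform.
err = max(abs(g_hat(xi) - g(xi)) for xi in (0.0, 0.5, 1.0, 1.5))
norm_err = abs(g_hat(0.0) - 1)
```

Because the Gaussian decays so quickly, the Riemann sum is accurate essentially to machine precision here.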

We now define the **heat kernel**

$$H_t(x) = \frac{1}{\sqrt{4\pi t}} e^{-x^2/(4t)},$$

so that

$$\hat{H}_t(\xi) = e^{-4\pi^2 t \xi^2}.$$

Since the Fourier transform converts derivatives to multiplications by polynomials, taking the Fourier transform of the heat equation in the $x$ variable yields

$$\frac{\partial \hat{u}}{\partial t}(\xi, t) = -4\pi^2 \xi^2 \hat{u}(\xi, t).$$

Observe that, for each fixed $\xi$, the above identity is an ordinary differential equation in the $t$ variable, with initial condition $\hat{u}(\xi, 0) = \hat{f}(\xi)$. It follows that

$$\hat{u}(\xi, t) = \hat{f}(\xi) e^{-4\pi^2 \xi^2 t} = \hat{f}(\xi) \hat{H}_t(\xi).$$

To finish off, we introduce the **convolution** of two functions $f$ and $g$:

$$(f * g)(x) = \int_{-\infty}^{\infty} f(x - y) g(y) \, dy.$$

For the task at hand, the relevant property of the convolution is that the Fourier transform turns convolution into pointwise multiplication:

$$\widehat{f * g}(\xi) = \hat{f}(\xi) \hat{g}(\xi).$$

This, in particular, implies that

$$\hat{u}(\xi, t) = \widehat{f * H_t}(\xi),$$

whence it follows from the Fourier inversion formula that

$$u(x, t) = (f * H_t)(x)$$

is the solution of the time-dependent heat equation on the real line

$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2},$$

with initial condition $u(x, 0) = f(x)$.
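Here is a small numerical check of the convolution formula (our own sketch, assuming the normalization $\partial u/\partial t = \partial^2 u/\partial x^2$ and the kernel $H_t(x) = (4\pi t)^{-1/2} e^{-x^2/(4t)}$). For Gaussian initial data $f(x) = e^{-x^2}$, the convolution $f * H_t$ has the closed form $(1 + 4t)^{-1/2} e^{-x^2/(1+4t)}$, which the Riemann-sum convolution should reproduce.

```python
import math

def heat_kernel(y, t):
    # H_t(y) = (4πt)^{-1/2} exp(-y²/(4t)); this normalization solves
    # the heat equation du/dt = d²u/dx².
    return math.exp(-y * y / (4 * t)) / math.sqrt(4 * math.pi * t)

def f(x):
    return math.exp(-x * x)  # initial temperature profile

def u(x, t, L=10.0, dy=0.005):
    # u(x, t) = (f * H_t)(x), approximated by a Riemann sum.
    n = int(2 * L / dy)
    return dy * sum(f(x - (-L + k * dy)) * heat_kernel(-L + k * dy, t)
                    for k in range(n))

# Convolving two Gaussians yields another Gaussian:
# u(x, t) = (1 + 4t)^{-1/2} exp(-x²/(1 + 4t)).
x0, t0 = 0.5, 0.3
exact = math.exp(-x0 * x0 / (1 + 4 * t0)) / math.sqrt(1 + 4 * t0)
error = abs(u(x0, t0) - exact)
```

The closed form follows because convolving Gaussians adds their "variances"; the numerical convolution matches it to high precision.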

## 3. Central Limit Theorem

But enough about the heat equation. What else have we got here?

Observe that the graph of the normalized Gaussian $g(x) = e^{-\pi x^2}$ takes the shape of the famous *bell curve*.

The conventional wisdom is that a reasonable distribution of events conforms to the bell curve. This is the folklore version of a foundational result in probability and statistics, the **central limit theorem**: the sum of a *nice* sequence of random variables *converges* to a Gaussian, when suitably normalized.

Let’s try to make sense of this result.

### 3.1. Terminology

Given a countable set $\Omega$, we define a **probability measure** on $\Omega$ to be a set function $P$ on the power set of $\Omega$ such that $P(\varnothing) = 0$, $P(\Omega) = 1$, and that

$$P\left( \bigcup_{n} A_n \right) = \sum_{n} P(A_n)$$

whenever $(A_n)$ is a disjoint collection of subsets of $\Omega$. A subset of $\Omega$ is called an **event**, and the ordered triple $(\Omega, \mathcal{P}(\Omega), P)$ is called a **probability space**.

A **random variable** on a probability space $(\Omega, \mathcal{P}(\Omega), P)$ is a function $X : \Omega \to \mathbb{R}$. The **expectation**, or the **mean**, of a random variable $X$ is the sum

$$E[X] = \sum_{\omega \in \Omega} X(\omega) P(\{\omega\}),$$

and the **probability distribution** of $X$ is the real function

$$F_X(x) = P(X \leq x).$$

Since $P(\varnothing) = 0$ and $P(\Omega) = 1$, we must have

$$\lim_{x \to -\infty} F_X(x) = 0 \quad \text{and} \quad \lim_{x \to \infty} F_X(x) = 1.$$

Related to the probability distribution of $X$ is the **probability mass function**

$$p_X(x) = P(X = x),$$

which is especially useful when $X$ takes finitely many values.

A function $f_X$ that satisfies the identity

$$F_X(x) = \int_{-\infty}^{x} f_X(y) \, dy$$

is called a **probability density function** of $X$. A density function, whenever it exists, is essentially unique: two density functions of the same random variable can differ only on a set that contributes nothing to any integral. If a random variable $X$ admits a density function $f_X$, then we can write the mean of $X$ as follows:

$$E[X] = \int_{-\infty}^{\infty} x f_X(x) \, dx.$$

Generalizing the above formula, we have that

$$E[X^k] = \int_{-\infty}^{\infty} x^k f_X(x) \, dx$$

for each $k \in \mathbb{N}$, provided that $X$ admits a density function. The quantity $E[X^k]$ is called the **$k$th moment of $X$**. Of particular interest is the $k = 2$ case: the **variance** of $X$ is defined to be the quantity

$$\operatorname{Var}(X) = E\left[ (X - E[X])^2 \right] = E[X^2] - E[X]^2.$$

If $X$ admits a density function $f_X$, then

$$\operatorname{Var}(X) = \int_{-\infty}^{\infty} (x - E[X])^2 f_X(x) \, dx.$$
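These definitions are straightforward to compute with. The Python sketch below (our own example) computes the mean and variance of a fair die roll, and approximates the second moment of the standard Gaussian density by a Riemann sum.

```python
import math

# Discrete example: a fair six-sided die on Omega = {1, ..., 6}.
faces = range(1, 7)
p = 1 / 6
mean = sum(x * p for x in faces)                    # 7/2
variance = sum((x - mean) ** 2 * p for x in faces)  # 35/12

# Continuous example: the second moment of the standard Gaussian
# density e^{-x²/2}/√(2π), approximated by a Riemann sum; it equals 1.
dx, L = 0.001, 10.0
second_moment = dx * sum(
    (-L + k * dx) ** 2 * math.exp(-((-L + k * dx) ** 2) / 2)
    for k in range(int(2 * L / dx))
) / math.sqrt(2 * math.pi)
```

The discrete sums realize the definition of $E[X]$ and $\operatorname{Var}(X)$ directly, while the integral illustrates the density-function formula for moments.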
### 3.2. Coin Tossing

An archetypal probabilistic scenario is coin tossing. If we are to toss a coin $n$ times, the result can be represented as an ordered $n$-tuple

$$\omega = (\omega_1, \ldots, \omega_n),$$

where each $\omega_i$ is either $H$ (head) or $T$ (tail). In light of this, we model the tossing of a fair coin $n$ times by $(\Omega_n, \mathcal{P}(\Omega_n), P_n)$, where

$$\Omega_n = \{H, T\}^n \quad \text{and} \quad P_n(\{\omega\}) = 2^{-n}$$

for each $\omega \in \Omega_n$.

The number of heads in a coin toss can be modeled by the random variable

$$S_n(\omega) = \sum_{i=1}^{n} I(\omega_i),$$

where $I(\omega_i)$ is 1 if $\omega_i = H$ and 0 otherwise. In this case, the probability mass function $p_{S_n}(k)$ yields the number of possible coin-tossing outcomes with exactly $k$ heads, divided by $2^n$. It is not hard to see that

$$p_{S_n}(k) = \binom{n}{k} 2^{-n}$$

whenever $0 \leq k \leq n$, whence it follows from the binomial theorem that

$$\sum_{k=0}^{n} p_{S_n}(k) = 2^{-n} \sum_{k=0}^{n} \binom{n}{k} = 2^{-n} (1 + 1)^n = 1.$$

More generally, if the probability of heads is $p$, then the above combinatorial formulation yields

$$P(S_n = k) = \binom{n}{k} p^k (1 - p)^{n-k}$$

whenever $0 \leq k \leq n$. The distribution of $S_n$ is given by the sum

$$F_{S_n}(x) = \sum_{0 \leq k \leq x} \binom{n}{k} p^k (1 - p)^{n-k}.$$

We say that $S_n$ has a **binomial distribution**, and write $S_n \sim B(n, p)$ to denote this. If $n = 1$, then we say that $S_1$ has a **Bernoulli distribution** and write $S_1 \sim B(1, p)$ to denote this.

We remark that an experiment consisting of repeatedly performing *independent* tasks with only two outcomes is called a **Bernoulli trial**. Our coin-tossing example is the archetypal example of a Bernoulli trial. Many other scenarios can be modeled as Bernoulli trials, so long as *success* and *failure* can be clearly defined.
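A quick computational illustration (ours, not the text's): the binomial probabilities can be tabulated with `math.comb`, and one can check that they sum to 1 and have mean $np$ and variance $np(1-p)$, standard facts that we do not derive here.

```python
from math import comb

def binomial_pmf(n, k, p):
    # P(S_n = k) = C(n, k) p^k (1 - p)^(n - k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3
pmf = [binomial_pmf(n, k, p) for k in range(n + 1)]

total = sum(pmf)                                    # should be 1
mean = sum(k * q for k, q in enumerate(pmf))        # should be n * p
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))  # n * p * (1 - p)
```

With $n = 10$ and $p = 0.3$, the mean is $3$ and the variance is $2.1$, in line with the general formulas.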

### 3.3. De Moivre–Laplace Theorem

We now show that Bernoulli trials, with $p = 1/2$, conform to the bell curve distribution when performed a large number of times. To this end, we let $n = 2m$ and define

$$a_j = P(S_{2m} = m + j) = \binom{2m}{m+j} 2^{-2m}$$

for each $-m \leq j \leq m$, so that the terms of the binomial distribution are represented by the sequence $(a_j)$.

Recall **Stirling’s formula**

$$n! \sim \sqrt{2\pi n} \left( \frac{n}{e} \right)^n,$$

where $x_n \sim y_n$ means $x_n / y_n \to 1$ as $n \to \infty$. For each fixed $j \geq 0$, we have that

$$a_j \sim \sqrt{\frac{m}{\pi (m^2 - j^2)}} \left( 1 + \frac{j}{m} \right)^{-(m+j)} \left( 1 - \frac{j}{m} \right)^{-(m-j)}.$$

Since $a_j = a_{-j}$ for all $j$, the above formula characterizes all terms of $(a_j)$.

Now, Taylor’s theorem implies that

$$\log(1 + x) = x - \frac{x^2}{2} + O(|x|^3).$$

Therefore, for large enough $m$,

$$\log\left[ \left( 1 + \frac{j}{m} \right)^{-(m+j)} \left( 1 - \frac{j}{m} \right)^{-(m-j)} \right] = -\frac{j^2}{m} + O\left( \frac{|j|^3}{m^2} \right).$$

Since $\sqrt{m/(\pi(m^2 - j^2))} \sim \sqrt{1/(\pi m)}$ as $m \to \infty$, we conclude that

$$a_j \approx \frac{1}{\sqrt{\pi m}} e^{-j^2/m}$$

for large enough $m$. Setting $j = k - m$, we see that

$$P(S_{2m} = k) \approx \frac{1}{\sqrt{\pi m}} e^{-(k - m)^2/m}$$

for all sufficiently large $m$. It follows that

$$P(S_n = k) \approx \sqrt{\frac{2}{\pi n}} e^{-2(k - n/2)^2/n},$$

provided that $k \geq n/2$ and $|k - n/2| = o(n^{2/3})$. By symmetry,

$$P(S_n = k) \approx \sqrt{\frac{2}{\pi n}} e^{-2(k - n/2)^2/n},$$

provided that $k \leq n/2$ and $|k - n/2| = o(n^{2/3})$.
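The pointwise approximation can be checked directly against exact binomial probabilities. The Python sketch below (our own) compares the two for $n = 10000$ at a few values of $k$ near $n/2$, where the approximation is sharpest.

```python
from math import comb, exp, pi, sqrt

def pmf(n, k):
    # Exact P(S_n = k) for a fair coin; Python integers have arbitrary
    # precision, so comb(n, k) / 2**n is computed without overflow.
    return comb(n, k) / 2 ** n

def local_approx(n, k):
    # sqrt(2/(πn)) · exp(-2 (k - n/2)² / n)
    return sqrt(2 / (pi * n)) * exp(-2 * (k - n / 2) ** 2 / n)

n = 10_000
ratios = [pmf(n, k) / local_approx(n, k) for k in (5000, 5050, 5100)]
worst = max(abs(r - 1) for r in ratios)
```

Each ratio is within a fraction of a percent of 1, consistent with the error terms above.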

Let us leverage the above approximation result to furnish a **limit theorem**, viz., a concrete statement about convergence. For our purposes, it is convenient to work with random variables of mean zero. To this end, we consider the random variable

$$T_n(\omega) = \sum_{i=1}^{n} Y(\omega_i)$$

on the probability space of $n$-fold coin tosses, where $Y(\omega_i)$ is 1 if $\omega_i = H$ and $-1$ otherwise. The above result shows that

$$P(T_n = k) \approx \sqrt{\frac{2}{\pi n}} e^{-k^2/(2n)},$$

provided that $|k| = o(n^{2/3})$ and $k \equiv n \pmod 2$.

What can we say about the distribution of $T_n/\sqrt{n}$? Observe that, for even $n$,

$$P\left( a \leq \frac{T_n}{\sqrt{n}} \leq b \right) = \sum_{k} P(T_n = k),$$

where the sum ranges over all even integers $k$ in $[a\sqrt{n}, b\sqrt{n}]$. We let $x_k = k/\sqrt{n}$ and invoke the above result to conclude that

$$P\left( a \leq \frac{T_n}{\sqrt{n}} \leq b \right) \approx \sum_{x_k} \sqrt{\frac{2}{\pi n}} e^{-x_k^2/2},$$

where the sum ranges over all $x_k$ of the form $k/\sqrt{n}$ in $[a, b]$. Since $k$ must be even, the distance between two adjacent $x_k$ is $2/\sqrt{n}$. In light of this, we can approximate, for sufficiently large $n$, the above sum by the following integral:

$$\sum_{x_k} \frac{1}{\sqrt{2\pi}} e^{-x_k^2/2} \cdot \frac{2}{\sqrt{n}} \approx \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2} \, dx.$$

It follows that

$$P\left( a \leq \frac{T_n}{\sqrt{n}} \leq b \right) \approx \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2} \, dx,$$

whence

$$\lim_{\substack{n \to \infty \\ n \text{ even}}} P\left( a \leq \frac{T_n}{\sqrt{n}} \leq b \right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2} \, dx.$$

How do we extend the result to $T_n$ with odd $n$? Given the same first $n$ tosses, $T_n$ and $T_{n+1}$ differ by 1, which is immaterial when $n$ is sufficiently large. We thus conclude that

$$\lim_{n \to \infty} P\left( a \leq \frac{T_n}{\sqrt{n}} \leq b \right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2} \, dx,$$

which, in particular, implies that

$$\lim_{n \to \infty} P\left( \frac{T_n}{\sqrt{n}} \leq b \right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{b} e^{-x^2/2} \, dx.$$
This is the **De Moivre–Laplace theorem**.
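The limit theorem itself can be probed numerically. The sketch below (ours; `math.erf` stands in for the Gaussian integral) compares $P(a \leq T_n/\sqrt{n} \leq b)$, computed exactly from binomial coefficients, with the limiting Gaussian integral.

```python
from math import erf, sqrt

def prob_interval(n, a, b):
    # P(a ≤ T_n/√n ≤ b), computed exactly: T_n = 2*S_n - n, and the
    # binomial coefficients are accumulated iteratively as big integers.
    c, total = 1, 0
    for s in range(n + 1):
        k = 2 * s - n
        if a * sqrt(n) <= k <= b * sqrt(n):
            total += c
        c = c * (n - s) // (s + 1)
    return total / 2 ** n

def gaussian_interval(a, b):
    # (1/√(2π)) ∫_a^b e^{-x²/2} dx, written via the error function
    return 0.5 * (erf(b / sqrt(2)) - erf(a / sqrt(2)))

n, a, b = 2000, -1.0, 1.5
gap = abs(prob_interval(n, a, b) - gaussian_interval(a, b))
```

For $n = 2000$ the discrepancy is already on the order of the lattice spacing $2/\sqrt{n}$, as the derivation above predicts.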

### 3.4. Normal Distribution

The De Moivre–Laplace theorem implies that the coin toss model, normalized by $\sqrt{n}$, can be approximated by a random variable with a Gaussian density. More generally, let us examine probability distributions

$$F(x) = C \int_{-\infty}^{x} g(y) \, dy,$$

where $g(y) = e^{-(y - \mu)^2/(2\sigma^2)}$ is a Gaussian. Observe that

$$\int_{-\infty}^{\infty} e^{-(y - \mu)^2/(2\sigma^2)} \, dy = \sqrt{2\pi \sigma^2}.$$

Since a probability distribution must tend to 1 as $x \to \infty$, we see that $C$ must equal $1/\sqrt{2\pi\sigma^2}$ in order for $Cg$ to be a density function.

We shall show that a random variable $X$ with a Gaussian density function

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x - \mu)^2/(2\sigma^2)}$$

has mean $\mu$ and variance $\sigma^2$.

To compute the mean $E[X]$, we take the coordinate transform $y = (x - \mu)/\sigma$:

$$E[X] = \int_{-\infty}^{\infty} \frac{x}{\sqrt{2\pi\sigma^2}} e^{-(x - \mu)^2/(2\sigma^2)} \, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (\sigma y + \mu) e^{-y^2/2} \, dy = \mu + \frac{\sigma}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y e^{-y^2/2} \, dy.$$

Another coordinate transform, $y \mapsto -y$, yields

$$\int_{-\infty}^{\infty} y e^{-y^2/2} \, dy = -\int_{-\infty}^{\infty} y e^{-y^2/2} \, dy = 0,$$

and so $E[X] = \mu$.

As for the variance, we begin, once again, by taking the coordinate transform $y = (x - \mu)/\sigma$:

$$\operatorname{Var}(X) = \int_{-\infty}^{\infty} \frac{(x - \mu)^2}{\sqrt{2\pi\sigma^2}} e^{-(x - \mu)^2/(2\sigma^2)} \, dx = \frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y^2 e^{-y^2/2} \, dy.$$

Now, we observe that

$$\int_{-\infty}^{\infty} e^{-t y^2/2} \, dy = \sqrt{\frac{2\pi}{t}}$$

for all $t > 0$, and so we can differentiate the integral with respect to $t$ to obtain the following:

$$-\frac{1}{2} \int_{-\infty}^{\infty} y^2 e^{-t y^2/2} \, dy = -\frac{\sqrt{2\pi}}{2} t^{-3/2}.$$

It now follows that

$$\int_{-\infty}^{\infty} y^2 e^{-y^2/2} \, dy = \sqrt{2\pi},$$

whence

$$\operatorname{Var}(X) = \sigma^2.$$

We have thus shown that a random variable with a Gaussian density function

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x - \mu)^2/(2\sigma^2)}$$

has mean $\mu$ and variance $\sigma^2$. By defining the **normal distribution** $N(\mu, \sigma^2)$ with parameters $\mu$ and $\sigma^2$ to be the integral

$$F(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{x} e^{-(y - \mu)^2/(2\sigma^2)} \, dy,$$

we obtain a probability distribution with mean $\mu$ and variance $\sigma^2$.
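As a numerical sanity check (our own sketch), we can verify by Riemann sums that the density of $N(\mu, \sigma^2)$ integrates to 1 and has the advertised mean and variance.

```python
import math

def normal_density(x, mu, sigma):
    return (math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            / math.sqrt(2 * math.pi * sigma ** 2))

mu, sigma = 1.5, 2.0
dx = 0.001
lo = mu - 10 * sigma  # the tails beyond 10 sigma are negligible

total = mean_acc = var_acc = 0.0
for k in range(int(20 * sigma / dx)):
    x = lo + k * dx
    w = normal_density(x, mu, sigma) * dx
    total += w        # approximates ∫ f_X = 1
    mean_acc += x * w  # approximates E[X] = mu
    var_acc += (x - mu) ** 2 * w  # approximates Var(X) = sigma²
```

With $\mu = 1.5$ and $\sigma = 2$, the three accumulators converge to $1$, $1.5$, and $4$, respectively.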

### 3.5. Independence

We have already observed, via the **De Moivre–Laplace theorem**, that Bernoulli trials in large numbers can be approximated by a normal distribution. What can we say about general random variables?

We note a crucial property of Bernoulli trials: each coin toss does not affect the outcome of any other coin toss. In such cases, events satisfy the identity

$$P(A_1 \cap \cdots \cap A_k) = P(A_1) \cdots P(A_k),$$

which, imposed on every finite subcollection, we take to be the definition of the **independence** of events. A sequence $(X_n)$ of random variables is said to be **independent** if the events

$$\{X_{n_1} \in B_1\}, \ldots, \{X_{n_k} \in B_k\}$$

are independent for all choices of indices $n_1 < \cdots < n_k$ and subsets $B_1, \ldots, B_k$ of $\mathbb{R}$.

Of particular importance are sequences of **independent and identically distributed** (**iid**) random variables, which constitute a natural generalization of the coin toss model. We say that a sequence of random variables is **iid** in case all of the terms have the same distribution and every finite subsequence is independent. We note that the terms of an iid sequence have the same moments.

To see that iid sequences are indeed a generalization of the coin toss model, we define $X_n$ to be the random variable that outputs the result of the $n$th coin toss, encoded as 1 for a head and 0 for a tail. We show that $(X_n)$ is a sequence of iid random variables. Obviously, the $X_n$ share the distribution

$$P(X_n = 1) = P(X_n = 0) = \frac{1}{2}.$$

Observe that

$$\#\{\omega \in \Omega_n : \omega_i = H\} = \#\{\omega \in \Omega_n : \omega_i = T\} = 2^{n-1}$$

whenever $1 \leq i \leq n$, and so

$$P(X_i = e) = \frac{2^{n-1}}{2^n} = \frac{1}{2}$$

for all $1 \leq i \leq n$ and $e \in \{0, 1\}$. Moreover, whenever $i_1 < \cdots < i_k$ are distinct indices and $e_1, \ldots, e_k \in \{0, 1\}$,

$$P(X_{i_1} = e_1, \ldots, X_{i_k} = e_k) = \frac{2^{n-k}}{2^n} = 2^{-k} = \prod_{j=1}^{k} P(X_{i_j} = e_j),$$

and so $X_{i_1}, \ldots, X_{i_k}$ are independent.
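Since the coin-toss space is finite, the computations above can be verified by brute-force enumeration. The Python sketch below (ours) does so for $n = 4$ tosses, checking the product rule exactly.

```python
from itertools import product

n = 4
outcomes = list(product("HT", repeat=n))  # all 2^n equally likely tosses

def P(event):
    # Probability of an event under the uniform measure on outcomes.
    return sum(1 for w in outcomes if event(w)) / len(outcomes)

A = lambda w: w[0] == "H"   # the event {X_1 = 1}
B = lambda w: w[2] == "H"   # the event {X_3 = 1}

joint = P(lambda w: A(w) and B(w))
gap = abs(joint - P(A) * P(B))  # zero: the product rule holds exactly

# A triple intersection, matching the count 2^{n-k}/2^n = 2^{-k} with k = 3.
triple = P(lambda w: w[0] == "H" and w[1] == "T" and w[3] == "H")
```

Because the probabilities here are dyadic rationals, the equalities hold exactly in floating point, not merely approximately.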

### 3.6. Central Limit Theorem

Recall that the **De Moivre–Laplace theorem** is the limit theorem

$$\lim_{n \to \infty} P\left( a \leq \frac{T_n}{\sqrt{n}} \leq b \right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2} \, dx,$$

where $T_n$ is the $n$-fold coin toss random variable. We have introduced, in Section 3.5, the notion of independent, identically distributed random variables as a generalization of the coin toss model. Let us work out a generalized version of the above limit theorem.

Given that each random variable in an iid sequence $(X_n)$ is a generalization of *one* coin toss, the $n$-fold coin toss random variable should be considered as the *sum* of the first $n$ coin toss random variables. Therefore, it makes sense to attempt to establish a limit theorem of the form

$$\lim_{n \to \infty} P\left( a \leq \frac{S_n - n\mu}{\sigma\sqrt{n}} \leq b \right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2} \, dx,$$

where $S_n = X_1 + \cdots + X_n$, $\mu = E[X_1]$, and $\sigma^2 = \operatorname{Var}(X_1)$.

To tackle this problem, we introduce the Fourier transform

$$\varphi_X(\xi) = E\left[ e^{-2\pi i \xi X} \right]$$

of a random variable $X$, called the **characteristic function** of $X$.
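The characteristic function is computable directly for finite models. The sketch below (our own; the convention $\varphi_X(\xi) = E[e^{-2\pi i \xi X}]$ mirrors the Fourier transform above, and the factorization over independent sums is a standard fact not derived in the text) computes $\varphi$ for a single $\pm 1$ toss and checks the factorization for $T_5$ by enumeration.

```python
import cmath, math
from itertools import product

def char_fn(values, probs, xi):
    # φ_X(ξ) = E[exp(-2πi ξ X)] for a discrete random variable
    return sum(p * cmath.exp(-2j * math.pi * xi * v)
               for v, p in zip(values, probs))

# A single ±1 coin toss Y has φ_Y(ξ) = cos(2πξ).
xi = 0.3
phi_Y = char_fn([1, -1], [0.5, 0.5], xi)

# For T_n = Y_1 + ... + Y_n with independent tosses, the characteristic
# function factors as φ_Y(ξ)^n; check against direct enumeration for n = 5.
n = 5
direct = sum(cmath.exp(-2j * math.pi * xi * sum(w)) / 2 ** n
             for w in product([1, -1], repeat=n))
factored = phi_Y ** n
gap = abs(direct - factored)
```

This multiplicative behavior under independent sums is precisely what makes the characteristic function the right tool for the limit theorem sought above.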