1. Fourier Series; Heat Equation, Pt. 1
“Herein we have dealt with a single case only of a more general problem, which consists in developing any function whatever in an infinite series of sines or cosines of multiple arcs,” wrote Joseph Fourier in his 1822 memoir, The Analytical Theory of Heat. With a single sentence, Fourier ignited a centuries-long study of functions, infinite series, and integration, as well as providing a foundation for a myriad of subjects: signal processing, cryptography, probability theory, differential equations — the list goes on.
In the original context of thermal physics, Fourier’s claim amounts to an arbitrary function $f$ admitting an infinite-series representation
$$f(\theta) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos n\theta + b_n \sin n\theta \right)$$
that converges to $f$. For technical simplicity, the above infinite series is often written as
$$f(\theta) = \sum_{n=-\infty}^{\infty} c_n e^{in\theta}$$
via Euler’s formula $e^{i\theta} = \cos\theta + i\sin\theta$.
Fourier was interested in deriving a general solution to the heat equation
$$\frac{\partial u}{\partial t} = \Delta u,$$
which relates the distribution of heat to the passage of time. Let us consider a simple case, the steady-state heat equation
$$\Delta u = 0,$$
describing the state of equilibrium in which there is no more heat exchange. We consider the heat equation on the unit disk
$$D = \{(r, \theta) : 0 \le r \le 1\},$$
where the distribution of the heat on the boundary satisfies an arbitrary function $f$, i.e., $u(1, \theta) = f(\theta)$. To tackle the equation, we take the infinite-series representation
$$f(\theta) = \sum_{n=-\infty}^{\infty} c_n e^{in\theta}$$
and propagate it into the disk as follows:
$$u(r, \theta) = \sum_{n=-\infty}^{\infty} c_n r^{|n|} e^{in\theta}.$$
Computing the Laplacian in polar coordinates,
$$\Delta = \frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial^2}{\partial \theta^2},$$
of each term of the infinite series $u(r, \theta) = \sum_{n} c_n r^{|n|} e^{in\theta}$, we see that $\Delta u = 0$, as was to be shown.
Here, we have reduced the problem of finding a solution that satisfies an arbitrary boundary condition to that of finding a solution with a single frequency $e^{in\theta}$. With this reduction, it is not difficult to see that $u_n(r, \theta) = r^{|n|} e^{in\theta}$ satisfies the steady-state heat equation and conforms to the boundary condition $u_n(1, \theta) = e^{in\theta}$. Since the heat equation is linear, we can piece together the solutions $u_n$ for each frequency to obtain the solution to the steady-state heat equation with an arbitrary boundary condition.
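The harmonicity of each single-frequency term can also be checked numerically. The following minimal sketch (the helper names are my own) applies a five-point finite-difference Laplacian to $u(x, y) = \operatorname{Re}((x + iy)^3) = r^3 \cos 3\theta$, the $n = 3$ term written in Cartesian coordinates; this is an illustration, not a proof.

```python
def u(x, y):
    # Re((x + iy)^3) = r^3 cos(3θ), the single-frequency term with n = 3
    return ((x + 1j * y) ** 3).real

def laplacian(g, x, y, h=1e-3):
    """Five-point central-difference approximation of g_xx + g_yy."""
    return (g(x + h, y) + g(x - h, y) + g(x, y + h) + g(x, y - h) - 4 * g(x, y)) / h**2

residual = laplacian(u, 0.3, 0.4)
print(residual)  # ≈ 0: the term is harmonic
```

Since $u$ is a cubic polynomial, the central difference is exact up to rounding, so the residual is numerically zero.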
The method of analysis presented above lends itself to a convenient yet far-reaching approach to tackling quantitative problems: decompose a complex problem into a collection of simpler problems, solve the simpler problems, and, from the solutions, extract insights about the original problem.
2. Fourier Transform; Heat Equation, Pt. 2
The infinite-series representation
$$f(\theta) = \sum_{n=-\infty}^{\infty} c_n e^{in\theta}$$
is typically referred to as the Fourier series of $f$. How do we compute the Fourier coefficients $c_n$ corresponding to a function $f$? Observe that
$$\frac{1}{2\pi} \int_0^{2\pi} e^{in\theta} e^{-im\theta} \, d\theta = \begin{cases} 1 & \text{if } n = m, \\ 0 & \text{if } n \neq m. \end{cases}$$
Therefore, for each integer $n$,
$$\frac{1}{2\pi} \int_0^{2\pi} f(\theta) e^{-in\theta} \, d\theta = c_n.$$
For this reason, we define the function $\hat{f}$ by setting
$$\hat{f}(n) = \frac{1}{2\pi} \int_0^{2\pi} f(\theta) e^{-in\theta} \, d\theta.$$
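The coefficient formula lends itself to a quick numerical sanity check. The sketch below (helper names are my own) discretizes the integral by a Riemann sum and recovers the coefficients of a trigonometric polynomial whose coefficients are known in closed form.

```python
import cmath
import math

def fourier_coefficient(f, n, samples=2048):
    """Riemann-sum approximation of (1/2π) ∫₀^{2π} f(θ) e^{-inθ} dθ."""
    total = 0j
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        total += f(theta) * cmath.exp(-1j * n * theta)
    return total / samples

# Trigonometric polynomial with known coefficients:
# 3 + cos θ + 2 sin 2θ = 3 + (1/2)e^{iθ} + (1/2)e^{-iθ} - i e^{2iθ} + i e^{-2iθ}
f = lambda theta: 3 + math.cos(theta) + 2 * math.sin(2 * theta)

c0 = fourier_coefficient(f, 0)
c1 = fourier_coefficient(f, 1)
c2 = fourier_coefficient(f, 2)
print(c0, c1, c2)  # ≈ 3, 0.5, and -1j
```

For trigonometric polynomials, equispaced sums enjoy discrete orthogonality, so the recovery is accurate to machine precision.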
We have assumed implicitly that $f$ is $2\pi$-periodic, i.e., $f(\theta + 2\pi) = f(\theta)$ for all $\theta$. To work with $L$-periodic functions, we can rescale the Fourier series as
$$f(x) = \sum_{n=-\infty}^{\infty} \hat{f}(n) e^{2\pi i n x / L}.$$
A change of variables yields
$$\hat{f}(n) = \frac{1}{L} \int_{-L/2}^{L/2} f(x) e^{-2\pi i n x / L} \, dx,$$
from which we can obtain the limiting case of nonperiodic functions by sending $L \to \infty$: writing $\xi = n / L$, the quantity $L \hat{f}(n)$ becomes
$$\int_{-\infty}^{\infty} f(x) e^{-2\pi i \xi x} \, dx.$$
The formula is now valid for non-integer parameters as well, and so we replace the discrete parameter $n / L$ with a real parameter $\xi$:
$$\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x) e^{-2\pi i x \xi} \, dx.$$
The result is a function $\hat{f}$ on the real line $\mathbb{R}$, called the Fourier transform of $f$. We note that $f$ can be recovered from $\hat{f}$ via the Fourier inversion formula
$$f(x) = \int_{-\infty}^{\infty} \hat{f}(\xi) e^{2\pi i x \xi} \, d\xi.$$
With the Fourier transform, we can solve the time-dependent heat equation on the real line
$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}$$
with initial condition $u(x, 0) = f(x)$. To this end, we need to know the Fourier transform of the normalized Gaussian
$$G(x) = e^{-\pi x^2}.$$
Observe first that
$$G'(x) = -2\pi x \, G(x).$$
We now note that the Fourier transform turns derivatives into multiplications by polynomials, i.e.,
$$\widehat{f'}(\xi) = 2\pi i \xi \hat{f}(\xi) \quad \text{and} \quad \widehat{x f(x)}(\xi) = \frac{i}{2\pi} \big( \hat{f} \big)'(\xi),$$
where $\hat{f}$ denotes the Fourier transform of $f$. Since $G'(x) = -2\pi x \, G(x)$, we see that
$$\big( \hat{G} \big)'(\xi) = -2\pi \xi \, \hat{G}(\xi),$$
so that $\hat{G}$ satisfies the same differential equation as $G$, and so $\hat{G} / G$ is constant. Since
$$\hat{G}(0) = \int_{-\infty}^{\infty} e^{-\pi x^2} \, dx = 1 = G(0),$$
we conclude that $\hat{G} = G$.
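The claim that the normalized Gaussian is its own Fourier transform can be checked numerically. The sketch below (a midpoint-rule discretization; the helper name `fourier_transform` is my own) evaluates the transform integral at a few frequencies and compares against the Gaussian itself.

```python
import cmath
import math

def fourier_transform(f, xi, lo=-10.0, hi=10.0, samples=4000):
    """Midpoint-rule approximation of ∫ f(x) e^{-2πi x ξ} dx."""
    dx = (hi - lo) / samples
    total = 0j
    for k in range(samples):
        x = lo + (k + 0.5) * dx
        total += f(x) * cmath.exp(-2j * math.pi * x * xi)
    return total * dx

G = lambda x: math.exp(-math.pi * x * x)

errors = [abs(fourier_transform(G, xi) - G(xi)) for xi in (0.0, 0.5, 1.0)]
print(errors)  # all ≈ 0: the normalized Gaussian is its own transform
```

The integrand decays so fast that truncating to $[-10, 10]$ and using a modest grid already gives errors far below $10^{-6}$.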
We now define the heat kernel
$$H_t(x) = \frac{1}{\sqrt{4\pi t}} \, e^{-x^2 / 4t}.$$
Since the Fourier transform converts derivatives to multiplications by polynomials, taking the Fourier transform of the heat equation in the $x$ variable yields
$$\frac{\partial \hat{u}}{\partial t}(\xi, t) = -4\pi^2 \xi^2 \, \hat{u}(\xi, t).$$
Observe that, for each fixed $\xi$, the above identity is an ordinary differential equation in the $t$ variable, with initial condition $\hat{u}(\xi, 0) = \hat{f}(\xi)$. It follows that
$$\hat{u}(\xi, t) = \hat{f}(\xi) \, e^{-4\pi^2 \xi^2 t}.$$
To finish off, we introduce the convolution of two functions $f$ and $g$:
$$(f * g)(x) = \int_{-\infty}^{\infty} f(x - y) \, g(y) \, dy.$$
For the task at hand, the relevant property of the convolution is that the Fourier transform turns convolution into pointwise multiplication:
$$\widehat{f * g}(\xi) = \hat{f}(\xi) \, \hat{g}(\xi).$$
This, in particular, implies that
$$\hat{u}(\xi, t) = \hat{f}(\xi) \, e^{-4\pi^2 \xi^2 t} = \hat{f}(\xi) \, \hat{H_t}(\xi) = \widehat{f * H_t}(\xi),$$
where the identity $\hat{H_t}(\xi) = e^{-4\pi^2 \xi^2 t}$ follows from $\hat{G} = G$ and a change of variables, whence it follows from the Fourier inversion formula that
$$u(x, t) = (f * H_t)(x) = \frac{1}{\sqrt{4\pi t}} \int_{-\infty}^{\infty} f(y) \, e^{-(x - y)^2 / 4t} \, dy$$
is the solution of the time-dependent heat equation on the real line
$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}$$
with initial condition $u(x, 0) = f(x)$.
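One pleasant consequence of the convolution solution is the semigroup property $H_s * H_t = H_{s+t}$: running the heat flow for time $s$ and then for time $t$ is the same as running it for time $s + t$. A minimal numerical sketch (Riemann-sum convolution; helper names are my own):

```python
import math

def heat_kernel(x, t):
    return math.exp(-x * x / (4 * t)) / math.sqrt(4 * math.pi * t)

def convolve_at(f, g, x, lo=-15.0, hi=15.0, samples=3000):
    """Midpoint-rule approximation of (f * g)(x) = ∫ f(x - y) g(y) dy."""
    dy = (hi - lo) / samples
    return sum(f(x - (lo + (k + 0.5) * dy)) * g(lo + (k + 0.5) * dy)
               for k in range(samples)) * dy

# Semigroup property: H_s * H_t should equal H_{s+t}.
s, t = 0.3, 0.7
errors = [abs(convolve_at(lambda z: heat_kernel(z, s),
                          lambda z: heat_kernel(z, t), x) - heat_kernel(x, s + t))
          for x in (0.0, 1.0, 2.5)]
print(errors)  # all ≈ 0
```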
3. Central Limit Theorem
But enough about the heat equation. What else have we got here?
Observe that the graph of the normalized Gaussian $G(x) = e^{-\pi x^2}$ is in the shape of the famous bell curve.
The conventional wisdom is that a reasonable distribution of events conforms to the bell curve. This is the folklore version of a foundational result in probability and statistics, the central limit theorem: the sum of a nice sequence of random variables converges to a Gaussian, when suitably normalized.
Let’s try to make sense of this result.
3.1. Probability Spaces
Given a countable set $\Omega$, we define a probability measure on $\Omega$ to be a set function $P$ on the power set $2^\Omega$ of $\Omega$ such that $P(\emptyset) = 0$, $P(\Omega) = 1$, and that
$$P\left( \bigcup_{n=1}^{\infty} E_n \right) = \sum_{n=1}^{\infty} P(E_n)$$
whenever $(E_n)_{n=1}^{\infty}$ is a disjoint collection of subsets of $\Omega$. A subset of $\Omega$ is called an event, and the ordered triple $(\Omega, 2^\Omega, P)$ is called a probability space.
A random variable on a probability space $(\Omega, 2^\Omega, P)$ is a function $X : \Omega \to \mathbb{R}$. The expectation, or the mean, of a random variable $X$ is the sum
$$E[X] = \sum_{\omega \in \Omega} X(\omega) \, P(\{\omega\}),$$
and the probability distribution of $X$ is the real function
$$F_X(x) = P(X \le x) = P(\{\omega \in \Omega : X(\omega) \le x\}).$$
Since $P(\emptyset) = 0$ and $P(\Omega) = 1$, we must have
$$\lim_{x \to -\infty} F_X(x) = 0 \quad \text{and} \quad \lim_{x \to \infty} F_X(x) = 1.$$
Related to the probability distribution of $X$ is the probability mass function
$$p_X(x) = P(X = x),$$
which is especially useful when $X$ takes finitely many values.
A function $f_X$ that satisfies the identity
$$F_X(x) = \int_{-\infty}^{x} f_X(y) \, dy$$
is called a probability density function of $X$. A density function, whenever it exists, is unique up to modification on a set of measure zero. If a random variable $X$ admits a density function $f_X$, then we can write the mean of $X$ as follows:
$$E[X] = \int_{-\infty}^{\infty} x \, f_X(x) \, dx.$$
Generalizing the above formula, we have that
$$E[X^k] = \int_{-\infty}^{\infty} x^k \, f_X(x) \, dx$$
for each $k \ge 1$, provided that $X$ admits a density function. The quantity $E[X^k]$ is called the $k$th moment of $X$. Of particular interest is the $k = 2$ case: the variance of $X$ is defined to be the quantity
$$\operatorname{Var}(X) = E\left[ (X - E[X])^2 \right] = E[X^2] - E[X]^2.$$
If $X$ admits a density function, then
$$\operatorname{Var}(X) = \int_{-\infty}^{\infty} (x - E[X])^2 \, f_X(x) \, dx.$$
3.2. Coin Tossing
An archetypal probabilistic scenario is coin tossing. If we are to toss a coin $n$ times, the result can be represented as an ordered $n$-tuple
$$\omega = (\omega_1, \ldots, \omega_n),$$
where each $\omega_i$ is either $H$ (head) or $T$ (tail). In light of this, we model the tossing of a fair coin $n$ times by $(\Omega_n, 2^{\Omega_n}, P_n)$, where
$$\Omega_n = \{H, T\}^n \quad \text{and} \quad P_n(\{\omega\}) = 2^{-n}$$
for each $\omega \in \Omega_n$.
The number of heads in a coin toss can be modeled by the random variable
$$S_n(\omega) = \sum_{i=1}^{n} X_i(\omega),$$
where $X_i(\omega)$ is 1 if $\omega_i = H$ and 0 otherwise. In this case, the probability mass function yields the number of possible coin-tossing outcomes with $k$ many heads, divided by $2^n$. It is not hard to see that
$$P_n(S_n = k) = \binom{n}{k} 2^{-n},$$
whence it follows from the binomial theorem that
$$\sum_{k=0}^{n} P_n(S_n = k) = 2^{-n} \sum_{k=0}^{n} \binom{n}{k} = 2^{-n} (1 + 1)^n = 1.$$
More generally, if the probability of heads is $p$, then the above polynomial formulation yields
$$P(S_n = k) = \binom{n}{k} p^k (1 - p)^{n - k}$$
whenever $0 \le k \le n$. The distribution is given by the sum
$$F_{S_n}(x) = \sum_{0 \le k \le x} \binom{n}{k} p^k (1 - p)^{n - k}.$$
We say that $S_n$ has a binomial distribution, and write $S_n \sim B(n, p)$ to denote it. If $n = 1$, then we say that $S_n$ has a Bernoulli distribution and write $S_n \sim B(1, p)$.
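The binomial probability mass function is easy to tabulate. The sketch below (helper names are my own) checks that the probabilities sum to 1 and recovers the standard facts that a $B(n, p)$ random variable has mean $np$ and variance $np(1-p)$.

```python
import math

def binomial_pmf(n, p, k):
    """P(S_n = k) for S_n ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
pmf = [binomial_pmf(n, p, k) for k in range(n + 1)]
total = sum(pmf)
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
print(total, mean, var)  # → 1.0, np = 3.0, np(1-p) = 2.1
```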
We remark that an experiment consisting of repeatedly performing independent tasks with only two outcomes is called a Bernoulli trial. Our coin-tossing example is the archetypal example of a Bernoulli trial. Many other scenarios can be modeled as Bernoulli trials, so long as success and failure can be clearly defined.
3.3. De Moivre–Laplace Theorem
We now show that Bernoulli trials, with $p = 1/2$, conform to the bell curve distribution when performed a large number of times. To this end, we let $S_n \sim B(n, 1/2)$ with $n$ even and define
$$a_k = P(S_n = k) = \binom{n}{k} 2^{-n}$$
for each $0 \le k \le n$, so that the terms of the binomial distribution are represented by the sequence
$$(a_0, a_1, \ldots, a_n).$$
Recall Stirling’s formula
$$n! \sim \sqrt{2\pi n} \left( \frac{n}{e} \right)^n,$$
where $x_n \sim y_n$ means $x_n / y_n \to 1$ as $n \to \infty$. For each fixed $j \ge 0$, we have that
$$\frac{a_{n/2 + j}}{a_{n/2}} = \frac{(n/2)! \, (n/2)!}{(n/2 + j)! \, (n/2 - j)!} = \prod_{i=1}^{j} \frac{n/2 - i + 1}{n/2 + i}.$$
Since $a_k = a_{n - k}$ for all $0 \le k \le n$, the above formula characterizes all terms of $(a_0, \ldots, a_n)$.
Now, Taylor’s theorem implies that
$$\log \frac{n/2 - i + 1}{n/2 + i} = \log\left( 1 - \frac{2i - 2}{n} \right) - \log\left( 1 + \frac{2i}{n} \right) = -\frac{2(2i - 1)}{n} + O\!\left( \frac{i^2}{n^2} \right).$$
Therefore, for large enough $n$,
$$\frac{a_{n/2 + j}}{a_{n/2}} \approx \exp\left( -\frac{2}{n} \sum_{i=1}^{j} (2i - 1) \right) = \exp\left( -\frac{2j^2}{n} \right).$$
Since $a_{n/2} = \binom{n}{n/2} 2^{-n} \sim \sqrt{2 / \pi n}$ as $n \to \infty$, we conclude that
$$a_{n/2 + j} \approx \sqrt{\frac{2}{\pi n}} \, e^{-2j^2 / n}$$
for large enough $n$. Setting $k = n/2 + j$, we see that
$$P(S_n = k) \approx \sqrt{\frac{2}{\pi n}} \exp\left( -\frac{2(k - n/2)^2}{n} \right)$$
for all sufficiently large $n$. It follows that
$$P(S_n = k) \approx \sqrt{\frac{2}{\pi n}} \, e^{-2(k - n/2)^2 / n},$$
provided that $k \ge n/2$ and $k - n/2 = O(\sqrt{n})$. By symmetry,
$$P(S_n = k) \approx \sqrt{\frac{2}{\pi n}} \, e^{-2(k - n/2)^2 / n},$$
provided that $k \le n/2$ and $n/2 - k = O(\sqrt{n})$.
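The local approximation can be compared against exact binomial probabilities. A minimal sketch, with `math.comb` supplying the exact values (the choice $n = 1000$ is mine):

```python
import math

n = 1000  # number of fair-coin tosses (taken even)
pairs = []
for j in (0, 5, 10, 15):
    k = n // 2 + j
    exact = math.comb(n, k) / 2**n                                # binomial pmf
    approx = math.sqrt(2 / (math.pi * n)) * math.exp(-2 * j * j / n)  # Gaussian approx.
    pairs.append((exact, approx))
    print(k, exact, approx)  # the two columns agree to a few decimal places
```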
Let us leverage the above approximation result to furnish a limit theorem, viz., a concrete statement about convergence. For our purposes, it is convenient to work with random variables of mean zero. To this end, we consider the random variable
$$T_n(\omega) = \sum_{i=1}^{n} Y_i(\omega)$$
on the probability space of $n$-fold coin tosses, where $Y_i(\omega)$ is 1 if $\omega_i = H$ and $-1$ otherwise. Since $T_n = 2 S_n - n$, the above result shows that
$$P(T_n = 2j) = P(S_n = n/2 + j) \approx \sqrt{\frac{2}{\pi n}} \, e^{-2j^2 / n},$$
provided that $j = O(\sqrt{n})$.
What can we say about the distribution of $T_n$? Observe that
$$P(a\sqrt{n} \le T_n \le b\sqrt{n}) = \sum P(T_n = 2j),$$
where the sum ranges over all even integers $2j$ in $[a\sqrt{n}, b\sqrt{n}]$. We let $x = 2j / \sqrt{n}$ and invoke the above result to conclude that
$$P(a\sqrt{n} \le T_n \le b\sqrt{n}) \approx \sum \sqrt{\frac{2}{\pi n}} \, e^{-x^2 / 2},$$
where the sum ranges over all $x$ of the form $2j / \sqrt{n}$ in $[a, b]$. Since $T_n$ must be even when $n$ is, the distance between two adjacent $x$ is $2 / \sqrt{n}$. In light of this, we can approximate, for sufficiently large $n$, the above sum by the following integral:
$$\sum_x \frac{1}{\sqrt{2\pi}} \, e^{-x^2 / 2} \cdot \frac{2}{\sqrt{n}} \approx \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2 / 2} \, dx.$$
It follows that, along even $n$,
$$\lim_{n \to \infty} P\left( a \le \frac{T_n}{\sqrt{n}} \le b \right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2 / 2} \, dx.$$
How do we extend the result to $T_n$ with odd $n$? Given the same first $n - 1$ tosses, $T_n$ and $T_{n-1}$ differ by 1, which is immaterial when $n$ is sufficiently large. We thus conclude that
$$\lim_{n \to \infty} P\left( a \le \frac{T_n}{\sqrt{n}} \le b \right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2 / 2} \, dx,$$
which, in particular, implies that
$$\lim_{n \to \infty} P\left( \frac{T_n}{\sqrt{n}} \le x \right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2 / 2} \, dy.$$
This is the De Moivre–Laplace theorem.
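The theorem can be observed numerically: since $T_n = 2 S_n - n$, the event $T_n \le x\sqrt{n}$ is a binomial tail, which should match the Gaussian integral, computable via the error function. A sketch (helper names are my own; $n = 2000$ is an arbitrary choice):

```python
import math

def Phi(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n = 2000
results = []
for x in (-1.0, 0.0, 1.0):
    # T_n ≤ x√n  ⟺  S_n ≤ (n + x√n)/2, an exact binomial tail probability
    cutoff = math.floor((n + x * math.sqrt(n)) / 2)
    prob = sum(math.comb(n, k) for k in range(cutoff + 1)) / 2**n
    results.append((prob, Phi(x)))
    print(x, prob, Phi(x))  # binomial tail ≈ normal tail
```

The residual discrepancy is of the order of a single pmf term, roughly $\sqrt{2/\pi n}$, so it shrinks as $n$ grows.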
3.4. Normal Distribution
The De Moivre–Laplace theorem implies that the coin toss model, normalized by $\sqrt{n}$, can be approximated by a random variable with a Gaussian density. More generally, let us examine probability distributions
$$F(x) = C \int_{-\infty}^{x} e^{-(y - \mu)^2 / 2\sigma^2} \, dy,$$
where the integrand is a Gaussian. Observe that
$$\int_{-\infty}^{\infty} e^{-(y - \mu)^2 / 2\sigma^2} \, dy = \sqrt{2\pi\sigma^2}.$$
Since a probability distribution must tend to 1 as $x \to \infty$, we see that $C$ must equal $1 / \sqrt{2\pi\sigma^2}$ in order for $C e^{-(x - \mu)^2 / 2\sigma^2}$ to be a density function.
We shall show that a random variable $X$ with a Gaussian density function
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(x - \mu)^2 / 2\sigma^2}$$
has mean $\mu$ and variance $\sigma^2$.
To compute the mean $E[X]$, we take the coordinate transform $y = x - \mu$:
$$E[X] = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} x \, e^{-(x - \mu)^2 / 2\sigma^2} \, dx = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} (y + \mu) \, e^{-y^2 / 2\sigma^2} \, dy.$$
The integral of the odd function $y \, e^{-y^2 / 2\sigma^2}$ vanishes. Another coordinate transform, $z = y / \sigma$, yields
$$E[X] = \frac{\mu}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-z^2 / 2} \, dz = \mu,$$
and so $E[X] = \mu$.
As for the variance, we begin, once again, by taking the coordinate transform $y = x - \mu$:
$$\operatorname{Var}(X) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} (x - \mu)^2 \, e^{-(x - \mu)^2 / 2\sigma^2} \, dx = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} y^2 \, e^{-y^2 / 2\sigma^2} \, dy.$$
Now, we observe that
$$\int_{-\infty}^{\infty} e^{-t y^2} \, dy = \sqrt{\frac{\pi}{t}}$$
for all $t > 0$, and so we can differentiate the integral with respect to $t$ to obtain the following:
$$\int_{-\infty}^{\infty} y^2 \, e^{-t y^2} \, dy = \frac{1}{2} \sqrt{\frac{\pi}{t^3}}.$$
It now follows, upon setting $t = 1 / 2\sigma^2$, that
$$\operatorname{Var}(X) = \frac{1}{\sqrt{2\pi\sigma^2}} \cdot \frac{1}{2} \sqrt{\pi (2\sigma^2)^3} = \sigma^2.$$
We have thus shown that a random variable with a Gaussian density function
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(x - \mu)^2 / 2\sigma^2}$$
has mean $\mu$ and variance $\sigma^2$. By defining the normal distribution with parameters $(\mu, \sigma^2)$ to be the integral
$$N(\mu, \sigma^2)(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{x} e^{-(y - \mu)^2 / 2\sigma^2} \, dy,$$
we obtain a probability distribution with mean $\mu$ and variance $\sigma^2$.
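The mean and variance computations above can be confirmed by direct numerical integration. A sketch (midpoint rule on a wide truncated interval; the parameter values and helper names are my own):

```python
import math

mu, sigma = 2.0, 1.5

def density(x):
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def integrate(g, lo=-20.0, hi=24.0, samples=4000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    dx = (hi - lo) / samples
    return sum(g(lo + (k + 0.5) * dx) for k in range(samples)) * dx

total = integrate(density)
mean = integrate(lambda x: x * density(x))
var = integrate(lambda x: (x - mu) ** 2 * density(x))
print(total, mean, var)  # ≈ 1, μ = 2, σ² = 2.25
```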
3.5. Independent, Identically Distributed Random Variables
We have already observed, via the De Moivre–Laplace theorem, that Bernoulli trials in large numbers can be approximated by a normal distribution. What can we say about general random variables?
We note a crucial property of Bernoulli trials: each coin toss does not affect the outcome of any other coin tosses. In such cases, events satisfy the identity
$$P(E_1 \cap \cdots \cap E_k) = P(E_1) \cdots P(E_k),$$
which we take to be the definition of the independence of the events $E_1, \ldots, E_k$. A sequence $(X_n)_{n=1}^{\infty}$ of random variables is said to be independent if the events
$$\{X_{n_1} \le x_1\}, \ldots, \{X_{n_k} \le x_k\}$$
are independent for all choices of indices $n_1 < \cdots < n_k$ and real numbers $x_1, \ldots, x_k$.
Of particular importance are sequences of independent and identically distributed (iid) random variables, which constitute a natural generalization of the coin toss model. We say that a sequence of random variables is iid in case all of the terms have the same distribution and every finite subsequence is independent. We note that the terms of an iid sequence have the same moments.
To see that iid sequences are indeed a generalization of the coin toss model, we define $X_i$ to be the random variable that outputs the result of the $i$th coin toss, so that $X_i(\omega)$ is 1 if $\omega_i = H$ and $-1$ otherwise. We show that $(X_i)$ is a sequence of iid random variables. Obviously, the $X_i$ share the probability mass
$$P(X_i = s) = \frac{1}{2}$$
whenever $1 \le i \le n$ and $s \in \{-1, 1\}$, and so
$$P(X_i \le x) = P(X_j \le x)$$
for all $i, j$ and $x \in \mathbb{R}$. Moreover, whenever $i_1 < \cdots < i_k$ are distinct indices and $x_1, \ldots, x_k \in \mathbb{R}$,
$$P(X_{i_1} \le x_1, \ldots, X_{i_k} \le x_k) = P(X_{i_1} \le x_1) \cdots P(X_{i_k} \le x_k),$$
and so the $X_i$ are independent.
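For a small number of tosses, both properties can be verified by brute-force enumeration of the sample space. A sketch using exact rational arithmetic (helper names and the encoding $H = +1$, $T = -1$ are my own):

```python
from fractions import Fraction
from itertools import product

n = 3
omega_space = list(product((1, -1), repeat=n))  # Ω_3, with H = +1 and T = -1
weight = Fraction(1, 2**n)                      # uniform measure: each outcome has mass 1/8

def prob(event):
    """P(E) for an event E ⊆ Ω_n given as a predicate on outcomes."""
    return sum((weight for w in omega_space if event(w)), Fraction(0))

marginals = [prob(lambda w, i=i: w[i] == 1) for i in range(n)]
joint = prob(lambda w: w[0] == 1 and w[1] == -1 and w[2] == 1)
print(marginals, joint)  # each marginal is 1/2; the joint probability 1/8 is their product
```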
3.6. Central Limit Theorem
Recall that the De Moivre–Laplace theorem is the limit theorem
$$\lim_{n \to \infty} P\left( \frac{T_n}{\sqrt{n}} \le x \right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2 / 2} \, dy,$$
where $T_n$ is the $n$-fold coin toss random variable. We have introduced, in Section 3.5, the notion of independent, identically distributed random variables as a generalization of the coin toss model. Let us work out a generalized version of the above limit theorem.
Given that each random variable in an iid sequence is a generalization of one coin toss, the $n$-fold coin toss random variable should be considered as the sum of the first $n$ coin toss random variables. Therefore, it makes sense to attempt to establish a limit theorem of the form
$$\lim_{n \to \infty} P\left( \frac{X_1 + \cdots + X_n - n\mu}{\sigma \sqrt{n}} \le x \right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2 / 2} \, dy,$$
where $\mu$ and $\sigma^2$ are the common mean and variance of the $X_n$.
To tackle this problem, we introduce the Fourier transform
$$\varphi_X(t) = E\left[ e^{itX} \right]$$
of a random variable $X$, called the characteristic function of $X$; when $X$ admits a density function $f_X$, this is the integral $\int_{-\infty}^{\infty} e^{itx} f_X(x) \, dx$.
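As a preview of how characteristic functions enter the argument, consider the $\pm 1$ coin toss from Section 3.5: its characteristic function is $E[e^{itY}] = (e^{it} + e^{-it})/2 = \cos t$, and independence turns the characteristic function of a sum into a product, so $T_n / \sqrt{n}$ has characteristic function $\cos(t/\sqrt{n})^n$. The sketch below (a numerical illustration, not a proof) watches this converge to $e^{-t^2/2}$, the characteristic function of the standard Gaussian, in line with the De Moivre–Laplace theorem.

```python
import math

# Characteristic function of T_n/√n for the ±1 coin toss: cos(t/√n)^n.
t = 1.3
limit = math.exp(-t * t / 2)  # characteristic function of the standard Gaussian at t
values = []
for n in (10, 100, 10000):
    values.append(math.cos(t / math.sqrt(n)) ** n)
    print(n, values[-1], limit)  # the left column approaches the right
```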