I was reading “Data Analysis” by D. S. Sivia and came across the following derivation fairly early on.

Suppose you have a posterior probability density function \(P(x\mid y)\). One way to approximate it is with a Normal distribution, summarizing it with just two parameters in the form \(x = \bar{x} \pm \sigma\), where \(\bar{x}\) is the best estimate of \(x\) (the maximum of the posterior) and \(\sigma\) is the standard deviation.

It is clear that the maximum of the posterior is given by \(\frac{dP}{dx}\big|_{\bar{x}} = 0\) (together with \(\frac{d^2P}{dx^2}\big|_{\bar{x}} < 0\)).

A measure of the reliability of this best estimate can be obtained from the Taylor expansion of the logarithm of the posterior, \(L = \log P(x\mid y)\), about \(\bar{x}\):

$$L = L(\bar{x}) + \frac{1}{2} \frac{d^2 L}{d x^2}\bigg|_{\bar{x}} (x-\bar{x})^2 + \ldots$$
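To see why truncating after the quadratic term is reasonable near the peak, here is a quick numerical check. The log-density is my own illustrative choice (a Beta(3, 7)-shaped posterior, not an example from the book): \(L(x) = 2\ln x + 6\ln(1-x) + \text{const}\), which peaks at \(\bar{x} = 0.25\).

```python
import math

# Illustrative log-posterior (Beta(3,7)-shaped; my choice, not from the book),
# dropping the constant term: L(x) = 2 ln(x) + 6 ln(1 - x)
def L(x):
    return 2 * math.log(x) + 6 * math.log(1 - x)

x_bar = 0.25                                   # mode: dL/dx = 2/x - 6/(1-x) = 0
d2L = -2 / x_bar**2 - 6 / (1 - x_bar)**2       # second derivative at the mode

# Compare L(x) with its quadratic Taylor expansion around x_bar
for x in (0.20, 0.25, 0.30):
    taylor = L(x_bar) + 0.5 * d2L * (x - x_bar) ** 2
    print(f"x = {x:.2f}   L = {L(x):+.4f}   quadratic = {taylor:+.4f}")
```

Within roughly one standard deviation of the peak the two curves agree to a few parts in a thousand; the cubic and higher terms only matter out in the tails.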

where the linear term is missing because \(\frac{dL}{dx}\big|_{\bar{x}} = 0\): since \(L\) is a monotonic function of \(P\), it attains its maximum at the same point \(\bar{x}\) where \(P\) does.

Close to the peak, the quadratic term dominates the Taylor series; exponentiating the truncated expansion and normalizing gives:

$$P(x\mid y) \approx A \exp \left[\frac{1}{2}\frac{d^2 L}{d x^2}\bigg|_{\bar{x}} (x-\bar{x})^2\right] = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\bar{x})^2}{2\sigma^2}\right]$$

We have obtained the Normal, or Gaussian, distribution. Matching the exponents term by term shows that \(\sigma = \left(-\frac{d^2 L}{d x^2}\big|_{\bar{x}}\right)^{-\frac{1}{2}}\).
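The whole recipe can be checked end to end. The sketch below (my own example, not from the book) applies it to a posterior proportional to a Beta(3, 7) density, whose mode and exact standard deviation are known in closed form, and compares the \(\sigma\) from the second-derivative formula against the exact value:

```python
import math

# Hypothetical posterior: P(x|y) proportional to x^2 (1-x)^6, i.e. Beta(a, b)
# with a = 3, b = 7, so L(x) = (a-1) ln(x) + (b-1) ln(1-x) + const.
a, b = 3.0, 7.0

# Mode of the posterior: solve dL/dx = (a-1)/x - (b-1)/(1-x) = 0
x_bar = (a - 1) / (a + b - 2)

# Second derivative of L at the mode, then sigma = (-L'')^(-1/2)
d2L = -(a - 1) / x_bar**2 - (b - 1) / (1 - x_bar) ** 2
sigma = (-d2L) ** -0.5

# Exact standard deviation of a Beta(a, b) distribution, for comparison
sd_exact = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

print(f"x_bar         = {x_bar:.3f}")    # 0.250
print(f"Laplace sigma = {sigma:.4f}")    # ~0.153
print(f"exact sd      = {sd_exact:.4f}") # ~0.138
```

The approximation overstates \(\sigma\) slightly here because this posterior is skewed; the agreement improves as the posterior becomes more symmetric, e.g. as more data sharpens it around the mode.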