I was reading "Data Analysis" by D. S. Sivia and found the following fairly early on.
Suppose you have a posterior probability density function $P(x \mid D, I)$. One way to approximate it is with a Normal distribution, i.e. by summarizing it with two parameters in the form $x = x_0 \pm \sigma$, where $x_0$ is the best estimate for $x$ and $\sigma$ is the standard deviation.
It is clear that the maximum of the posterior is given by $\left.\frac{dP}{dx}\right|_{x_0} = 0$ (and $\left.\frac{d^2P}{dx^2}\right|_{x_0} < 0$).
A measure of the reliability of this best estimate can be obtained by computing the Taylor expansion of the log-posterior, $L = \ln P(x \mid D, I)$, about $x_0$:

$$L = L(x_0) + \frac{1}{2} \left.\frac{d^2 L}{dx^2}\right|_{x_0} (x - x_0)^2 + \dots$$

where the linear term is missing because $\left.\frac{dL}{dx}\right|_{x_0} = 0$: since $L$ is a monotonic function of $P$, the maximum of $P$ at $x_0$ is also the maximum of $L$.
Now, the quadratic term dominates the Taylor series; exponentiating and ignoring the higher-order terms, we get:

$$P(x \mid D, I) \approx A \exp\left[\frac{1}{2} \left.\frac{d^2 L}{dx^2}\right|_{x_0} (x - x_0)^2\right]$$
We have obtained the Normal, or Gaussian, distribution. Note that comparing with the standard form $\exp\left[-(x - x_0)^2 / 2\sigma^2\right]$ gives

$$\sigma = \left(-\left.\frac{d^2 L}{dx^2}\right|_{x_0}\right)^{-1/2}$$
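As a quick numerical sanity check of this recipe, here is a minimal Python sketch for a hypothetical coin-flip example (the data, the uniform prior, and the names `n`, `heads`, `log_posterior` are my own, not from the book): find the maximum $x_0$ of $L$, estimate $d^2L/dx^2$ there by finite differences, and read off $\sigma$.

```python
import math

# Hypothetical example: posterior for a coin's bias p after
# observing `heads` heads in `n` tosses, with a uniform prior,
# so P(p|D,I) is proportional to p^heads * (1-p)^(n-heads).
n, heads = 100, 60

def log_posterior(p):
    # L(p) = ln P(p|D,I), up to an additive constant
    return heads * math.log(p) + (n - heads) * math.log(1 - p)

p0 = heads / n  # maximum of the posterior: dL/dp = 0 here

# Second derivative of L at p0 via central finite differences
eps = 1e-4
d2L = (log_posterior(p0 + eps) - 2 * log_posterior(p0)
       + log_posterior(p0 - eps)) / eps**2

# sigma = (-d^2L/dp^2 at p0)^(-1/2); d2L is negative at a maximum
sigma = (-d2L) ** -0.5
print(f"p = {p0:.3f} +/- {sigma:.3f}")
```

For this posterior the answer can be checked analytically: $d^2L/dp^2 = -\mathrm{heads}/p^2 - (n-\mathrm{heads})/(1-p)^2$, which at $p_0 = 0.6$ gives $\sigma \approx 0.049$, close to the exact standard deviation of the Beta(61, 41) posterior ($\approx 0.048$).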