I was reading "Data Analysis" by D. S. Sivia and found the following fairly early on.
Suppose you have a posterior probability density function $P(X \mid \{\text{data}\})$. One way to approximate it is with a Normal distribution, summarizing it with two parameters in the form $X = X_0 \pm \sigma$, where $X_0$ is the best estimate for $X$ and $\sigma$ is the standard deviation.
It is clear that the maximum of the posterior is given by $\left.\frac{dP}{dX}\right|_{X_0} = 0$ (and $\left.\frac{d^2P}{dX^2}\right|_{X_0} < 0$, which ensures it is a maximum rather than a minimum).
A measure of the reliability of this best estimate can be obtained by computing the Taylor expansion of the logarithm of the posterior, $L = \ln P(X \mid \{\text{data}\})$, about $X_0$:

$$L = L(X_0) + \frac{1}{2} \left.\frac{d^2 L}{dX^2}\right|_{X_0} (X - X_0)^2 + \cdots,$$

where the linear term is missing because $L$ is a monotonic function of $P$, so $L$ attains its maximum at the same point $X_0$ where $P$ does, and hence $\left.\frac{dL}{dX}\right|_{X_0} = 0$.
Now, the quadratic term dominates the Taylor series, and after exponentiating and rearranging we get:

$$P(X \mid \{\text{data}\}) \approx A \exp\!\left[\frac{1}{2} \left.\frac{d^2 L}{dX^2}\right|_{X_0} (X - X_0)^2\right],$$

where $A$ is a normalization constant.
We have obtained the Normal, or Gaussian, distribution. Note that comparing with the standard Gaussian form $\exp\!\left[-\frac{(X - X_0)^2}{2\sigma^2}\right]$ shows that $\sigma = \left(-\left.\frac{d^2 L}{dX^2}\right|_{X_0}\right)^{-1/2}$.
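The whole recipe — find the mode $X_0$, evaluate the second derivative of $L$ there, read off $\sigma$ — is easy to check numerically. Here is a minimal sketch in Python, using a posterior of my own choosing (not an example from the book): a Beta(8, 4) density, the posterior for a coin's bias after 7 heads in 10 tosses under a uniform prior. For Beta(8, 4) the mode is exactly $0.7$, and the analytic width comes out to about $0.145$.

```python
import math

# Hypothetical example (not from the book): unnormalized log posterior
# L(x) = (a-1) ln x + (b-1) ln(1-x) for a Beta(8, 4) density.
def log_posterior(x):
    return 7 * math.log(x) + 3 * math.log(1 - x)

def second_derivative(f, x, h=1e-5):
    # Central finite-difference estimate of f''(x).
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

# Locate the mode X0 by a simple grid search over (0, 1).
# (For Beta(a, b) the mode is (a-1)/(a+b-2) = 7/10, so the grid hits it exactly.)
x0 = max((i / 10000 for i in range(1, 10000)), key=log_posterior)

# sigma = (-d2L/dX2 at X0)^(-1/2), as in the derivation above.
d2L = second_derivative(log_posterior, x0)
sigma = (-d2L) ** -0.5

print(f"mode  X0    = {x0:.4f}")     # exact mode is 0.7
print(f"width sigma = {sigma:.4f}")  # analytic value is ~0.1449
```

The same three steps (maximize, differentiate twice, invert) work for any smooth, single-peaked posterior; only the `log_posterior` function changes.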