Bayesian Inference

Initialisation

The dataset is ready to undergo Maximum a Posteriori Estimation (MAP). It is now time to build the Bayesian infrastructure. To this end, we invoke the constructor of bfade.elhaddad.ElHaddadBayes:

bay = ElHaddadBayes("dk_th", "ds_w", Y=aFloat, name=dat.name) # The default name is 'Untitled'

where name can be any string, though we load name = dat.name (the name of the dataset) for later referencing. Using a consistent naming shall help organise exported files and figure. Please note that dk_th and ds_w are the probabilistic trainable parameters, whereas Y is regarded as deterministic. Specifically, we infer the vector \(\theta=\left[\Delta K_{th, lc}\quad \Delta\sigma_w\right]\), using the input feature vector \(\mathbf{x} = \left[\sqrt{\text{area}} \quad \Delta\sigma \right]\).

bfade.abstract.AbstractBayes comes with an intuitive interface to progressively build the elements of Bayes’ theorem, which bfade.elhaddad.ElHaddadBayes inherits.

Prior

We initially define the prior over each parameter. B-FADE enables the prior to be instantiated from any scipy.stats’s probability distribution. Alongside this, a custom Uniform distribution is provided in bfade.statistics.uniform (the motivation is highlighted in the function’s documentation). In this example, we hypothesise a Gaussian and a Uniform distribution for \(\Delta K_{th,lc}\) and \(\Delta\sigma_w\), respectively:

\[\Delta K_{th,lc} \sim \mathcal{N(\mu, \sigma)}\]

\[\Delta\sigma_w \sim \mathcal{U}\]

Please note that the standard deviation \(\sigma\) was used in the definition of the Gaussian prior, instead of the variance \(\sigma^2\). This reflects the implementation.

\(\Delta K_{th,lc}\) is initialised by:

# provide a list of values
dk_th_list = aList

mean = np.array(dk_th_list).mean()
# mean  = aNumber is allowed as well
std = np.array(dk_th_list).std()
# std  = aNumber is allowed as well

# initialisation
from scipy.stats import norm
bay.load_prior("dk_th", norm, loc=mean, scale = std)

\(\Delta \sigma_w\) is initialised by:

from bfade.statistics import uniform
bay.load_prior("ds_w", uniform, unif_value = 1)

The log-prior is automatically (so, no invocation is needed) assembled by bfade.abstract.AbstractBayes.log_prior as the logarithm of the following expression:

\[P[\theta] = P[\Delta K_{th,lc}]\, P[\Delta\sigma_w]\]

that is, the priors are composed as independent distributions.

Likelihood

Since B-FADE handles a binary classification problem, users are expected to compute the logarithm of Bernoulli likelihood. The likelihood is given by:

\[P[D|\theta] = \prod_{i=1}^{\mathsf{N}} P[\mathbf{x}_i | \theta]^{y_i}\,(1- P[\mathbf{x}_i | \theta])^{1-y_i}\]

where \(y_i\) and \(P[\mathbf{x}_i | \theta]\) and the ground-truth, and predicted label of the data, respectively (0 for runout samples, 1 for failed samples). The predicted values are computed via bfade.ElHaddadBayes.predictor (concrete method of bfade.AbstractBayes.predictor). This function, performs the logistic regression:

\[P[\mathbf{x}_i | \theta] = \frac{1}{1 + \exp[-\mathcal{H}(\mathbf{x}_i, \theta)]}\]

where \(\mathcal{H}(\mathbf{x}_i, \theta)\) is the signed distance between the sample \(\mathbf{x}_i\) and the El Haddad curve with parameters \(\theta\). The signed distance is implemented in bfade.elhaddad.ElHaddadCurve:signed_distance_to_dataset and wrapped into bfade.AbstractBayes.predictor.

The set up the log-likelihood we only need to call the method:

from sklearn.metrics import log_loss
bay.load_log_likelihood(log_loss, normalize=aBool)

In this respect, we must pay attention to the keyword normalize. Precisely, if we use uniform priors on both parameters (default when invoking ElHaddadBayes) we are actually performing Maximum Likelihood Estimation, and the log-likelihood need not to be ‘normalised’:

bay.load_log_likelihood(log_loss, normalize=False)

By contrast, if we use other priors, we are regularising the likelihood. Therefore, this requires:

bay.load_log_likelihood(log_loss, normalize=True)

Now bay can compute the log-likelihood of the dataset, i.e. \(\log P[D|\theta]\).

Before proceeding, we ought to spend a few words on computing the signed distance that the log-likelihood entails. bfade.ElHaddadBayes.predictor computes the signed distance by finding the mimimum-distance point of each datum of \(D\) over the log-log plane. The method, in fact, invokes:

eh = ElHaddadCurve(metrics=np.log10, **all_pars)

using np.log10 as metrics. Then, the distances between the given and the mimimum-distance points are computed using the common Euclidean metrics over the lin-lin plane. If, for a certain reason, the user needs to compute the minimum-distance points over the lin-lin plane, just invoke the curve with metrics=identity, where identity is bfade.util.identity.

Posterior

The last element of Bayes’ theorem is the posterior:

\[P[\theta | D] = \frac{P[D | \theta]\, P[\theta]}{P[D]}\]

It is well-known that MAP involves maximising the log-posterior:

\[\log P[\theta | D] = \log P[D | \theta] + \log P[\theta]\]

neglecting the log-evidence \(-\log P[D]\), as it is a constant. In this case, nothing else has to be coded. The method bfade.elhaddad.ElHaddadBayes.log_posterior takes care of arranging the log-posterior.

Maximum a Posteriori Estimation

The last step entails executing MAP:

\[\hat{\theta} = -\underset{\theta}{\text{argmin}}\ \log P[\theta | D]\]

B-FADE exploit the monotonicity of the logarithm to perform the computations, thus allowing the evidence’s term (\(-P[D]\), constant) to be neglected in the optimisation.

MAP runs by:

bay.MAP(dat, x0=[aFloadDkGuess, aFloadDsGuess])

where aFloadDkGuess and aFloadDsGuess are the initial guesses for the optimiser.

If MAP succeedes the optimal values of the parameters \(\hat{\theta} = \begin{bmatrix} \Delta K_{th,lc}\quad \Delta\sigma_w \end{bmatrix}\) are stored in bay.theta_hat. Furthermore, the next stage required the inverse Hessian matrix computed at theta_hat. If MAP succeedes, such matrix is stored in bay.ihess.

Posterior Approximation

Once MAP succeedes the posterior distribution is approximated using Variational Inference. In this case the approximator is a multivariate Gaussian distribution (i.e. Laplace’s approximation) such that:

\[P[\theta | D ] = \mathcal{N}(\hat{\theta}, H^{-1})\]

where \(H^{-1}\) is the inverse Hessian matrix of \(-\log P[\theta | D]\) evaluated at \(\hat{\theta}\). This is the joint posterior distribution. The approximation automatically loads in case of MAP succeedes, which not only loads the joint posterior distribution (the last equation), but also extracts the marginal posterior distributions:

\[\Delta K_{th,lc} \sim \mathcal{N}(\hat{\Delta K_{th,lc}}, H^{-1}_{11})\]

\[\Delta\sigma_w \sim \mathcal{N}(\hat{\Delta\sigma_w}, H^{-1}_{22})\]