Demo

This notebook demonstrates the basic functionality of the perfectns package; for background, see the dynamic nested sampling paper (Higson et al., 2019a).

Running nested sampling calculations

The likelihood \(\mathcal{L}(\theta)\), prior \(\pi(\theta)\) and calculation settings are specified in a PerfectNSSettings object. For this example we will use a 10-dimensional spherically symmetric Gaussian likelihood with size \(\sigma_\mathcal{L}=1\) and a Gaussian prior with size \(\sigma_{\pi}=10\).

[1]:
import perfectns.settings
import perfectns.likelihoods as likelihoods
import perfectns.priors as priors

# Input settings
settings = perfectns.settings.PerfectNSSettings()
settings.likelihood = likelihoods.Gaussian(likelihood_scale=1)
settings.prior = priors.Gaussian(prior_scale=10)
settings.n_dim = 10
settings.ninit = 10
settings.nlive_const = 100

The “dynamic_goal” setting determines whether dynamic nested sampling is used and, if so, how the computational effort is split between increasing parameter estimation accuracy and increasing evidence calculation accuracy: dynamic_goal=1 optimises purely for parameter estimation, and dynamic_goal=0 optimises purely for calculating the Bayesian evidence \(\mathcal{Z}\).

Let's try running a standard nested sampling calculation and a dynamic nested sampling calculation:

[2]:
import perfectns.nested_sampling as nested_sampling

# Perform standard nested sampling
settings.dynamic_goal = None
standard_ns_run = nested_sampling.generate_ns_run(settings, random_seed=0)  # set random_seed for reproducible results
# Perform dynamic nested sampling
settings.dynamic_goal = 1  # optimise for parameter estimation accuracy
dynamic_ns_run = nested_sampling.generate_ns_run(settings, random_seed=0)

We can now make posterior inferences from the samples generated by the nested sampling calculations, using utility functions from nestcheck. Here we calculate:

  1. the log Bayesian evidence \(\log \mathcal{Z}=\log \left( \int \mathcal{L}(\theta) \pi(\theta) \mathrm{d}\theta \right)\),

  2. the mean of the first parameter \(\theta_1\),

  3. the second moment of the posterior distribution of \(\theta_1\),

  4. the median of \(\theta_1\),

  5. the 84% one-tailed credible interval on \(\theta_1\).

For the Gaussian likelihood and prior we can calculate the posterior distribution analytically, so we first calculate the analytic values of each quantity for comparison. The results are displayed in a pandas DataFrame.
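As a quick independent sanity check on the “true values” row produced below: the posterior for each parameter is the product of two zero-mean Gaussians, so it is itself a zero-mean Gaussian with variance \((1/\sigma_\mathcal{L}^2 + 1/\sigma_{\pi}^2)^{-1}\). This standalone sketch (standard library only, not part of perfectns) reproduces the analytic second moment and 84% credible interval:

```python
from statistics import NormalDist

sigma_like, sigma_prior = 1.0, 10.0
# Posterior for each parameter is the product of two zero-mean Gaussians, so
# it is itself a zero-mean Gaussian with variance (1/s_L^2 + 1/s_pi^2)^-1.
post_var = 1 / (1 / sigma_like**2 + 1 / sigma_prior**2)  # second moment, ~0.9901
ci_84 = NormalDist(mu=0.0, sigma=post_var**0.5).inv_cdf(0.84)  # ~0.9895
# The mean and median of theta_1 are both exactly 0 by symmetry.
```

These agree with the “true values” row of the DataFrame below to around four decimal places.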

[3]:
import perfectns.estimators as e
import nestcheck.ns_run_utils
import pandas as pd

estimator_list = [e.LogZ(),
                  e.ParamMean(),
                  e.ParamSquaredMean(),
                  e.ParamCred(0.5),
                  e.ParamCred(0.84)]
estimator_names = [est.latex_name for est in estimator_list]
results = pd.DataFrame([nestcheck.ns_run_utils.run_estimators(standard_ns_run, estimator_list),
                        nestcheck.ns_run_utils.run_estimators(dynamic_ns_run, estimator_list)],
                       columns=estimator_names, index=['standard run', 'dynamic run'])
# Add true values for comparison
results.loc['true values'] = e.get_true_estimator_values(estimator_list, settings)
results
[3]:
$\mathrm{log} \mathcal{Z}$ $\overline{\theta_{\hat{1}}}$ $\overline{\theta^2_{\hat{1}}}$ $\mathrm{median}(\theta_{\hat{1}})$ $\mathrm{C.I.}_{84\%}(\theta_{\hat{1}})$
standard run -32.289475 -0.029820 0.971677 -0.046113 0.945738
dynamic run -33.053897 -0.054664 0.931194 -0.093203 0.913430
true values -32.264988 0.000000 0.990080 0.000000 0.989523

Estimating sampling errors

You can estimate the numerical uncertainty on these results by calculating the standard deviation of each run's sampling error distribution, using the bootstrap resampling approach described in Higson et al. (2018) (implemented in nestcheck).

[5]:
import numpy as np
import nestcheck.error_analysis
np.random.seed(0)
results.loc['standard unc'] = nestcheck.error_analysis.run_std_bootstrap(
    standard_ns_run, estimator_list, n_simulate=200)
results.loc['dynamic unc'] = nestcheck.error_analysis.run_std_bootstrap(
    dynamic_ns_run, estimator_list, n_simulate=200)
results.loc[['standard unc', 'dynamic unc']]
[5]:
$\mathrm{log} \mathcal{Z}$ $\overline{\theta_{\hat{1}}}$ $\overline{\theta^2_{\hat{1}}}$ $\mathrm{median}(\theta_{\hat{1}})$ $\mathrm{C.I.}_{84\%}(\theta_{\hat{1}})$
standard unc 0.366472 0.031321 0.049564 0.043201 0.050531
dynamic unc 1.204710 0.017033 0.030143 0.020610 0.027900

This approach works for both dynamic and standard nested sampling. In addition, as perfectns performs the nested sampling algorithm “perfectly”, there are no additional errors from implementation-specific effects such as correlated samples (see Higson et al., 2019b for a detailed discussion).

Generating and analysing runs in parallel

Multiple nested sampling runs can be generated and analysed in parallel (using parallel_utils from nestcheck).

[6]:
import numpy as np
import nestcheck.parallel_utils as pu
import nestcheck.pandas_functions as pf

# Generate 100 nested sampling runs
run_list = nested_sampling.get_run_data(settings, 100, save=False, load=False, random_seeds=list(range(100)))
# Calculate posterior inferences for each run
values = pu.parallel_apply(nestcheck.ns_run_utils.run_estimators, run_list,
                           func_args=(estimator_list,))
# Show the mean and standard deviation of the calculation results
multi_run_tests = pf.summary_df_from_list(values, estimator_names)
multi_run_tests


[6]:
$\mathrm{log} \mathcal{Z}$ $\overline{\theta_{\hat{1}}}$ $\overline{\theta^2_{\hat{1}}}$ $\mathrm{median}(\theta_{\hat{1}})$ $\mathrm{C.I.}_{84\%}(\theta_{\hat{1}})$
calculation type result type
mean value -32.139762 -0.000028 0.992171 -0.001141 0.992247
uncertainty 0.114168 0.001896 0.002937 0.002478 0.002826
std value 1.141684 0.018956 0.029367 0.024777 0.028265
uncertainty 0.081136 0.001347 0.002087 0.001761 0.002009
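The “uncertainty” rows in this summary table appear to follow the standard formulas: the uncertainty on the mean is the standard error \(\mathrm{std}/\sqrt{n}\), and the uncertainty on the std itself is approximately \(\mathrm{std}/\sqrt{2(n-1)}\). Assuming this (which the table's numbers bear out), a quick check against the \(\log \mathcal{Z}\) column:

```python
import math

# Checking the "uncertainty" rows of the summary table against the standard
# formulas, using the log Z column's std value from the table above.
n_runs = 100
logz_std = 1.141684                                # "std" value for log Z
mean_unc = logz_std / math.sqrt(n_runs)            # ~0.114168, matches table
std_unc = logz_std / math.sqrt(2 * (n_runs - 1))   # ~0.081136, matches table
```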

Comparing dynamic and standard nested sampling performance

Let's now compare the performance of dynamic and standard nested sampling, using the 10-dimensional Gaussian likelihood and prior.

[7]:
import perfectns.results_tables as rt

# Input settings
settings = perfectns.settings.PerfectNSSettings()
settings.likelihood = likelihoods.Gaussian(likelihood_scale=1)
settings.prior = priors.Gaussian(prior_scale=10)
settings.ninit = 10
settings.nlive_const = 100
settings.n_dim = 10
# Generate the results table
dynamic_results_table = rt.get_dynamic_results(100, [0, 1], estimator_list, settings, save=False, load=False)
dynamic_results_table[estimator_names]
dynamic_goal=None n_samples_max=None

dynamic_goal=0 n_samples_max=2992

dynamic_goal=1 n_samples_max=3007


[7]:
$\mathrm{log} \mathcal{Z}$ $\overline{\theta_{\hat{1}}}$ $\overline{\theta^2_{\hat{1}}}$ $\mathrm{median}(\theta_{\hat{1}})$ $\mathrm{C.I.}_{84\%}(\theta_{\hat{1}})$
calculation type dynamic settings result type
mean standard value -32.259097 -0.004415 0.984532 0.001289 0.981203
uncertainty 0.040170 0.003470 0.004505 0.004431 0.005580
dynamic $G=0$ value -32.218098 -0.002052 0.979013 -0.001307 0.978076
uncertainty 0.034715 0.004263 0.006417 0.005667 0.007123
dynamic $G=1$ value -32.143640 -0.000484 0.992710 -0.001467 0.992193
uncertainty 0.113207 0.001923 0.002890 0.002494 0.002868
std standard value 0.401704 0.034697 0.045052 0.044311 0.055795
uncertainty 0.028548 0.002466 0.003202 0.003149 0.003965
dynamic $G=0$ value 0.347150 0.042629 0.064168 0.056671 0.071234
uncertainty 0.024671 0.003029 0.004560 0.004027 0.005062
dynamic $G=1$ value 1.132066 0.019231 0.028904 0.024941 0.028678
uncertainty 0.080452 0.001367 0.002054 0.001772 0.002038
std efficiency gain dynamic $G=0$ value 1.338991 0.662480 0.492935 0.611363 0.613510
uncertainty 0.269147 0.133164 0.099084 0.122889 0.123320
dynamic $G=1$ value 0.125912 3.255211 2.429505 3.156470 3.785308
uncertainty 0.025309 0.654322 0.488349 0.634474 0.760876

Looking at the std efficiency gain rows, you should see that dynamic nested sampling targeted at parameter estimation (dynamic_goal=1) gives an efficiency gain (equivalent computational speedup) for parameter estimation (columns other than \(\log \mathcal{Z}\)) of a factor of around 3 compared to standard nested sampling.
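The efficiency gain here is the squared ratio of the standard deviations (i.e. a ratio of variances) of the two methods' results; assuming this definition, which the table's numbers bear out, the \(\overline{\theta_{\hat{1}}}\) entry for \(G=1\) can be reproduced from the std rows above:

```python
# Efficiency gain = (std over standard runs / std over dynamic runs) ** 2,
# using the "std" values for the theta_1 mean from the table above.
std_standard = 0.034697    # std of the theta_1 mean over standard runs
std_dynamic_g1 = 0.019231  # std of the theta_1 mean over dynamic (G=1) runs
gain = (std_standard / std_dynamic_g1) ** 2
print(round(gain, 3))  # → 3.255, matching the table entry
```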

Similar results tables for different likelihoods can be found in the dynamic nested sampling paper (Higson et al., 2019a). For more information about the get_dynamic_results function, see its documentation.

Comparing bootstrap error estimates to observed distributions of results

We can check whether the bootstrap estimates of parameter estimation sampling errors are accurate, using a 3-dimensional Gaussian likelihood and Gaussian prior.

[8]:
settings.likelihood = likelihoods.Gaussian(likelihood_scale=1)
settings.prior = priors.Gaussian(prior_scale=10)
settings.n_dim = 3
bootstrap_results_table = rt.get_bootstrap_results(50, 50, # 100, 200,
                                                   estimator_list, settings,
                                                   n_run_ci=20,
                                                   n_simulate_ci=200,  # n_simulate_ci=1000,
                                                   add_sim_method=False,
                                                   cred_int=0.95,
                                                   ninit_sep=True,
                                                   parallel=True)
bootstrap_results_table




[8]:
$\mathrm{log} \mathcal{Z}$ $\overline{\theta_{\hat{1}}}$ $\overline{\theta^2_{\hat{1}}}$ $\mathrm{median}(\theta_{\hat{1}})$ $\mathrm{C.I.}_{84\%}(\theta_{\hat{1}})$
calculation type result type
repeats mean value -9.625403 0.002816 0.987749 0.007308 0.999308
uncertainty 0.036342 0.007468 0.010398 0.009250 0.012202
repeats std value 0.256977 0.052807 0.073526 0.065408 0.086281
uncertainty 0.025959 0.005334 0.007427 0.006607 0.008716
bs std / repeats std value 0.892930 0.873771 0.925045 0.972450 0.920943
uncertainty 0.091285 0.089760 0.095384 0.100413 0.096081
bs estimate % variation value 11.112636 13.206385 14.631458 15.134609 18.444304
uncertainty 1.122546 1.334046 1.478000 1.528826 1.863156
bs 0.95 CI value -9.220209 0.070315 1.114863 0.097298 1.127381
uncertainty 0.044147 0.012680 0.018326 0.016915 0.018304
bs +-1std % coverage value 72.000000 62.000000 52.000000 62.000000 64.000000
bs 0.95 CI % coverage value 94.000000 90.000000 96.000000 92.000000 92.000000

Note that every second row gives an estimated numerical uncertainty on the values in the row above it.

You should see that the ratio of the bootstrap error estimates to the standard deviation of the results (the “bs std / repeats std” rows of bootstrap_results_table) is close to 1, given the estimated numerical uncertainties. Similar results are given in the appendix of the dynamic nested sampling paper (Higson et al., 2019a); see the paper and the get_bootstrap_results function’s documentation for more details.
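The “% coverage” rows count how often the true value falls inside a run's bootstrap error bars; for well-calibrated \(\pm 1\) std error bars this should be near 68%, and near 95% for the 0.95 credible intervals. A minimal self-contained illustration of this kind of check, using made-up stand-in numbers rather than perfectns output:

```python
import random

random.seed(0)
# Sketch of a coverage check (illustrative only, not the perfectns code):
# draw stand-in per-run estimates around a known true value, pair each with a
# stand-in error bar, and count how often the truth lies within +-1 error bar.
true_value = 0.0
estimates = [random.gauss(true_value, 0.05) for _ in range(50)]  # fake run results
error_bars = [0.05] * 50                                         # fake bootstrap stds
coverage = sum(abs(est - true_value) <= bar
               for est, bar in zip(estimates, error_bars)) / 50
# With perfectly calibrated error bars, coverage fluctuates around ~0.68.
```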