Demo

This notebook demonstrates the basic functionality of the perfectns package; for background, see the dynamic nested sampling paper (Higson et al., 2019a).

Running nested sampling calculations

The likelihood \(\mathcal{L}(\theta)\), prior \(\pi(\theta)\) and calculation settings are specified in a PerfectNSSettings object. For this example we will use a 10-dimensional spherically symmetric Gaussian likelihood with size \(\sigma_\mathcal{L}=1\) and a Gaussian prior with size \(\sigma_{\pi}=10\).

[1]:
import perfectns.settings
import perfectns.likelihoods as likelihoods
import perfectns.priors as priors

# Input settings
settings = perfectns.settings.PerfectNSSettings()
settings.likelihood = likelihoods.Gaussian(likelihood_scale=1)
settings.prior = priors.Gaussian(prior_scale=10)
settings.n_dim = 10
settings.ninit = 10
settings.nlive_const = 100

The “dynamic_goal” setting determines whether dynamic nested sampling is used and, if so, how the computational effort is split between increasing parameter estimation accuracy and increasing evidence calculation accuracy: dynamic_goal=1 optimises purely for parameter estimation, and dynamic_goal=0 optimises purely for calculating the Bayesian evidence \(\mathcal{Z}\).

Let's try running a standard nested sampling calculation and a dynamic nested sampling calculation:

[2]:
import perfectns.nested_sampling as nested_sampling

# Perform standard nested sampling
settings.dynamic_goal = None
standard_ns_run = nested_sampling.generate_ns_run(settings, random_seed=0)  # set random_seed for reproducible results
# Perform dynamic nested sampling
settings.dynamic_goal = 1  # optimise for parameter estimation accuracy
dynamic_ns_run = nested_sampling.generate_ns_run(settings, random_seed=0)

We can now make posterior inferences from the samples generated by the nested sampling calculations, using utility functions from nestcheck. Here we calculate:

  1. the log Bayesian evidence \(\log \mathcal{Z}=\log \left( \int \mathcal{L}(\theta) \pi(\theta) \mathrm{d}\theta \right)\),

  2. the mean of the first parameter \(\theta_1\),

  3. the second moment of the posterior distribution of \(\theta_1\),

  4. the median of \(\theta_1\),

  5. the 84% one-tailed credible interval on \(\theta_1\).

For the Gaussian likelihood and prior we can calculate the posterior distribution analytically, so we first calculate the analytic values of each quantity for comparison. The results are displayed in a pandas DataFrame.
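As a quick independent sanity check on the “true values” row produced below: the posterior for each parameter is the product of two zero-mean Gaussians, so it is itself a zero-mean Gaussian with variance \((1/\sigma_\mathcal{L}^2 + 1/\sigma_{\pi}^2)^{-1}\). This standalone sketch (standard library only, not part of perfectns) reproduces the analytic second moment and 84% credible interval:

```python
from statistics import NormalDist

sigma_like, sigma_prior = 1.0, 10.0
# Posterior for each parameter is the product of two zero-mean Gaussians, so
# it is itself a zero-mean Gaussian with variance (1/s_L^2 + 1/s_pi^2)^-1.
post_var = 1 / (1 / sigma_like**2 + 1 / sigma_prior**2)  # second moment, ~0.9901
ci_84 = NormalDist(mu=0.0, sigma=post_var**0.5).inv_cdf(0.84)  # ~0.9895
# The mean and median of theta_1 are both exactly 0 by symmetry.
```

These agree with the “true values” row of the DataFrame below to around four decimal places.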

[3]:
import perfectns.estimators as e
import nestcheck.ns_run_utils
import pandas as pd

estimator_list = [e.LogZ(),
                  e.ParamMean(),
                  e.ParamSquaredMean(),
                  e.ParamCred(0.5),
                  e.ParamCred(0.84)]
estimator_names = [est.latex_name for est in estimator_list]
results = pd.DataFrame([nestcheck.ns_run_utils.run_estimators(standard_ns_run, estimator_list),
                        nestcheck.ns_run_utils.run_estimators(dynamic_ns_run, estimator_list)],
                       columns=estimator_names, index=['standard run', 'dynamic run'])
# Add true values for comparison
results.loc['true values'] = e.get_true_estimator_values(estimator_list, settings)
results
[3]:
$\mathrm{log} \mathcal{Z}$ $\overline{\theta_{\hat{1}}}$ $\overline{\theta^2_{\hat{1}}}$ $\mathrm{median}(\theta_{\hat{1}})$ $\mathrm{C.I.}_{84\%}(\theta_{\hat{1}})$
standard run -32.289475 -0.029820 0.971677 -0.046113 0.945738
dynamic run -33.053897 -0.054664 0.931194 -0.093203 0.913430
true values -32.264988 0.000000 0.990080 0.000000 0.989523

Estimating sampling errors

You can estimate the numerical uncertainty on these results by calculating the standard deviation of each run's sampling error distribution, using the bootstrap resampling approach described in Higson et al. (2018) (implemented in nestcheck).

[5]:
import numpy as np
import nestcheck.error_analysis
np.random.seed(0)
results.loc['standard unc'] = nestcheck.error_analysis.run_std_bootstrap(
    standard_ns_run, estimator_list, n_simulate=200)
results.loc['dynamic unc'] = nestcheck.error_analysis.run_std_bootstrap(
    dynamic_ns_run, estimator_list, n_simulate=200)
results.loc[['standard unc', 'dynamic unc']]
[5]:
$\mathrm{log} \mathcal{Z}$ $\overline{\theta_{\hat{1}}}$ $\overline{\theta^2_{\hat{1}}}$ $\mathrm{median}(\theta_{\hat{1}})$ $\mathrm{C.I.}_{84\%}(\theta_{\hat{1}})$
standard unc 0.366472 0.031321 0.049564 0.043201 0.050531
dynamic unc 1.204710 0.017033 0.030143 0.020610 0.027900

This approach works for both dynamic and standard nested sampling. In addition, as perfectns performs the nested sampling algorithm “perfectly”, there are no additional errors from implementation-specific effects such as correlated samples (see Higson et al., 2019b for a detailed discussion).

Generating and analysing runs in parallel

Multiple nested sampling runs can be generated and analysed in parallel (using parallel_utils from nestcheck).

[6]:
import numpy as np
import nestcheck.parallel_utils as pu
import nestcheck.pandas_functions as pf

# Generate 100 nested sampling runs
run_list = nested_sampling.get_run_data(settings, 100, save=False, load=False, random_seeds=list(range(100)))
# Calculate posterior inferences for each run
values = pu.parallel_apply(nestcheck.ns_run_utils.run_estimators, run_list,
                           func_args=(estimator_list,))
# Show the mean and standard deviation of the calculation results
multi_run_tests = pf.summary_df_from_list(values, estimator_names)
multi_run_tests


[6]:
$\mathrm{log} \mathcal{Z}$ $\overline{\theta_{\hat{1}}}$ $\overline{\theta^2_{\hat{1}}}$ $\mathrm{median}(\theta_{\hat{1}})$ $\mathrm{C.I.}_{84\%}(\theta_{\hat{1}})$
calculation type result type
mean value -32.139762 -0.000028 0.992171 -0.001141 0.992247
uncertainty 0.114168 0.001896 0.002937 0.002478 0.002826
std value 1.141684 0.018956 0.029367 0.024777 0.028265
uncertainty 0.081136 0.001347 0.002087 0.001761 0.002009
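The “uncertainty” rows in this summary table appear to follow the standard formulas: the uncertainty on the mean is the standard error \(\mathrm{std}/\sqrt{n}\), and the uncertainty on the std itself is approximately \(\mathrm{std}/\sqrt{2(n-1)}\). Assuming this (which the table's numbers bear out), a quick check against the \(\log \mathcal{Z}\) column:

```python
import math

# Checking the "uncertainty" rows of the summary table against the standard
# formulas, using the log Z column's std value from the table above.
n_runs = 100
logz_std = 1.141684                                # "std" value for log Z
mean_unc = logz_std / math.sqrt(n_runs)            # ~0.114168, matches table
std_unc = logz_std / math.sqrt(2 * (n_runs - 1))   # ~0.081136, matches table
```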

Comparing dynamic and standard nested sampling performance

Let's now compare the performance of dynamic and standard nested sampling, using the 10-dimensional Gaussian likelihood and prior.

[7]:
import perfectns.results_tables as rt

# Input settings
settings = perfectns.settings.PerfectNSSettings()
settings.likelihood = likelihoods.Gaussian(likelihood_scale=1)
settings.prior = priors.Gaussian(prior_scale=10)
settings.ninit = 10
settings.nlive_const = 100
settings.n_dim = 10
# Generate the results table
dynamic_results_table = rt.get_dynamic_results(100, [0, 1], estimator_list, settings, save=False, load=False)
dynamic_results_table[estimator_names]
dynamic_goal=None n_samples_max=None

dynamic_goal=0 n_samples_max=2992

dynamic_goal=1 n_samples_max=3007


[7]:
$\mathrm{log} \mathcal{Z}$ $\overline{\theta_{\hat{1}}}$ $\overline{\theta^2_{\hat{1}}}$ $\mathrm{median}(\theta_{\hat{1}})$ $\mathrm{C.I.}_{84\%}(\theta_{\hat{1}})$
calculation type dynamic settings result type
mean standard value -32.259097 -0.004415 0.984532 0.001289 0.981203
uncertainty 0.040170 0.003470 0.004505 0.004431 0.005580
dynamic $G=0$ value -32.218098 -0.002052 0.979013 -0.001307 0.978076
uncertainty 0.034715 0.004263 0.006417 0.005667 0.007123
dynamic $G=1$ value -32.143640 -0.000484 0.992710 -0.001467 0.992193
uncertainty 0.113207 0.001923 0.002890 0.002494 0.002868
std standard value 0.401704 0.034697 0.045052 0.044311 0.055795
uncertainty 0.028548 0.002466 0.003202 0.003149 0.003965
dynamic $G=0$ value 0.347150 0.042629 0.064168 0.056671 0.071234
uncertainty 0.024671 0.003029 0.004560 0.004027 0.005062
dynamic $G=1$ value 1.132066 0.019231 0.028904 0.024941 0.028678
uncertainty 0.080452 0.001367 0.002054 0.001772 0.002038
std efficiency gain dynamic $G=0$ value 1.338991 0.662480 0.492935 0.611363 0.613510
uncertainty 0.269147 0.133164 0.099084 0.122889 0.123320
dynamic $G=1$ value 0.125912 3.255211 2.429505 3.156470 3.785308
uncertainty 0.025309 0.654322 0.488349 0.634474 0.760876

Looking at the std efficiency gain rows, you should see that dynamic nested sampling targeted at parameter estimation (dynamic_goal=1) gives an efficiency gain (equivalent computational speedup) for parameter estimation (columns other than \(\log \mathcal{Z}\)) of a factor of around 3 compared to standard nested sampling.
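The efficiency gain here is the squared ratio of the standard deviations (i.e. a ratio of variances) of the two methods' results; assuming this definition, which the table's numbers bear out, the \(\overline{\theta_{\hat{1}}}\) entry for \(G=1\) can be reproduced from the std rows above:

```python
# Efficiency gain = (std over standard runs / std over dynamic runs) ** 2,
# using the "std" values for the theta_1 mean from the table above.
std_standard = 0.034697    # std of the theta_1 mean over standard runs
std_dynamic_g1 = 0.019231  # std of the theta_1 mean over dynamic (G=1) runs
gain = (std_standard / std_dynamic_g1) ** 2
print(round(gain, 3))  # → 3.255, matching the table entry
```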

Similar results tables for different likelihoods can be found in the dynamic nested sampling paper (Higson et al., 2019a). For more information about the get_dynamic_results function, see its documentation.

Comparing bootstrap error estimates to observed distributions of results

We can check whether the bootstrap estimates of parameter estimation sampling errors are accurate, using a 3-dimensional Gaussian likelihood and Gaussian prior.

[8]:
settings.likelihood = likelihoods.Gaussian(likelihood_scale=1)
settings.prior = priors.Gaussian(prior_scale=10)
settings.n_dim = 3
bootstrap_results_table = rt.get_bootstrap_results(50, 50, # 100, 200,
                                                   estimator_list, settings,
                                                   n_run_ci=20,
                                                   n_simulate_ci=200,  # n_simulate_ci=1000,
                                                   add_sim_method=False,
                                                   cred_int=0.95,
                                                   ninit_sep=True,
                                                   parallel=True)
bootstrap_results_table




[8]:
$\mathrm{log} \mathcal{Z}$ $\overline{\theta_{\hat{1}}}$ $\overline{\theta^2_{\hat{1}}}$ $\mathrm{median}(\theta_{\hat{1}})$ $\mathrm{C.I.}_{84\%}(\theta_{\hat{1}})$
calculation type result type
repeats mean value -9.625403 0.002816 0.987749 0.007308 0.999308
uncertainty 0.036342 0.007468 0.010398 0.009250 0.012202
repeats std value 0.256977 0.052807 0.073526 0.065408 0.086281
uncertainty 0.025959 0.005334 0.007427 0.006607 0.008716
bs std / repeats std value 0.892930 0.873771 0.925045 0.972450 0.920943
uncertainty 0.091285 0.089760 0.095384 0.100413 0.096081
bs estimate % variation value 11.112636 13.206385 14.631458 15.134609 18.444304
uncertainty 1.122546 1.334046 1.478000 1.528826 1.863156
bs 0.95 CI value -9.220209 0.070315 1.114863 0.097298 1.127381
uncertainty 0.044147 0.012680 0.018326 0.016915 0.018304
bs +-1std % coverage value 72.000000 62.000000 52.000000 62.000000 64.000000
bs 0.95 CI % coverage value 94.000000 90.000000 96.000000 92.000000 92.000000

Note that every second row gives an estimated numerical uncertainty on the values in the row above it.

You should see that the ratio of the bootstrap error estimates to the standard deviation of the results (the “bs std / repeats std” rows of bootstrap_results_table) is close to 1, given the estimated numerical uncertainties. Similar results are given in the appendix of the dynamic nested sampling paper (Higson et al., 2019a); see the paper and the get_bootstrap_results function’s documentation for more details.
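The “% coverage” rows count how often the true value falls inside a run's bootstrap error bars; for well-calibrated \(\pm 1\) std error bars this should be near 68%, and near 95% for the 0.95 credible intervals. A minimal self-contained illustration of this kind of check, using made-up stand-in numbers rather than perfectns output:

```python
import random

random.seed(0)
# Sketch of a coverage check (illustrative only, not the perfectns code):
# draw stand-in per-run estimates around a known true value, pair each with a
# stand-in error bar, and count how often the truth lies within +-1 error bar.
true_value = 0.0
estimates = [random.gauss(true_value, 0.05) for _ in range(50)]  # fake run results
error_bars = [0.05] * 50                                         # fake bootstrap stds
coverage = sum(abs(est - true_value) <= bar
               for est, bar in zip(estimates, error_bars)) / 50
# With perfectly calibrated error bars, coverage fluctuates around ~0.68.
```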