Parameter estimation

ecopann.ann

class ecopann.ann.ANN(obs_data, model, param_names, params_dict=None, cov_matrix=None, init_chain=None, init_params=None, hidden_layer=3, branch_hiddenLayer=2, trunk_hiddenLayer=1, epoch=2000, epoch_branch=2000, num_train=3000, num_vali=500, spaceSigma=5, space_type='hyperellipsoid', local_samples=None, stepStop_n=3)[source]

Bases: PlotPosterior

Estimating (cosmological) parameters with Artificial Neural Network.

Parameters
  • obs_data (array-like or list) – The observational spectra (data) with shape (spectra_length,3), or a list of spectra with shapes [(spectra_length_1,3), (spectra_length_2,3), …]. The first column is the observational variable, the second column is the best value of the measurement, and the third column is the error of the measurement.

  • model (cosmological (or theoretical) model instance) – A cosmological (or theoretical) model instance used to simulate the training set. It should contain a ‘simulate’ method that accepts the cosmological parameters as input. If you use local data sets, it should also contain ‘load_params’ and ‘load_sample’ methods.

  • param_names (list) – A list which contains the parameter names, e.g. [‘H0’,’Omega_m’,’ombh2’,’omch2’,’tau’,’As’,’ns’].

  • params_dict (dict or None, optional) – Information of cosmological parameters that include the labels, the minimum values, and the maximum values. See params_dict_zoo(). Default: None

  • cov_matrix (array-like, list, or None, optional) – Covariance matrix of the observational data. It should be an array with shape (spectra_length, spectra_length), or a list of covariance matrices with shapes [(spectra_length_1, spectra_length_1), (spectra_length_2, spectra_length_2), …]. If there is no covariance for some observations, the corresponding entry should be set to None, e.g. [cov_matrix_1, None, cov_matrix_3]. Default: None

  • init_chain (array-like, optional) – The initial ANN or MCMC chain, which is usually based on a previous parameter estimation. Default: None

  • init_params (array-like, optional) – The initial settings of the parameter space. If init_chain is given, init_params will be ignored. Default: None

  • hidden_layer (int, optional) – The number of hidden layers of the network (for a single branch network). Default: 3

  • branch_hiddenLayer (int, optional) – The number of hidden layers for the branch part of the network (for a multibranch network). Default: 2

  • trunk_hiddenLayer (int, optional) – The number of hidden layers for the trunk part of the network (for a multibranch network). Default: 1

  • epoch (int, optional) – The number of epochs of the training process. Default: 2000

  • epoch_branch (int, optional) – The number of epochs of the training process (for the branch part of the multibranch network). Default: 2000

  • num_train (int, optional) – The number of samples of the training set. Default: 3000

  • num_vali (int, optional) – The number of samples of the validation set. Default: 500

  • spaceSigma (int or array-like, optional) – The size of the parameter space to be learned. It is an int or a numpy array with shape (n,), where n is the number of parameters. For example, for spaceSigma=5, the parameter space to be learned is \([-5\sigma, +5\sigma]\). Default: 5

  • space_type (str, optional) – The type of parameter space. It can be ‘hypercube’, ‘LHS’, ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’. Default: ‘hyperellipsoid’

  • local_samples (None, str, or list, optional) – Path of local samples, None, or ‘sample’ or [‘sample’] or [‘sample_1’, ‘sample_2’, …]. If None, no local samples are used. Default: None

  • stepStop_n (int, optional) – Once the number of steps after burn-in reaches stepStop_n, the whole training process stops. This only takes effect after burn-in. Default: 3
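
As a rough illustration of the inputs described above, the sketch below builds an obs_data table with three columns, a matching diagonal covariance matrix, and a minimal model object exposing a ‘simulate’ method. The class name LinearModel, its parameters a and b, and the (x, y) return format of ‘simulate’ are assumptions made for illustration; this section only states that ‘simulate’ must accept the parameters. Plain Python lists stand in for arrays so the snippet runs without ecopann installed.

```python
import random


class LinearModel:
    """Hypothetical theoretical model y = a + b * x (illustration only)."""

    def __init__(self, x):
        self.x = list(x)

    def simulate(self, sim_params):
        # Assumed convention: accept the parameters and return
        # (observational variable, predicted values).
        a, b = sim_params
        return self.x, [a + b * xi for xi in self.x]


random.seed(0)
x = [0.1 * i for i in range(1, 11)]
model = LinearModel(x)
_, y_true = model.simulate([1.5, 2.5])

# One row per data point: (observational variable, best value, error).
err = 0.1
obs_data = [[xi, yi + random.gauss(0.0, err), err] for xi, yi in zip(x, y_true)]

# Diagonal covariance with shape (spectra_length, spectra_length).
n = len(obs_data)
cov_matrix = [[err ** 2 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Several data sets are passed as lists; None marks a set without covariance.
obs_list = [obs_data, obs_data[:5]]
cov_list = [cov_matrix, None]
param_names = ["a", "b"]
```

In real use these objects would be converted to numpy arrays and passed as obs_data, model, param_names, and cov_matrix.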

Variables
  • activation_func (str, optional) – Activation function, which can be ‘relu’, ‘leakyrelu’, ‘prelu’, ‘rrelu’, ‘relu6’, ‘elu’, ‘celu’, ‘selu’, ‘silu’, ‘sigmoid’, ‘logsigmoid’, ‘tanh’, ‘tanhshrink’, ‘softsign’, or ‘softplus’ (see activation()). Default: ‘rrelu’

  • lr (float, optional) – The learning rate setting of the network. Default: 1e-2

  • lr_min (float, optional) – The minimum of the learning rate. Default: 1e-8

  • batch_size (int, optional) – The batch size setting of the network. Default: 750

  • auto_batchSize (bool, optional) – If True, the batch size will be set automatically in the training process, otherwise, use the setting of batch_size. Default: True

  • auto_epoch (bool, optional) – If True, the epoch will be set automatically in the training process, otherwise, use the setting of epoch. Default: True

  • base_N_max (int, optional) – The maximum value of the basic (or minimum) number of data to be simulated, which works only when auto_N is set to True. Default: 1500

  • auto_N (bool, optional) – If True, the number of samples in the training set will be set automatically, otherwise, use the setting of num_train. Default: True

  • noise_type (str, optional) – The type of Gaussian noise added to the training set, ‘singleNormal’ or ‘multiNormal’. Default: ‘multiNormal’

  • factor_sigma (float, optional) – For the case of ‘singleNormal’, it is the factor of the observational error (standard deviation), while for the case of ‘multiNormal’ it is the standard deviation of the coefficient of the observational error (standard deviation). Default: 0.2

  • multi_noise (int, optional) – The number of noise realizations added to a spectrum. Default: 5

  • scale_spectra (bool, optional) – If True, the input data (measurements) will be scaled based on the base values of the data. Setting this to True is recommended. Default: True

  • scale_params (bool, optional) – If True, the target data (cosmological parameters) will be scaled based on the base values of parameters. See ParamsScaling. Setting this to True is recommended. Default: True

  • norm_inputs (bool, optional) – If True, the input data of the network will be normalized. Default: True

  • norm_target (bool, optional) – If True, the target data (cosmological parameters) will be normalized. Default: True

  • norm_type (str, optional) – The method of normalization, ‘z_score’, ‘minmax’, or ‘mean’ (see Normalize). Default: ‘z_score’

  • set_numpySeed (bool, optional) – If True, a fixed random seed that works for numpy will be set before training the network. Default: True

  • set_torchSeed (bool, optional) – If True, a fixed random seed that works for PyTorch will be set before training the network. Default: True

  • train_branch (bool, optional) – If True, the branch part of the multibranch network will be trained before training the entire network. Default: False

  • repeat_n (int, optional) – The number of iterations using the same batch of data during network training, which is usually set to 1 or 3. Default: 3

  • expectedBurnIn_step (int, optional) – The expected burn-in step number. Default: 10

  • chain_leng (int, optional) – The number of samples to be generated by a network model when predicting ANN chain, which is equal to the length of the ANN chain. Default: 10000
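
The difference between the two noise_type settings can be sketched as follows. This is one possible reading of the factor_sigma description, not the library's actual implementation: for ‘singleNormal’ the noise standard deviation is taken as factor_sigma times the observational error, while for ‘multiNormal’ a random coefficient is first drawn from a normal distribution with standard deviation factor_sigma. The function name add_noise and the use of the coefficient's absolute value are assumptions.

```python
import random


def add_noise(spectrum, errors, noise_type="multiNormal", factor_sigma=0.2,
              multi_noise=5, rng=None):
    """Return multi_noise noisy copies of one simulated spectrum (sketch)."""
    rng = rng or random.Random(0)
    noisy = []
    for _ in range(multi_noise):
        if noise_type == "singleNormal":
            # Fixed coefficient: noise std = factor_sigma * observational error.
            coeff = factor_sigma
        elif noise_type == "multiNormal":
            # Random coefficient drawn from N(0, factor_sigma^2); its absolute
            # value scales the observational error for this realization.
            coeff = abs(rng.gauss(0.0, factor_sigma))
        else:
            raise ValueError("unknown noise_type: %s" % noise_type)
        noisy.append([y + rng.gauss(0.0, coeff * e)
                      for y, e in zip(spectrum, errors)])
    return noisy


spectrum = [1.0, 2.0, 3.0]
errors = [0.1, 0.1, 0.2]
realizations = add_noise(spectrum, errors)
```

With the default multi_noise=5, each simulated spectrum yields five noisy realizations.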

Note

The number of samples of the training set should be large enough to ensure the network learns a reliable mapping. For example, set num_train to 3000, or a larger value like 4000 or 5000.

The epoch should also be set large enough to ensure a well-trained network. For example, set epoch to 2000, or a larger value like 3000, 4000, or 5000.

The initial parameter space should be set large enough to cover the true parameters. This makes it easier for the network to find the best-fit values of the parameters.

It is better to set the number of steps to a large value like 10, which minimizes the effect of randomness on the results. However, a smaller value like 5 is also acceptable, because burn-in is reached quickly (usually in no more than 2 steps). An advantage of this method is that the results can be analyzed before the training process ends, to determine how many steps can be used to estimate the parameters.

Local samples can be used as the training set to save time, so you can generate a sample library once and reuse it later.

property base_N
property base_epoch
property chain_ann

Combined ANN chain using the result of steps after burn-in.

property chains_good
property cov_copy
property obs_dtype
property obs_errors
property obs_variables
property param_labels
print_hparams()[source]
save_variables(sample=None)[source]
simulate(step=1, burn_in=False, burnIn_step=None, space_type_all=[], prev_space=None, chain_all=[], sim_spectra=None, sim_params=None)[source]

Simulate data and update parameter space.

train(path='ann', sample=None, save_items=True, showIter_n=100)[source]

Train the network and save the results.

Parameters
  • path (str, optional) – The path of the results to be saved. Default: ‘ann’

  • sample (str or None, optional) – Symbol mark of observational data or measurements. Default: None

  • save_items (bool, optional) – If True, results will be saved to disk, otherwise, results will not be saved. Default: True

  • showIter_n (int, optional) – The interval, in iterations, at which training progress is printed. Default: 100

Returns

A list of chains.

Return type

list

class ecopann.ann.RePredict(obs_data, cov_matrix=None, path='ann', randn_num='', steps_n=10, params_dict=None)[source]

Bases: PlotPosterior

Reanalysis using the saved chains or the well-trained networks.

Parameters
  • obs_data (array-like or list) – The observational spectra (data) with shape (spectra_length,3), or a list of spectra with shapes [(spectra_length_1,3), (spectra_length_2,3), …]. The first column is the observational variable, the second column is the best value of the measurement, and the third column is the error of the measurement.

  • cov_matrix (array-like, list, or None, optional) – Covariance matrix of the observational data. It should be an array with shape (spectra_length, spectra_length), or a list of covariance matrices with shapes [(spectra_length_1, spectra_length_1), (spectra_length_2, spectra_length_2), …]. If there is no covariance for some observations, the corresponding entry should be set to None, e.g. [cov_matrix_1, None, cov_matrix_3]. Default: None

  • path (str, optional) – The path where the results are saved. Default: ‘ann’

  • randn_num (str or int, optional) – A random number that identifies the saved results. Default: ‘’

  • steps_n (int, optional) – The number of steps of the training process. Default: 10

  • params_dict (dict or None, optional) – Information of cosmological parameters that include the labels, the minimum values, and the maximum values. See params_dict_zoo. Default: None

Variables

chain_leng (int, optional) – The number of samples to be generated by a network model when predicting ANN chain, which is equal to the length of the ANN chain. Default: 10000

property best_fit
property burnIn_step
property chain_ann

Combined ANN chain using the result of steps after burn-in.

property cov_copy
from_chain()[source]

Predict using saved chains.

Raises

ValueError – If variables of the input observational data are different from those used to train the network, an error will be raised.

from_net()[source]

Predict using saved networks.

Raises

ValueError – If variables of the input observational data are different from those used to train the network, an error will be raised.

property obs_variables
property param_labels
property same_variables
property trained_variables

ecopann.space_updater

class ecopann.space_updater.Chains[source]

Bases: object

static bestFit(chain, best_type='mode', out_sigma=1, symmetry_error=True)[source]

Get the best-fit parameters from the chain.

Parameters
  • chain (array-like) – The ANN chain.

  • best_type (str, optional) – The type of the best values of parameters, ‘mode’ or ‘median’. If ‘mode’, it will take the mode as the best value. If ‘median’, it will take the median as the best value. Default: ‘mode’

  • out_sigma (int) – The output sigma, which can be 1, 2, or 3. Default: 1

  • symmetry_error (bool, optional) – If True, obtain symmetrical errors, otherwise, obtain unsymmetrical errors. Default: True

static best_median(chain)[source]

Take the median as the best value.

static best_mode(chain, bins=100, smooth=5)[source]

Take the mode as the best value.
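
To make the two choices of best value concrete, here is a minimal sketch for a single parameter. The helper names median_best and histogram_mode are hypothetical, and the mode is taken as the center of the most populated histogram bin without the smoothing that the ‘smooth’ argument suggests the library applies.

```python
import statistics


def median_best(values):
    """Median of a 1-D chain."""
    return statistics.median(values)


def histogram_mode(values, bins=100):
    """Center of the most populated histogram bin (no smoothing)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    peak = counts.index(max(counts))
    return lo + (peak + 0.5) * width


samples = [0.0, 1.0, 1.1, 1.2, 1.3, 5.0]
mode_value = histogram_mode(samples, bins=5)
median_value = median_best(samples)
```

For a skewed chain the two estimates differ, which is why best_type offers both.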

static cov_matrix(chain, max_error=True, expand_factor=0)[source]

Get the covariance matrix.

Parameters
  • chain (array-like) – The ANN chain.

  • max_error (bool, optional) – If True, the diagonal elements of the covariance matrix will be replaced by the estimated maximum errors, which is useful for non-Gaussian distribution. Default: True

  • expand_factor (float, optional) – The expansion factor that is used to expand the error (the standard deviation) of each cosmological parameter. For example, if expand_factor=0.05, the error will be expanded by 5%. It only works when max_error is True. Default: 0

Returns

cov – The covariance matrix.

Return type

array-like
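
The effect of expand_factor on a diagonal element can be checked with simple arithmetic: expanding a standard deviation by 5% multiplies the corresponding variance by 1.05². The sketch below applies this to the diagonal only; whether the library also rescales the off-diagonal elements is not stated here, so that choice and the helper name expand_diagonal are assumptions.

```python
def expand_diagonal(cov, expand_factor=0.0):
    """Scale each standard deviation by (1 + expand_factor), diagonal only."""
    scale = (1.0 + expand_factor) ** 2
    return [[c * scale if i == j else c for j, c in enumerate(row)]
            for i, row in enumerate(cov)]


cov = [[4.0, 1.0], [1.0, 9.0]]
expanded = expand_diagonal(cov, expand_factor=0.05)
```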

static error_devs(chain_1, chain_true)[source]

Get the absolute values of the relative deviations of error of parameters obtained from two chains.

static param_devs(chain_1, chain_2)[source]

Get deviations of parameters obtained from two chains.

static params_n(chain)[source]
static reshape_chain(chain)[source]
static sigma(chain, best_values, out_sigma=1)[source]

Calculate the standard deviations.

Parameters
  • chain (array-like) – The ANN chain.

  • best_values (1-dimension array) – The best values of parameters.

  • out_sigma (int) – The output sigma, which can be 1, 2, or 3. Default: 1

Returns

  • sigma_1l, sigma_2l, sigma_3l (1-dimension array) – The left 1 sigma, 2 sigma, or 3 sigma deviations.

  • sigma_1r, sigma_2r, sigma_3r (1-dimension array) – The right 1 sigma, 2 sigma, or 3 sigma deviations.
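
One common way to obtain left and right deviations from a chain is to measure, separately on each side of the best value, the distance that encloses 68.27%, 95.45%, or 99.73% of the samples on that side. The sketch below follows that idea for a single parameter; it is an assumption about the approach, not a transcription of the library's code, and the name one_sided_sigma is hypothetical.

```python
LEVELS = {1: 0.6827, 2: 0.9545, 3: 0.9973}


def one_sided_sigma(values, best, out_sigma=1):
    """Return (left, right) deviations around best for a 1-D chain."""
    level = LEVELS[out_sigma]
    # Distances from the best value, split by side and sorted ascending.
    left = sorted(best - v for v in values if v <= best)
    right = sorted(v - best for v in values if v >= best)

    def enclosing_distance(dists):
        if not dists:
            return 0.0
        idx = min(int(level * len(dists)), len(dists) - 1)
        return dists[idx]

    return enclosing_distance(left), enclosing_distance(right)


# A chain with a longer tail on the right of the best value 10.
chain = list(range(0, 11)) + list(range(12, 32, 2))
sigma_l, sigma_r = one_sided_sigma(chain, best=10)
```

When symmetric errors are requested, the two values could be combined, for example by averaging them.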

class ecopann.space_updater.CheckParameterSpace[source]

Bases: object

static check_limit(p_space, limit_space)[source]

Check the parameter space to ensure that the parameter space does not exceed its limit range.

Parameters
  • p_space (array-like) – The parameter space to be checked.

  • limit_space (array-like) – The limit range of parameter space.

Returns

A parameter space being limited by its limit range.

Return type

array-like
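
A minimal sketch of this kind of check is shown below. It assumes both arguments hold one [min, max] pair per parameter, which is not specified in this section, and the helper name clip_space is hypothetical.

```python
def clip_space(p_space, limit_space):
    """Clip each [min, max] pair to the corresponding limit range."""
    clipped = []
    for (lo, hi), (lim_lo, lim_hi) in zip(p_space, limit_space):
        clipped.append([max(lo, lim_lo), min(hi, lim_hi)])
    return clipped


p_space = [[-0.2, 0.5], [60.0, 80.0]]
limit_space = [[0.0, 1.0], [50.0, 90.0]]
limited = clip_space(p_space, limit_space)
```

Here the first parameter's lower bound is raised to its limit, while the second parameter is already inside its limit range and is left unchanged.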

class ecopann.space_updater.UpdateParameterSpace(step, param_names, chain_1, chain_0=None, init_params=None, spaceSigma=5, params_dict=None)[source]

Bases: CheckParameterSpace

Update parameter space.

Parameters
  • step (int) – The step number in the training process.

  • param_names (list) – A list that contains parameter names.

  • chain_1 (array-like) – The ANN chain of the i-th step, where \(i\geq2\).

  • chain_0 (None or array-like, optional) – The ANN chain of the (i-1)-th step, where \(i\geq2\), if step \(\leq2\), chain_0 should be set to None, otherwise, chain_0 should be an array. Default: None

  • init_params (None or array-like) – The initial settings of the parameter space. If chain_0 is given, init_params will be ignored. Default: None

  • spaceSigma (int or array-like, optional) – The size of the parameter space to be learned. It is an int or a numpy array with shape (n,), where n is the number of parameters. For example, for spaceSigma=5, the parameter space to be learned is \([-5\sigma, +5\sigma]\). Default: 5

  • params_dict (dict or None, optional) – Information of cosmological parameters that include the labels, the minimum values, and the maximum values. See params_dict_zoo(). Default: None
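
The role of spaceSigma can be illustrated with a short sketch: given best-fit values and standard deviations estimated from a chain, the range to be learned for each parameter is the best value plus or minus spaceSigma standard deviations. The function name learning_range and the example numbers are assumptions for illustration only.

```python
def learning_range(best_values, sigmas, space_sigma=5):
    """Return one [min, max] pair per parameter."""
    # space_sigma may be a single number or one value per parameter.
    if isinstance(space_sigma, (int, float)):
        space_sigma = [space_sigma] * len(best_values)
    return [[b - k * s, b + k * s]
            for b, s, k in zip(best_values, sigmas, space_sigma)]


# Hypothetical best-fit values and errors for two parameters.
ranges = learning_range([70.0, 0.3], [1.0, 0.02], space_sigma=5)
wide_first = learning_range([70.0, 0.3], [1.0, 0.02], space_sigma=[10, 5])
```

The resulting ranges would then still need to be restricted to the limits given in params_dict.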

property error_devs
limited_spaceSigma_all()[source]
property param_devs
params_space()[source]

Obtain the parameter space to be learned from chain.

Returns

Limited parameter space.

Return type

array-like

print_learningRange()[source]
small_dev(limit_dev=0.01)[source]

A small value of deviation of parameters between two steps used to end the training process.

Parameters

limit_dev (float, optional) – If the deviation of parameters between two steps is smaller than this value, the training process will end. Default: 0.01 (the deviation < 1%)

Returns

True (dev \(\leq\) limit_dev) or False (dev>limit_dev)

Return type

bool
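
The stopping test can be sketched as follows. How the deviation is defined is not spelled out in this section; the sketch assumes it is the relative change of each best-fit value between two consecutive steps and that every parameter must satisfy the limit. The name is_small_dev is hypothetical.

```python
def is_small_dev(best_prev, best_curr, limit_dev=0.01):
    """True if every parameter changed by at most limit_dev (relative)."""
    # Assumes no previous best-fit value is exactly zero.
    devs = [abs((c - p) / p) for p, c in zip(best_prev, best_curr)]
    return max(devs) <= limit_dev


converged = is_small_dev([70.0, 0.30], [70.2, 0.301])
not_converged = is_small_dev([70.0, 0.30], [72.0, 0.30])
```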

property spaceSigma_all
property spaceSigma_max
property spaceSigma_min
ecopann.space_updater.get_CovMatrix(chain, params_n, best_values=None)[source]

Calculate covariance matrix from a chain.

Parameters
  • chain (array-like) – The ANN or MCMC chain with shape (N, M), where N is the number of samples in the chain and M is the number of parameters.

  • params_n (array-like) – The number of parameters.

  • best_values (array-like) – The best-fit values.

Returns

cov_matrix – Covariance matrix.

Return type

array-like

ecopann.space_updater.get_cov(X, Y, mean=None)[source]

Calculate covariance or variance.

Parameters
  • X (array-like) – Random variable X.

  • Y (array-like) – Random variable Y.

  • mean (array-like or list, optional) – The mean values of X and Y. Default: None

Returns

Covariance or variance.

Return type

float
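
As an illustration of the calculation, the sketch below computes a sample covariance, optionally centered on supplied mean values (for example best-fit values) instead of the sample means. The n - 1 normalization and the helper name sample_cov are assumptions; passing the same data for X and Y gives the variance.

```python
def sample_cov(X, Y, mean=None):
    """Covariance of X and Y; variance when X and Y are the same data."""
    n = len(X)
    if mean is None:
        mean = (sum(X) / n, sum(Y) / n)
    mx, my = mean
    # Assumed normalization: unbiased estimator with n - 1.
    return sum((x - mx) * (y - my) for x, y in zip(X, Y)) / (n - 1)


X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 4.0, 6.0, 8.0]
cov_xy = sample_cov(X, Y)
var_x = sample_cov(X, X)
```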

ecopann.space_updater.pdf_1(X, bins=100, smooth=5)[source]

Estimate the probability density function for the given data.
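
A histogram-based density estimate with a simple moving-average smoothing is one way to read the ‘bins’ and ‘smooth’ arguments. The sketch below is written under that assumption and is not the library's implementation; the name histogram_pdf is hypothetical.

```python
import random


def histogram_pdf(X, bins=100, smooth=5):
    """Return (bin centers, smoothed density) for 1-D data."""
    lo, hi = min(X), max(X)
    if hi == lo:
        raise ValueError("data must span a non-zero range")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in X:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    # Normalize counts so the histogram integrates to one.
    density = [c / (len(X) * width) for c in counts]
    # Moving average over a window of about `smooth` bins.
    half = smooth // 2
    smoothed = []
    for i in range(bins):
        window = density[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    return centers, smoothed


rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(5000)]
centers, pdf = histogram_pdf(data, bins=50, smooth=5)
peak_location = centers[pdf.index(max(pdf))]
```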

ecopann.evaluate

class ecopann.evaluate.FilePath(filedir='cnn', randn_num='', suffix='.pt', separator='_')[source]

Bases: object

filePath()[source]
ecopann.evaluate.predict(net, inputs, use_GPU=False, in_type='numpy')[source]

Make predictions using a well-trained network.

Parameters
  • inputs (numpy array or torch tensor) – The inputs of the network.

  • use_GPU (bool) – If True, calculate using GPU, otherwise, calculate using CPU.

  • in_type (str) – The data type of the inputs, ‘numpy’ or ‘torch’.

ecopann.plotter

class ecopann.plotter.BestFitsData(chain_all, chain_ann, param_labels='', burnIn_step=None)[source]

Bases: object

property bestFits_all
property best_fit
panel(data)[source]
panel_data(p_index)[source]
panels_data()[source]
class ecopann.plotter.BestPredictedData(params_testSet, predParams_testSet, params_trainingSet=None, predParams_trainingSet=None, param_labels='', show_reErr=True)[source]

Bases: object

panel(data)[source]
panel_data(p_index)[source]
panels_data()[source]
class ecopann.plotter.PlotPosterior(chain_all, chain_ann, param_names, params_dict=None, burnIn_step=None, randn_num='', path='ann')[source]

Bases: object

plot_contours(bins=100, smooth=5, show_index=None, fill_contours=True, sigma=2, show_titles=True, line_width=2, lims=None)[source]
plot_steps(layout_adjust=[0.3, 0.25], suptitle='')[source]
property randn_suffix
save_contours()[source]
save_steps()[source]
class ecopann.plotter.PlotPrediction(params_testSet, predParams_testSet, param_names, params_trainingSet=None, predParams_trainingSet=None, params_dict=None, show_reErr=True, randn_num='', path='ann')[source]

Bases: object

plot(layout_adjust=[0.3, 0.25], suptitle='')[source]
property randn_suffix
save_fig()[source]
ecopann.plotter.pcc(x, y)[source]

Pearson correlation coefficient https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
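
For reference, the Pearson correlation coefficient is the covariance of x and y divided by the product of their standard deviations. A minimal standard-library sketch (the name pearson is hypothetical, not the library's function) is:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r_pos = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
r_neg = pearson([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])
```

Perfectly linear data give a coefficient of +1 or -1 depending on the sign of the slope.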