Training sets

ecopann.cosmic_params

class ecopann.cosmic_params.ParamsProperty(param_names, params_dict=None)[source]

Bases: object

property labels
property param_fullNames
property params_limit
ecopann.cosmic_params.params_dict_zoo()[source]

Information about the cosmological parameters, including their labels and physical limits: [label, limit_min, limit_max]

The label is used to plot figures. The physical limits are used to ensure that the simulated parameters have physical meaning.

Note

If a physical limit of a parameter is unknown, or there is no physical limit, it should be set to np.nan.
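The [label, limit_min, limit_max] layout can be illustrated with a small user-supplied dictionary. This is a minimal sketch: the parameter names, labels, and limits are hypothetical, keying the dictionary by parameter name is an assumption, and `limits_of` is an illustrative helper rather than part of ecopann.

```python
import numpy as np

# Hypothetical entries in the [label, limit_min, limit_max] layout.
# np.nan marks a limit that is unknown or does not exist.
params_dict = {
    "omm": [r"$\Omega_{\rm m}$", 0.0, 1.0],  # bounded on both sides
    "H0": [r"$H_0$", 0.0, np.nan],           # no upper limit
    "w": [r"$w$", np.nan, np.nan],           # no limits at all
}


def limits_of(names, info):
    """Collect an (n, 2) array of [limit_min, limit_max] for the given names."""
    return np.array([info[name][1:3] for name in names], dtype=float)


limits = limits_of(["omm", "H0", "w"], params_dict)
```

Here `limits` plays the role of the (n, 2) array of physical limits, with np.nan entries standing for open bounds.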

ecopann.data_simulator

class ecopann.data_simulator.AddGaussianNoise(spectra, params=None, obs_errors=None, cholesky_factor=None, noise_type='multiNormal', factor_sigma=0.5, multi_noise=5, use_GPU=True)[source]

Bases: object

Add Gaussian noise to simulated data.

Parameters
  • spectra (torch tensor, or a list of torch tensor) – The simulated spectra (data) with shape (N, spectra_length), or a list of spectra with shape [(N,spectra_length_1), (N,spectra_length_2), …]

  • params (torch tensor or None) – The simulated cosmological parameters. Default: None

  • obs_errors (torch tensor, or a list of torch tensor, optional) – Observational errors (standard deviation) with shape (spectra_length,), or a list of errors with shape [(spectra_length_1,), (spectra_length_2,), …]. Default: None

  • cholesky_factor (torch tensor, a list of torch tensor, or None, optional) – Cholesky factor of covariance matrix with shape (spectra_length, spectra_length), or a list of Cholesky factor of covariance matrix with shape [(spectra_length_1, spectra_length_1), (spectra_length_2, spectra_length_2), …]. Default: None

  • noise_type (str, optional) – The type of Gaussian noise added to the training set, ‘singleNormal’ or ‘multiNormal’. Default: ‘multiNormal’

  • factor_sigma (float, optional) – For the case of ‘singleNormal’, it is the factor of the observational error (standard deviation), while for the case of ‘multiNormal’ it is the standard deviation of the coefficient of the observational error (standard deviation). Default: 0.5

  • multi_noise (int, optional) – The number of noise realizations added to a spectrum. Default: 5

  • use_GPU (bool, optional) – If True, the noise will be generated by GPU, otherwise, it will be generated by CPU. Default: True
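The two noise types can be sketched in NumPy. This is one reading of the factor_sigma description above, not the library's implementation: for ‘singleNormal’ the noise standard deviation is a fixed multiple of the observational error, while for ‘multiNormal’ the multiple itself is drawn per spectrum from a normal distribution with standard deviation factor_sigma. The function names and the use of the absolute value are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def single_normal_noise(spectra, obs_errors, error_factor=1.0):
    """Add N(0, (error_factor * sigma)^2) noise to every spectrum."""
    noise = rng.standard_normal(spectra.shape) * error_factor * obs_errors
    return spectra + noise


def multi_normal_noise(spectra, obs_errors, factor_sigma=0.5):
    """Draw one noise amplitude per spectrum, then add N(0, (amplitude * sigma)^2) noise.

    The amplitude is |N(0, factor_sigma^2)|; taking the absolute value is an assumption.
    """
    amplitude = np.abs(rng.standard_normal((spectra.shape[0], 1))) * factor_sigma
    noise = rng.standard_normal(spectra.shape) * amplitude * obs_errors
    return spectra + noise


spectra = np.ones((4, 6))        # (N, spectra_length)
obs_errors = np.full(6, 0.1)     # (spectra_length,)
noisy_single = single_normal_noise(spectra, obs_errors)
noisy_multi = multi_normal_noise(spectra, obs_errors)
```

Drawing the amplitude per spectrum exposes the network to a range of noise levels instead of a single one.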

multiNoisySample(reorder=True)[source]
multiNoisySpectra()[source]
multiNormalSpectra(factor_sigma=0.5)[source]
multiParams()[source]
noisySample()[source]
noisySpectra()[source]
obs_noise(spectrum, obs_error=None, cholesky_factor=None)[source]
singleNormalSpectra(error_factor=1)[source]
class ecopann.data_simulator.CutParams(param_names, params_dict=None)[source]

Bases: object

Cut parameter samples that cross the parameter limits.

Parameters
  • param_names (list) – A list that contains parameter names.

  • params_dict (dict or None, optional) – Information about the cosmological parameters, including the labels, the minimum values, and the maximum values. See params_dict_zoo(). Default: None

cut_params(params, params_limit=None)[source]
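Cutting samples at the parameter limits amounts to a row filter. The sketch below is a stand-alone NumPy function under assumed semantics (rows are dropped when any parameter leaves its limits, and np.nan limits are treated as open). It is not the CutParams implementation.

```python
import numpy as np


def cut_params(params, params_limit):
    """Keep rows of `params` (N, n) that lie inside `params_limit` (n, 2).

    np.nan limits are treated as "no limit" (assumed behaviour).
    """
    lower = np.where(np.isnan(params_limit[:, 0]), -np.inf, params_limit[:, 0])
    upper = np.where(np.isnan(params_limit[:, 1]), np.inf, params_limit[:, 1])
    inside = np.all((params >= lower) & (params <= upper), axis=1)
    return params[inside]


params = np.array([[0.3, 70.0], [1.2, 68.0], [0.5, -5.0]])
params_limit = np.array([[0.0, 1.0], [0.0, np.nan]])
kept = cut_params(params, params_limit)
```

Only the first row survives: the second exceeds the upper limit of the first parameter, and the third falls below the lower limit of the second.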
class ecopann.data_simulator.ParametersFilter(param_names, sim_params, params_space, prev_space, check_include=True, rel_dev_limit=0.2)[source]

Bases: object

Select cosmological parameters from a data set according to a given parameter space.

Parameters
  • param_names (list) – A list that contains parameter names.

  • sim_params (array-like) – The simulated cosmological parameters with the shape of (N, n), where N is the number of samples and n is the number of parameters.

  • params_space (array-like) – The parameter space with the shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit].

  • prev_space (array-like) – The parameter space of local simulated data (or mock data in previous step), with shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit].

  • check_include (bool, optional) – If True, it will check whether params_space is in the space of sim_params, otherwise, do nothing. Default: True

  • rel_dev_limit (float, optional) – The limit on the relative deviation allowed when params_space is not in the space of sim_params. The default is 20%, which means that if params_space is \([-5\sigma, +5\sigma]\), it can deviate by \(<1\sigma\) from sim_params. Note that it should be \(<0.4\) (a deviation of \(<2\sigma\) for the parameter space \([-5\sigma, +5\sigma]\)). Default: 0.2

filter_index()[source]
filter_params()[source]
property include

Check whether params_space is in the space of the sim_params.

Returns

If params_space is in the space of the sim_params, return True, otherwise, return False.

Return type

bool
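The inclusion check and the relative deviation can be sketched as follows. The normalization by the half-width of params_space is an assumption, chosen because it reproduces the numbers quoted for rel_dev_limit (a \(1\sigma\) overshoot of a \([-5\sigma, +5\sigma]\) space gives 0.2). Both function names are hypothetical.

```python
import numpy as np


def space_included(params_space, sim_params):
    """True if params_space (n, 2) lies inside the range spanned by sim_params (N, n)."""
    sim_lower = sim_params.min(axis=0)
    sim_upper = sim_params.max(axis=0)
    return bool(np.all(params_space[:, 0] >= sim_lower)
                and np.all(params_space[:, 1] <= sim_upper))


def relative_deviation(params_space, sim_params):
    """Largest overshoot of params_space beyond the samples, in units of its half-width."""
    sim_lower = sim_params.min(axis=0)
    sim_upper = sim_params.max(axis=0)
    half_width = (params_space[:, 1] - params_space[:, 0]) / 2.0
    below = (sim_lower - params_space[:, 0]) / half_width
    above = (params_space[:, 1] - sim_upper) / half_width
    return float(max(below.max(), above.max(), 0.0))


sim_params = np.array([[0.0, 0.0], [1.0, 1.0], [0.4, 0.6]])
inner_space = np.array([[0.1, 0.9], [0.2, 0.8]])
shifted_space = np.array([[-0.1, 0.9], [0.2, 0.8]])

inner_ok = space_included(inner_space, sim_params)
shifted_ok = space_included(shifted_space, sim_params)
shifted_dev = relative_deviation(shifted_space, sim_params)
```

Under this reading, a space that is not included would still be accepted as long as its relative deviation stays below rel_dev_limit.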

class ecopann.data_simulator.SimMultiSpectra(branch_n, N, model, param_names, chain=None, params_space=None, spaceSigma=5, params_dict=None, space_type='hypercube', cut_crossedLimit=True, cut_crossedBest=True, cross_best=False, local_samples=None, prevStep_data=None, check_include=True, rel_dev_limit=0.2)[source]

Bases: SimSpectra

Simulate training set containing multiple observations (for multi-branch network).

Parameters
  • branch_n (int) – The number of branches of the network.

  • N (int) – The number of data to be simulated.

  • model (cosmological (or theoretical) model instance) – A cosmological (or theoretical) model instance that is used to simulate the training set. It should contain a ‘simulate’ method that accepts cosmological parameters as input. If you use local data sets, it should also contain ‘load_params’, ‘load_params_space’, and ‘load_sample’ methods.

  • param_names (list) – A list that contains parameter names.

  • chain (array-like or None) – The predicted ANN chain in the previous step. If chain is an array, params_space will be ignored. If chain is None, params_space should be given. Default: None

  • params_space (array-like) – The parameter space with the shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit].

  • spaceSigma (int or array-like, optional) – The size of parameter space to be learned. It is an int or a numpy array with shape of (n,), where n is the number of parameters, e.g. for spaceSigma=5, the parameter space to be learned is \([-5\sigma, +5\sigma]\). Default: 5

  • params_dict (dict or None, optional) – Information about the cosmological parameters, including the labels, the minimum values, and the maximum values. See params_dict_zoo(). Default: None

  • space_type (str, optional) – The type of parameter space. It can be ‘hypercube’, ‘LHS’, ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’. Default: ‘hypercube’

  • cut_crossedLimit (bool, optional) – If True, the data points that cross the parameter limits will be cut. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’. Default: True

  • cut_crossedBest (bool, optional) – If True, the folded data points that cross the best values will be cut. It is recommended to set it to True. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’, and when cut_crossedLimit=False. Default: True

  • cross_best (bool, optional) – If True, the folded data points will cross the best values, otherwise, the folded data points will not cross the best values. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’, and when cut_crossedLimit=False and cut_crossedBest=False. Default: False

  • local_samples (None, str, or list, optional) – Path of local samples, None, ‘sample’ or [‘sample’] or [‘sample_1’, ‘sample_2’, …]. If None, no local samples are used. Default: None

  • prevStep_data (None or list, optional) – Samples simulated in the previous step, if list, it should be [spectra, params]. The spectra or params has shape (N, n), where N is the number of spectra and n is the number of data points in a spectrum. Default: None

  • check_include (bool, optional) – If True, will check whether params_space is in the space of local_samples, otherwise, do nothing. Default: True

  • rel_dev_limit (float, optional) – The limit on the relative deviation allowed when params_space is not in the space of sim_params. The default is 20%, which means that if params_space is \([-5\sigma, +5\sigma]\), it can deviate by \(<1\sigma\) from sim_params. Note that it should be \(<0.4\) (a deviation of \(<2\sigma\) for the parameter space \([-5\sigma, +5\sigma]\)). Default: 0.2

Variables
  • prev_space (array-like) – The parameter space of local simulated data (or mock data in previous step), with shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit].

  • seed (None or int, optional) – Seed number which controls random draws. Default: None

Note

Either chain or params_space should be given to simulate samples.

save_samples(path='sim_data', branch_paths=['comp1', 'comp2'])[source]
simulate_spectra(N)[source]
class ecopann.data_simulator.SimParameters(param_names, chain=None, params_space=None, spaceSigma=5, params_dict=None, space_type='hypercube', cut_crossedLimit=True, cut_crossedBest=True, cross_best=False)[source]

Bases: CutParams

Simulate parameters.

Parameters
  • param_names (list) – A list that contains parameter names.

  • chain (array-like or None) – The predicted ANN chain in the previous step. If chain is an array, params_space will be ignored. If chain is None, params_space should be given. Default: None

  • params_space (array-like or None) – The parameter space with the shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit]. This is only used for space_type=‘hypercube’ and space_type=‘LHS’. If chain is an array, params_space will be ignored. If chain is None, params_space should be given. Default: None

  • spaceSigma (int or array-like, optional) – The size of parameter space to be learned. It is an int or a numpy array with shape of (n,), where n is the number of parameters, e.g. for spaceSigma=5, the parameter space to be learned is \([-5\sigma, +5\sigma]\). Default: 5

  • params_dict (dict or None, optional) – Information about the cosmological parameters, including the labels, the minimum values, and the maximum values. See params_dict_zoo(). Default: None

  • space_type (str, optional) – The type of parameter space. It can be ‘hypercube’, ‘LHS’, ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’. Default: ‘hypercube’

  • cut_crossedLimit (bool, optional) – If True, the data points that cross the parameter limits will be cut. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’. Default: True

  • cut_crossedBest (bool, optional) – If True, the folded data points that cross the best values will be cut. It is recommended to set it to True. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’, and when cut_crossedLimit=False. Default: True

  • cross_best (bool, optional) – If True, the folded data points will cross the best values, otherwise, the folded data points will not cross the best values. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’, and when cut_crossedLimit=False and cut_crossedBest=False. Default: False

Variables

seed (None or int, optional) – Seed number which controls random draws. Default: None

Note

Either chain or params_space should be given to simulate samples.

property combinations
fold_sphere(params)[source]

Fold the simulated parameters using the extremum of the parameters.

https://en.wikipedia.org/wiki/Folded_normal_distribution
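In a folded normal distribution, values on the wrong side of a boundary are reflected across it. The sketch below applies that idea to finite lower and upper limits. Whether fold_sphere reflects in exactly this way is an assumption, and `fold_at_limits` is a hypothetical helper.

```python
import numpy as np


def fold_at_limits(samples, lower, upper):
    """Reflect samples that cross a limit back into [lower, upper].

    A single reflection is applied, so samples must not overshoot a limit
    by more than the width of the interval.
    """
    folded = np.where(samples < lower, 2.0 * lower - samples, samples)
    folded = np.where(folded > upper, 2.0 * upper - folded, folded)
    return folded


samples = np.array([-0.2, 0.5, 1.3])
folded = fold_at_limits(samples, lower=0.0, upper=1.0)
```

Unlike cutting, folding keeps the number of samples unchanged, at the cost of increasing the sample density near the limits.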

get_contour_edges(sigma=3)[source]
get_edge_space(sigma=3)[source]
get_multiParams(N, multi_params=1, use_dataSeed=False, reorder=True)[source]
get_params(N)[source]
hypercube(N)[source]

Generate samples uniformly in a hypercube parameter space using a uniform distribution.

Parameters

N (int) – The number of data to be simulated.

Returns

Parameters.

Return type

array-like
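Uniform sampling in a hypercube can be sketched in a few lines of NumPy. Unlike the method above, this stand-alone function takes the (n, 2) parameter space explicitly, and the example limits are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def hypercube(N, params_space):
    """Draw N points uniformly inside the box defined by params_space (n, 2)."""
    lower, upper = params_space[:, 0], params_space[:, 1]
    return rng.uniform(lower, upper, size=(N, len(lower)))


params_space = np.array([[0.1, 0.5], [60.0, 80.0]])  # hypothetical limits
cube_samples = hypercube(1000, params_space)
```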

hyperellipsoid(N)[source]

Generate samples uniformly in a hyperellipsoid parameter space using covariance between parameters.

https://scipy-cookbook.readthedocs.io/items/CorrelatedRandomSamples.html
https://blogs.sas.com/content/iml/2012/02/08/use-the-cholesky-transformation-to-correlate-and-uncorrelate-variables.html

Parameters

N (int) – The number of data to be simulated.

Returns

Parameters.

Return type

array-like

Note

For Cholesky decomposition, the covariance matrix is \(C = LL^T\). So, the transformation relationship between correlated parameters \(P_{corr}\) and uncorrelated parameters \(P_{uncorr}\) is \(P_{corr} = LP_{uncorr}\), \(P_{uncorr} = L^{-1}P_{corr}\).
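The relations \(P_{corr} = LP_{uncorr}\) and \(P_{uncorr} = L^{-1}P_{corr}\) can be checked numerically. In this sketch the covariance matrix is hypothetical, and standard normal draws are used for \(P_{uncorr}\) only so that the sample covariance can be compared with \(C\). It does not reproduce the uniform-in-ellipsoid sampling of hyperellipsoid.

```python
import numpy as np

rng = np.random.default_rng(0)

cov = np.array([[1.0, 0.8],
                [0.8, 2.0]])          # hypothetical covariance matrix C
L = np.linalg.cholesky(cov)           # lower-triangular factor, C = L @ L.T

p_uncorr = rng.standard_normal((2, 100_000))  # unit variance, uncorrelated
p_corr = L @ p_uncorr                         # P_corr = L P_uncorr
p_back = np.linalg.solve(L, p_corr)           # P_uncorr = L^{-1} P_corr

sample_cov = np.cov(p_corr)                   # should be close to C
```

Solving the triangular system with `np.linalg.solve` avoids forming \(L^{-1}\) explicitly.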

hypersphere(N)[source]

Generate samples uniformly in a hypersphere parameter space.

Parameters

N (int) – The number of data to be simulated.

Returns

Parameters.

Return type

array-like

in_polygon(edge, x, y, get_points=True)[source]

Determine whether the given points lie inside the area enclosed by the polygon.

Parameters
  • edge (array-like) – 2-D array with shape (N, 2). The vertices of a polygon.

  • x (array-like) – 1-D array with shape (M,). The x coordinate of the data points.

  • y (array-like) – 1-D array with shape (M,). The y coordinate of the data points.

  • get_points (bool, optional) – If True, it will return data points inside the area, if False, it will return a bool array which is True if the (closed) path contains the corresponding point. Default: True

Returns

Points in the polygon.

Return type

array-like
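A point-in-polygon test can be written with the even-odd (ray casting) rule. This pure-NumPy sketch shows the idea behind the boolean mask returned for get_points=False. The source does not state which algorithm in_polygon uses, so this is a substitute technique, and `points_in_polygon` is a hypothetical name.

```python
import numpy as np


def points_in_polygon(edge, x, y):
    """Return a bool array that is True for points inside the polygon.

    edge: (N, 2) polygon vertices; x, y: (M,) point coordinates.
    Uses the even-odd rule: count crossings of a ray going in +x direction.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    inside = np.zeros(x.shape, dtype=bool)
    n = len(edge)
    for i in range(n):
        x1, y1 = edge[i]
        x2, y2 = edge[(i + 1) % n]
        # The edge straddles the horizontal line through the point.
        crosses = (y1 > y) != (y2 > y)
        # Horizontal edges never straddle, so the division is masked out there.
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (x < x_cross)
    return inside


square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
px = np.array([0.5, 1.5, -0.1])
py = np.array([0.5, 0.5, 0.2])
mask = points_in_polygon(square, px, py)
points_inside = np.column_stack([px[mask], py[mask]])
```

Indexing with the mask, as in the last line, yields the points themselves, which corresponds to the get_points=True case.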

lhs(N)[source]

Generate samples uniformly in a hypercube parameter space using Latin hypercube sampling.

https://en.wikipedia.org/wiki/Latin_hypercube_sampling
https://blog.csdn.net/yuxeaotao/article/details/108952326

Parameters

N (int) – The number of data to be simulated.

Returns

Parameters.

Return type

array-like
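Latin hypercube sampling splits each parameter range into N equal strata and places exactly one sample in each stratum. The sketch below is a generic NumPy version that takes the parameter space explicitly. It is not the library's lhs method.

```python
import numpy as np

rng = np.random.default_rng(0)


def lhs(N, params_space):
    """Latin hypercube sample of N points in the box params_space (n, 2)."""
    n = params_space.shape[0]
    lower, upper = params_space[:, 0], params_space[:, 1]
    # One uniform draw inside each of the N strata of the unit interval.
    unit = (np.arange(N)[:, None] + rng.uniform(size=(N, n))) / N
    # Shuffle the strata independently for each parameter.
    for j in range(n):
        unit[:, j] = rng.permutation(unit[:, j])
    return lower + unit * (upper - lower)


unit_space = np.array([[0.0, 1.0], [0.0, 1.0]])
lhs_unit = lhs(10, unit_space)
lhs_scaled = lhs(10, np.array([[0.1, 0.5], [60.0, 80.0]]))  # hypothetical limits
```

Compared with plain uniform draws, every one-dimensional projection of the sample covers its range evenly.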

normal_params(N, best, sigma_max, spaceSigma)[source]
property params_n
posterior_hyperellipsoid(N, factor=<class 'float'>)[source]
random_ball(N, dimension, radius=1)[source]

Generate N samples uniformly in a ball of the given dimension (hypersphere).

https://www.cnpython.com/qa/349434
https://www.zhihu.com/question/277712372
https://blogs.sas.com/content/iml/2016/04/06/generate-points-uniformly-in-ball.html
https://arxiv.org/pdf/1404.1347.pdf
https://www.sciencedirect.com/science/article/pii/S0047259X10001211
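One standard way to sample uniformly in a ball is to normalize Gaussian vectors to get uniform directions and then scale each by a radius drawn as \(r = R\,u^{1/d}\), with \(u\) uniform in [0, 1]. The sketch below implements that recipe. The source does not say which of the linked methods random_ball follows, so treat this as one possible realization.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_ball(N, dimension, radius=1.0):
    """Draw N points uniformly inside a ball of the given dimension and radius."""
    # Normalized Gaussian vectors are uniformly distributed on the unit sphere.
    directions = rng.standard_normal((N, dimension))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # The radial CDF in d dimensions is (r/R)^d, so r = R * u^(1/d).
    radii = radius * rng.uniform(size=(N, 1)) ** (1.0 / dimension)
    return directions * radii


ball_samples = random_ball(5000, dimension=3, radius=2.0)
ball_norms = np.linalg.norm(ball_samples, axis=1)
```

Drawing the radius uniformly instead would over-populate the center, which is why the \(1/d\) power is needed.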

uniform_params(N, p_space)[source]
unique_elements(list_array)[source]

Find the unique elements of a list that contains several arrays.

Parameters

list_array (list) – A list that contains several arrays.

Returns

The sorted unique elements of the list.

Return type

array-like
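The behaviour described above (sorted unique elements across several arrays) can be reproduced with NumPy by concatenating and then calling np.unique. This is a sketch of the idea, not the library's code.

```python
import numpy as np


def unique_elements(list_array):
    """Return the sorted unique elements found in a list of arrays."""
    return np.unique(np.concatenate([np.ravel(a) for a in list_array]))


uniq = unique_elements([np.array([3, 1]), np.array([1, 2, 2])])
```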

class ecopann.data_simulator.SimSpectra(N, model, param_names, chain=None, params_space=None, spaceSigma=5, params_dict=None, space_type='hypercube', cut_crossedLimit=True, cut_crossedBest=True, cross_best=False, local_samples=None, prevStep_data=None, check_include=True, rel_dev_limit=0.2)[source]

Bases: SimParameters

Simulate training set.

Parameters
  • N (int) – The number of data to be simulated.

  • model (cosmological (or theoretical) model instance) – A cosmological (or theoretical) model instance that is used to simulate the training set. It should contain a ‘simulate’ method that accepts cosmological parameters as input. If you use local data sets, it should also contain ‘load_params’, ‘load_params_space’, and ‘load_sample’ methods.

  • param_names (list) – A list that contains parameter names.

  • chain (array-like or None) – The predicted ANN chain in the previous step. If chain is an array, params_space will be ignored. If chain is None, params_space should be given. Default: None

  • params_space (array-like or None) – The parameter space with the shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit]. This is only used for space_type=‘hypercube’ and space_type=‘LHS’. If chain is an array, params_space will be ignored. If chain is None, params_space should be given. Default: None

  • spaceSigma (int or array-like, optional) – The size of parameter space to be learned. It is an int or a numpy array with shape of (n,), where n is the number of parameters, e.g. for spaceSigma=5, the parameter space to be learned is \([-5\sigma, +5\sigma]\). Default: 5

  • params_dict (dict or None, optional) – Information about the cosmological parameters, including the labels, the minimum values, and the maximum values. See params_dict_zoo(). Default: None

  • space_type (str, optional) – The type of parameter space. It can be ‘hypercube’, ‘LHS’, ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’. Default: ‘hypercube’

  • cut_crossedLimit (bool, optional) – If True, the data points that cross the parameter limits will be cut. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’. Default: True

  • cut_crossedBest (bool, optional) – If True, the folded data points that cross the best values will be cut. It is recommended to set it to True. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’, and when cut_crossedLimit=False. Default: True

  • cross_best (bool, optional) – If True, the folded data points will cross the best values, otherwise, the folded data points will not cross the best values. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’, and when cut_crossedLimit=False and cut_crossedBest=False. Default: False

  • local_samples (None, str, or list, optional) – Path of local samples, None, ‘sample’ or [‘sample’] or [‘sample_1’, ‘sample_2’, …]. If None, no local samples are used. Default: None

  • prevStep_data (None or list, optional) – Samples simulated in the previous step, if list, it should be [spectra, params]. The spectra or params has shape (N, n), where N is the number of spectra and n is the number of data points in a spectrum. Default: None

  • check_include (bool, optional) – If True, will check whether params_space is in the space of local_samples, otherwise, do nothing. Default: True

  • rel_dev_limit (float, optional) – The limit on the relative deviation allowed when params_space is not in the space of sim_params. The default is 20%, which means that if params_space is \([-5\sigma, +5\sigma]\), it can deviate by \(<1\sigma\) from sim_params. Note that it should be \(<0.4\) (a deviation of \(<2\sigma\) for the parameter space \([-5\sigma, +5\sigma]\)). Default: 0.2

Variables
  • prev_space (array-like) – The parameter space of local simulated data (or mock data in previous step), with shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit].

  • seed (None or int, optional) – Seed number which controls random draws. Default: None

Note

Either chain or params_space should be given to simulate samples.

comb_spectra(spectra_1, spectra_2)[source]
filter_localSample(local_sample, N_local)[source]

Select samples from the local data sets.

Parameters
  • local_sample (str) – Folders of local samples.

  • N_local (int) – The number of local samples to be selected.

Returns

The selected spectra and parameters.

Return type

array-like

Note

Parameter space of the local samples should be in the initial parameter space.

filter_localSamples(N_local)[source]
filter_previousSamples(N_pre)[source]

Select samples from the mock data simulated in the previous step.

Parameters

N_pre (int) – The number of samples to be selected.

Returns

The selected spectra and parameters.

Return type

array-like

save_samples(path='sim_data/sample')[source]
save_samples_2(multi_params=1, path='sim_data/sample', use_dataSeed=False)[source]
save_samples_3(part_size=10, multi_params=1, path='sim_data/sample')[source]
save_samples_3_onePart(params, part_size=10, part_idx=0, path='sim_data/sample')[source]
simulate()[source]
simulate_spectra(N)[source]

ecopann.data_processor

class ecopann.data_processor.InverseNormalize(x1, statistic={}, norm_type='z_score', a=0, b=1)[source]

Bases: object

Inverse transformation of class Normalize.

inverseNorm()[source]
mean()[source]
minmax()[source]
z_score()[source]
class ecopann.data_processor.Normalize(x, statistic={}, norm_type='z_score', a=0, b=1)[source]

Bases: object

Normalize data.

mean()[source]

mean normalization

minmax()[source]

min-max normalization

Rescale the range of features to [0, 1] or [a, b]. https://en.wikipedia.org/wiki/Feature_scaling

norm()[source]
z_score()[source]

standardization/z-score/zero-mean normalization
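The three normalization types and their inverses follow the textbook feature-scaling formulas. The sketch below writes them as stand-alone functions. The function names and argument order are illustrative and do not mirror the Normalize or InverseNormalize constructors.

```python
import numpy as np


def z_score(x, mean, std):
    """Standardization: zero mean and unit standard deviation."""
    return (x - mean) / std


def inverse_z_score(x1, mean, std):
    return x1 * std + mean


def minmax(x, xmin, xmax, a=0.0, b=1.0):
    """Min-max normalization: rescale [xmin, xmax] to [a, b]."""
    return a + (x - xmin) * (b - a) / (xmax - xmin)


def inverse_minmax(x1, xmin, xmax, a=0.0, b=1.0):
    return xmin + (x1 - a) * (xmax - xmin) / (b - a)


def mean_norm(x, mean, xmin, xmax):
    """Mean normalization: center on the mean, scale by the range."""
    return (x - mean) / (xmax - xmin)


x = np.array([1.0, 2.0, 3.0, 4.0])
stats = {"mean": x.mean(), "std": x.std(), "xmin": x.min(), "xmax": x.max()}

x_z = z_score(x, stats["mean"], stats["std"])
x_mm = minmax(x, stats["xmin"], stats["xmax"], a=-1.0, b=1.0)
x_mean = mean_norm(x, stats["mean"], stats["xmin"], stats["xmax"])
```

Keeping the statistics in a dictionary reflects the need to reuse the same values later for the inverse transformation.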

class ecopann.data_processor.ParamsScaling(params_base)[source]

Bases: object

Data preprocessing of cosmological parameters.

Parameters

params_base (array-like) – A 1-D array that contains the base values of the cosmological parameters.

inverseScaling(params)[source]
scaling(params)[source]
class ecopann.data_processor.Statistic(x)[source]

Bases: object

Statistics of an array.

property mean
statistic()[source]
property std
property xmax
property xmin
ecopann.data_processor.cpu2cuda(data)[source]

Transfer data from CPU to GPU.

Parameters

data (array-like or tensor) – Numpy array or torch tensor.

Raises

TypeError – The data type should be np.ndarray or torch.Tensor.

Returns

Torch tensor.

Return type

Tensor

ecopann.data_processor.cuda2numpy(data)[source]

Transfer data from the torch tensor (on GPU) to the numpy array (on CPU).

ecopann.data_processor.cuda2torch(data)[source]

Transfer data (torch tensor) from GPU to CPU.

ecopann.data_processor.numpy2cuda(data, device=None)[source]

Transfer data from the numpy array (on CPU) to the torch tensor (on GPU).

ecopann.data_processor.numpy2torch(data)[source]

Transfer data from the numpy array (on CPU) to the torch tensor (on CPU).

ecopann.data_processor.torch2cuda(data, device=None)[source]

Transfer data (torch tensor) from CPU to GPU.

ecopann.data_processor.torch2numpy(data)[source]

Transfer data from the torch tensor (on CPU) to the numpy array (on CPU).