sandy.core.samples module

class sandy.core.samples.Samples(df, *args, **kwargs)

Bases: object

Container for samples.

Attributes:

data: Dataframe of samples.

Methods

`get_condition_number`()	Return condition number of samples.
`iterate_xs_samples`()	Iterate samples one by one and shape them as a `sandy.Xs()` dataframe, but with multigroup structure.
`test_shapiro`([size, pdf])	Perform the Shapiro-Wilk test for normality on the samples.

get_corr	Return correlation matrix of samples.
get_cov	Return covariance matrix of samples.
get_mean	Return mean vector of samples.
get_std	Return standard deviation vector of samples.
get_rstd	Return relative standard deviation vector of samples.
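
The statistics listed above can be illustrated with plain pandas. This is only a sketch and not sandy's implementation: it assumes variables along the index and sample numbers along the columns (as described for `data`), it uses the pandas default of `ddof=1`, and the toy dataframe and its labels are made up for illustration.

```python
import pandas as pd

# Hypothetical toy samples: 2 variables (rows) x 3 samples (columns).
data = pd.DataFrame(
    [[1.0, 1.2, 0.8],
     [2.0, 2.2, 1.8]],
    index=pd.Index(["a", "b"], name="variable"),
    columns=pd.Index([0, 1, 2], name="SMP"),
)

mean = data.mean(axis=1)   # mean vector, one entry per variable
std = data.std(axis=1)     # standard deviation vector (ddof=1)
rstd = std / mean          # relative standard deviation vector
cov = data.T.cov()         # covariance matrix between variables
corr = data.T.corr()       # correlation matrix between variables

print(mean, std, rstd, cov, corr, sep="\n\n")
```

The transpose before `cov()` and `corr()` is needed because pandas computes both between columns, while here the variables sit on the rows.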

property data

Dataframe of samples.

Returns:

pandas.DataFrame: tabulated samples

Attributes:

index (pandas.Index or pandas.MultiIndex): indices
columns (pandas.Index): samples numbering
values (numpy.array): sample values as float

get_condition_number()

Return condition number of samples.

Notes

.. note:: The condition number can help assess multicollinearity.
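
As a rough illustration of why a condition number flags multicollinearity, the sketch below compares two synthetic sample matrices with `numpy.linalg.cond`. It does not reproduce how `get_condition_number()` is computed internally, which is not documented here. The matrices and variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)

# Two independent variables: columns are nearly orthogonal.
independent = np.column_stack([x, rng.normal(size=200)])

# Two almost identical variables: columns are nearly collinear.
collinear = np.column_stack([x, x + 1e-6 * rng.normal(size=200)])

# 2-norm condition number: ratio of largest to smallest singular value.
cond_independent = np.linalg.cond(independent)
cond_collinear = np.linalg.cond(collinear)

print(f"independent: {cond_independent:.3g}")
print(f"collinear:   {cond_collinear:.3g}")
```

A value close to 1 indicates well-separated variables, while a very large value indicates that at least one variable is almost a linear combination of the others.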

get_corr()

get_cov()

get_mean()

get_rstd()

get_std()

iterate_xs_samples()

Iterate samples one by one and shape them as a sandy.Xs() dataframe, but with multigroup structure. This output should be passed to sandy.Xs._perturb(). The function is called by sandy.Endf6.apply_perturbations().

Yields:

n (int)

s (pd.DataFrame): dataframe of perturbation coefficients with:

columns: pd.MultiIndex with levels "MAT" and "MT"

index: pd.IntervalIndex with multigroup structure

Notes

If samples refer to a redundant MT number, the same samples are passed one level down to the partial MT components. For instance:

MT=4 samples will be assigned to MT=50-91

MT=1 samples will be assigned to MT=2 and MT=3

MT=18 samples will be assigned to MT=19-21 and MT=38

.. important:: Assigning samples from a redundant MT number to its partial components only applies if the partial components do not have their own samples, and it only goes one level deep.
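
The rule in these notes can be sketched in pure Python. This is not sandy's code: the mapping `REDUNDANT_PARTIALS` and the helper `expand_one_level` are hypothetical names, and the mapping contains only the three cases listed above, so it may differ from the actual content of `sandy.redundant_xs`.

```python
# Hypothetical mapping, built only from the cases listed in the notes.
REDUNDANT_PARTIALS = {
    1: (2, 3),
    4: tuple(range(50, 92)),
    18: (19, 20, 21, 38),
}


def expand_one_level(mts, redundant=REDUNDANT_PARTIALS):
    """Return the MT numbers that receive samples.

    Samples of a redundant MT are copied to its partial components
    only if none of those partials already has its own samples, and
    the copy is never propagated further down.
    """
    present = list(mts)
    out = list(present)
    # Loop over the original MTs only, so that the expansion stays
    # one level deep.
    for mt in present:
        partials = redundant.get(mt, ())
        if any(p in present for p in partials):
            continue  # a partial has its own samples: nothing is copied
        out.extend(partials)
    return out


print(expand_one_level([1, 51]))     # [1, 51, 2, 3]
print(expand_one_level([1, 2, 51]))  # [1, 2, 51]
```

The two calls mirror the examples in this section: with samples for MT=1 and MT=51 the MT=1 samples reach MT=2 and MT=3, whereas if MT=2 already has samples nothing is copied.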

Examples

Get samples for MT=1

>>> endf6 = sandy.get_endf6_file('jeff_33', 'xs', 10010)
>>> smps1 = endf6.get_perturbations(1, njoy_kws=dict(err=1, chi=False, mubar=False, nubar=False, errorr33_kws=dict(mt=1)))[33]

Copy samples each time to a redundant or partial MT

>>> smps3 = sandy.Samples(smps1.data.reset_index().assign(MT=3).set_index(["MAT", "MT", "E"]))
>>> smps18 = sandy.Samples(smps1.data.reset_index().assign(MT=18).set_index(["MAT", "MT", "E"]))
>>> smps19 = sandy.Samples(smps1.data.reset_index().assign(MT=19).set_index(["MAT", "MT", "E"]))
>>> smps27 = sandy.Samples(smps1.data.reset_index().assign(MT=27).set_index(["MAT", "MT", "E"]))
>>> smps4 = sandy.Samples(smps1.data.reset_index().assign(MT=4).set_index(["MAT", "MT", "E"]))
>>> smps51 = sandy.Samples(smps1.data.reset_index().assign(MT=51).set_index(["MAT", "MT", "E"]))
>>> smps101 = sandy.Samples(smps1.data.reset_index().assign(MT=101).set_index(["MAT", "MT", "E"]))
>>> smps452 = sandy.Samples(smps1.data.reset_index().assign(MT=452).set_index(["MAT", "MT", "E"]))

Check that samples are passed correctly to daughter MTs (only one level deep)

>>> expected = pd.MultiIndex.from_product([[125], [51]], names=["MAT", "MT"])
>>> assert next(smps51.iterate_xs_samples())[1].columns.equals(expected)

>>> expected = pd.MultiIndex.from_product([[125], [4] + list(sandy.redundant_xs[4])], names=["MAT", "MT"])
>>> assert next(smps4.iterate_xs_samples())[1].columns.equals(expected)

>>> expected = pd.MultiIndex.from_product([[125], [1] + list(sandy.redundant_xs[1])], names=["MAT", "MT"])
>>> assert next(smps1.iterate_xs_samples())[1].columns.equals(expected)

>>> expected = pd.MultiIndex.from_product([[125], [3] + list(sandy.redundant_xs[3])], names=["MAT", "MT"])
>>> assert next(smps3.iterate_xs_samples())[1].columns.equals(expected)

>>> expected = pd.MultiIndex.from_product([[125], [18] + list(sandy.redundant_xs[18])], names=["MAT", "MT"])
>>> assert next(smps18.iterate_xs_samples())[1].columns.equals(expected)

>>> expected = pd.MultiIndex.from_product([[125], [27] + list(sandy.redundant_xs[27])], names=["MAT", "MT"])
>>> assert next(smps27.iterate_xs_samples())[1].columns.equals(expected)

>>> expected = pd.MultiIndex.from_product([[125], [101] + list(sandy.redundant_xs[101])], names=["MAT", "MT"])
>>> assert next(smps101.iterate_xs_samples())[1].columns.equals(expected)

>>> expected = pd.MultiIndex.from_product([[125], [452] + list(sandy.redundant_xs[452])], names=["MAT", "MT"])
>>> assert next(smps452.iterate_xs_samples())[1].columns.equals(expected)

In this example the original covariance contains data for MT=1 and MT=51.

>>> endf6 = sandy.get_endf6_file('jeff_33', 'xs', 942400)
>>> smps = endf6.get_perturbations(1, njoy_kws=dict(err=1, chi=False, mubar=False, nubar=False, errorr33_kws=dict(mt=[1, 51])))[33]

Then, since MT=1 is redundant, samples are passed to its partial components (MT=2 and MT=3).

>>> expected = pd.MultiIndex.from_product([[9440], [1, 51] + list(sandy.redundant_xs[1])], names=["MAT", "MT"])
>>> assert next(smps.iterate_xs_samples())[1].columns.equals(expected)

In case one of the partial components already has samples, e.g., MT=2...

>>> endf6 = sandy.get_endf6_file('jeff_33', 'xs', 942400)
>>> smps = endf6.get_perturbations(1, njoy_kws=dict(err=1, chi=False, mubar=False, nubar=False, errorr33_kws=dict(mt=[1, 2, 51])))[33]

Then the MT=1 samples are not passed to the partial components, which in this case means that MT=2 is not changed and MT=3 is not created.

>>> expected = pd.MultiIndex.from_product([[9440], [1, 2, 51]], names=["MAT", "MT"])
>>> assert next(smps.iterate_xs_samples())[1].columns.equals(expected)

test_shapiro(size=None, pdf='normal')

Perform the Shapiro-Wilk test for normality on the samples. The test can be performed also for a lognormal distribution by testing for normality the logarithm of the samples.

The Shapiro-Wilk test tests the null hypothesis that the data was drawn from a normal distribution.
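
The lognormal variant described above can be sketched by calling `scipy.stats.shapiro` directly on synthetic data. This is an illustration under assumptions, not the body of `test_shapiro()`: the random samples, the seed and the variable names are made up, and whether sandy relies on `scipy.stats.shapiro` internally is not stated here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Synthetic lognormal samples (hypothetical data, not sandy output).
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)

# Test for normality: the raw lognormal samples are strongly skewed.
w_raw, p_raw = stats.shapiro(x)

# Test for lognormality: apply the same test to the logarithm.
w_log, p_log = stats.shapiro(np.log(x))

print(f"raw samples: statistic={w_raw:.3f}, pvalue={p_raw:.3g}")
print(f"log samples: statistic={w_log:.3f}, pvalue={p_log:.3g}")
```

A small p-value leads to rejecting the null hypothesis of normality, while a large p-value only means that the test found no evidence against it.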

Parameters:

size (int, optional): number of samples (starting from the first) to be considered for the test. The default is None, i.e., all samples.
pdf (str, optional): the pdf used to test the samples. Either "normal" or "lognormal". The default is "normal".

Returns:

pd.DataFrame: Dataframe with Shapiro-Wilk results (statistic and pvalue) for each variable considered in the Samples() instance.

Examples

Generate 5000 xs samples that are normally, log-normally and uniformly distributed

>>> tape = sandy.get_endf6_file("jeff_33", "xs", 10010)
>>> njoy_kws = dict(err=1, errorr33_kws=dict(mt=102))
>>> nsmp = 5000
>>> seed = 5
>>> smp_norm = tape.get_perturbations(nsmp, njoy_kws=njoy_kws, smp_kws=dict(seed33=seed, pdf="normal"))[33]
>>> smp_lognorm = tape.get_perturbations(nsmp, njoy_kws=njoy_kws, smp_kws=dict(seed33=seed, pdf="lognormal"))[33]
>>> smp_uniform = tape.get_perturbations(nsmp, njoy_kws=njoy_kws, smp_kws=dict(seed33=seed, pdf="uniform"))[33]

In this example we define the following arbitrary acceptance criteria:

if the p-value is larger than 0.05, we fail to reject the null hypothesis and we accept the results
if the first condition is satisfied, we confirm the pdf if the statistic is larger than 0.95

>>> threshold = 0.95
>>> pthreshold = 0.05
>>> def test(smps):
...     data = []
...     for n in [10, 50, 100, 500, 1000, 5000]:
...         for pdf in ("normal", "lognormal"):
...             df = smps.test_shapiro(pdf=pdf, size=n)
...             idx = df.statistic.idxmin()
...             w = df.loc[idx]
...             t = "reject" if w.pvalue < pthreshold else (pdf if w.statistic > threshold else "reject")
...             data.append({"PDF": pdf, "test":t, "# SMP": n})
...     df = pd.DataFrame(data).pivot_table(index="# SMP", columns="PDF", values="test", aggfunc=lambda x: ' '.join(x))
...     return df

The Shapiro-Wilk test rejects normality for the normal samples because of the tail truncation.

# >>> print(test(smp_norm))
PDF   lognormal  normal
# SMP
10       reject  reject
50       reject  reject
100      reject  reject
500      reject  reject
1000     reject  reject
5000     reject  reject

The Shapiro-Wilk test confirms the lognormal distribution for the lognormal samples.

# >>> print(test(smp_lognorm))
PDF    lognormal  normal
# SMP
10     lognormal  reject
50     lognormal  reject
100    lognormal  reject
500    lognormal  reject
1000   lognormal  reject
5000   lognormal  reject

The Shapiro-Wilk test gives p-values that are too low for the uniform samples.

# >>> print(test(smp_uniform))
PDF   lognormal  normal
# SMP
10       reject  reject
50       reject  reject
100      reject  reject
500      reject  reject
1000     reject  reject
5000     reject  reject