sandy.core.samples module

class sandy.core.samples.Samples(df, *args, **kwargs)

Bases: object

Container for samples.

Attributes:
data

Dataframe of samples.

Methods

get_condition_number()

Return condition number of samples.

iterate_xs_samples()

Iterate samples one by one and shape them as a sandy.Xs() dataframe, but with mutligroup structure.

test_shapiro([size, pdf])

Perform the Shapiro-Wilk test for normality on the samples.

get_corr

Return correlation matrix of samples.

get_cov

Return covariance matrix of samples.

get_mean

Return mean vector of samples.

get_std

Return standard deviation vector of samples.

get_rstd

Return relative standard deviation vector of samples.

property data

Dataframe of samples.

Returns:
pandas.DataFrame

tabulated samples

Attributes:
indexpandas.Index or pandas.MultiIndex

indices

columnspandas.Index

samples numbering

valuesnumpy.array

sample values as float

get_condition_number()

Return condition number of samples.

Notes

..note:: the condition number can help assess multicollinearity.

get_corr()
get_cov()
get_mean()
get_rstd()
get_std()
iterate_xs_samples()

Iterate samples one by one and shape them as a sandy.Xs() dataframe, but with mutligroup structure. This output should be passed to sandy.Xs._perturb(). The function is called by sandy.Endf6.apply_perturbations()

Yields:
nint

.

spd.DataFrame

dataframe of perturbation coefficients with:

  • columns: pd.MultiIndex with levels “MAT” and “MT”

  • index: pd.IntervalIndex with multigroup structure

Notes

If samples refer to redundant MT number, the same identical samples are passed one level down to the partial MT components. For instance:

  • MT=4 samples will be assigned to MT=50-91

  • MT=1 samples will be assigned to MT=2 and MT=3

  • MT=18 samples will be assigned to MT=19-21 and MT=38

..important:: Assigning samples from redundant MT number to partial

components only applies if the partial components do not have their own samples, and it only goes one level deep.

Examples

Get samples fot MT=1 >>> endf6 = sandy.get_endf6_file(‘jeff_33’, ‘xs’, 10010) >>> smps1 = endf6.get_perturbations(1, njoy_kws=dict(err=1, chi=False, mubar=False, nubar=False, errorr33_kws=dict(mt=1)))[33]

Copy samples each time to a redundant or partial MT >>> smps3 = sandy.Samples(smps1.data.reset_index().assign(MT=3).set_index([“MAT”, “MT”, “E”])) >>> smps18 = sandy.Samples(smps1.data.reset_index().assign(MT=18).set_index([“MAT”, “MT”, “E”])) >>> smps19 = sandy.Samples(smps1.data.reset_index().assign(MT=19).set_index([“MAT”, “MT”, “E”])) >>> smps27 = sandy.Samples(smps1.data.reset_index().assign(MT=27).set_index([“MAT”, “MT”, “E”])) >>> smps4 = sandy.Samples(smps1.data.reset_index().assign(MT=4).set_index([“MAT”, “MT”, “E”])) >>> smps51 = sandy.Samples(smps1.data.reset_index().assign(MT=51).set_index([“MAT”, “MT”, “E”])) >>> smps101 = sandy.Samples(smps1.data.reset_index().assign(MT=101).set_index([“MAT”, “MT”, “E”])) >>> smps452 = sandy.Samples(smps1.data.reset_index().assign(MT=452).set_index([“MAT”, “MT”, “E”]))

Check that samples are passed correctly to daughter MTs (only one level deep) >>> expected = pd.MultiIndex.from_product([[125], [51]], names=[“MAT”, “MT”]) >>> assert next(smps51.iterate_xs_samples())[1].columns.equals(expected)

>>> expected = pd.MultiIndex.from_product([[125], [4] + list(sandy.redundant_xs[4])], names=["MAT", "MT"])
>>> assert next(smps4.iterate_xs_samples())[1].columns.equals(expected)
>>> expected = pd.MultiIndex.from_product([[125], [1] + list(sandy.redundant_xs[1])], names=["MAT", "MT"])
>>> assert next(smps1.iterate_xs_samples())[1].columns.equals(expected)
>>> expected = pd.MultiIndex.from_product([[125], [3] + list(sandy.redundant_xs[3])], names=["MAT", "MT"])
>>> assert next(smps3.iterate_xs_samples())[1].columns.equals(expected)
>>> expected = pd.MultiIndex.from_product([[125], [1] + list(sandy.redundant_xs[1])], names=["MAT", "MT"])
>>> assert next(smps1.iterate_xs_samples())[1].columns.equals(expected)
>>> expected = pd.MultiIndex.from_product([[125], [18] + list(sandy.redundant_xs[18])], names=["MAT", "MT"])
>>> assert next(smps18.iterate_xs_samples())[1].columns.equals(expected)
>>> expected = pd.MultiIndex.from_product([[125], [27] + list(sandy.redundant_xs[27])], names=["MAT", "MT"])
>>> assert next(smps27.iterate_xs_samples())[1].columns.equals(expected)
>>> expected = pd.MultiIndex.from_product([[125], [101] + list(sandy.redundant_xs[101])], names=["MAT", "MT"])
>>> assert next(smps101.iterate_xs_samples())[1].columns.equals(expected)
>>> expected = pd.MultiIndex.from_product([[125], [452] + list(sandy.redundant_xs[452])], names=["MAT", "MT"])
>>> assert next(smps452.iterate_xs_samples())[1].columns.equals(expected)

In this example the original covariance contains data for MT=1 and MT=51. >>> endf6 = sandy.get_endf6_file(‘jeff_33’, ‘xs’, 942400) >>> smps = endf6.get_perturbations(1, njoy_kws=dict(err=1, chi=False, mubar=False, nubar=False, errorr33_kws=dict(mt=[1, 51])))[33]

Then, since MT=1 is redundant, samples are passed to its partial components (MT=2 and MT=3). >>> expected = pd.MultiIndex.from_product([[9440], [1, 51] + list(sandy.redundant_xs[1])], names=[“MAT”, “MT”]) >>> assert next(smps.iterate_xs_samples())[1].columns.equals(expected)

If case one of the partial components already has samples, i.e., MT=2… >>> endf6 = sandy.get_endf6_file(‘jeff_33’, ‘xs’, 942400) >>> smps = endf6.get_perturbations(1, njoy_kws=dict(err=1, chi=False, mubar=False, nubar=False, errorr33_kws=dict(mt=[1, 2, 51])))[33]

Then the MT=1 samples are not passed to the partial components, which in this case it means that MT=2 is not changed and MT=3 is not created. >>> expected = pd.MultiIndex.from_product([[9440], [1, 2, 51]], names=[“MAT”, “MT”]) >>> assert next(smps.iterate_xs_samples())[1].columns.equals(expected)

test_shapiro(size=None, pdf='normal')

Perform the Shapiro-Wilk test for normality on the samples. The test can be performed also for a lognormal distribution by testing for normality the logarithm of the samples.

The Shapiro-Wilk test tests the null hypothesis that the data was drawn from a normal distribution.

Parameters:
sizeint, optional

number of samples (starting from the first) that need to be considered for the test. The default is None, i.e., all samples.

pdfstr, optional

the pdf used to test the samples. Either “normal” or “lognormal”. The default is “normal”.

Returns:
pd.DataFrame

Dataframe with Shapriro-Wilk results (statistic and pvalue) for each variable considered in the Samples() instance.

Examples

Generate 5000 xs samples normally, log-normally and uniform distributed >>> tape = sandy.get_endf6_file(“jeff_33”, “xs”, 10010) >>> njoy_kws = dict(err=1, errorr33_kws=dict(mt=102)) >>> nsmp = 5000 >>> seed = 5 >>> >>> smp_norm = tape.get_perturbations(nsmp, njoy_kws=njoy_kws, smp_kws=dict(seed33=seed, pdf=”normal”))[33] >>> smp_lognorm = tape.get_perturbations(nsmp, njoy_kws=njoy_kws, smp_kws=dict(seed33=seed, pdf=”lognormal”))[33] >>> smp_uniform = tape.get_perturbations(nsmp, njoy_kws=njoy_kws, smp_kws=dict(seed33=seed, pdf=”uniform”))[33]

In this example we defined the following arbitrary convergence criteria:
  • if the p value is larger than 0.05 we fail to reject the null-hypothesis and we accept the results

  • if the first condition is accepted, we confirm the pdf if the statistics is larger than 0.95

>>> threshold = 0.95
>>> pthreshold = 0.05
>>> def test(smps):
...     data = []
...     for n in [10, 50, 100, 500, 1000, 5000]:
...         for pdf in ("normal", "lognormal"):
...             df = smps.test_shapiro(pdf=pdf, size=n)
...             idx = df.statistic.idxmin()
...             w = df.loc[idx]
...             t = "reject" if w.pvalue < pthreshold else (pdf if w.statistic > threshold else "reject")
...             data.append({"PDF": pdf, "test":t, "# SMP": n})
...     df = pd.DataFrame(data).pivot_table(index="# SMP", columns="PDF", values="test", aggfunc=lambda x: ' '.join(x))
...     return df

The Shapiro-Wilks test proves wrong the normal samples because of the tail truncation. # >>> print(test(smp_norm)) PDF lognormal normal # SMP 10 reject reject 50 reject reject 100 reject reject 500 reject reject 1000 reject reject 5000 reject reject

The Shapiro-Wilks test proves right for the lognormal samples and the lognormal distribution. # >>> print(test(smp_lognorm)) PDF lognormal normal # SMP 10 lognormal reject 50 lognormal reject 100 lognormal reject 500 lognormal reject 1000 lognormal reject 5000 lognormal reject

The Shapiro-Wilks gives too low p-values for the uniform samples. # >>> print(test(smp_uniform)) PDF lognormal normal # SMP 10 reject reject 50 reject reject 100 reject reject 500 reject reject 1000 reject reject 5000 reject reject