sandy.core.samples module
- class sandy.core.samples.Samples(df, *args, **kwargs)
Bases:
object
Container for samples.
- Attributes:
data
Dataframe of samples.
Methods
Return condition number of samples.
Iterate samples one by one and shape them as a
sandy.Xs()
dataframe, but with mutligroup structure.test_shapiro
([size, pdf])Perform the Shapiro-Wilk test for normality on the samples.
get_corr
Return correlation matrix of samples.
get_cov
Return covariance matrix of samples.
get_mean
Return mean vector of samples.
get_std
Return standard deviation vector of samples.
get_rstd
Return relative standard deviation vector of samples.
- property data
Dataframe of samples.
- Returns:
- pandas.DataFrame
tabulated samples
- Attributes:
- indexpandas.Index or pandas.MultiIndex
indices
- columnspandas.Index
samples numbering
- valuesnumpy.array
sample values as float
- get_condition_number()
Return condition number of samples.
Notes
..note:: the condition number can help assess multicollinearity.
- get_corr()
- get_cov()
- get_mean()
- get_rstd()
- get_std()
- iterate_xs_samples()
Iterate samples one by one and shape them as a
sandy.Xs()
dataframe, but with mutligroup structure. This output should be passed tosandy.Xs._perturb()
. The function is called bysandy.Endf6.apply_perturbations()
- Yields:
- nint
.
- spd.DataFrame
dataframe of perturbation coefficients with:
columns: pd.MultiIndex with levels “MAT” and “MT”
index: pd.IntervalIndex with multigroup structure
Notes
If samples refer to redundant MT number, the same identical samples are passed one level down to the partial MT components. For instance:
MT=4 samples will be assigned to MT=50-91
MT=1 samples will be assigned to MT=2 and MT=3
MT=18 samples will be assigned to MT=19-21 and MT=38
- ..important:: Assigning samples from redundant MT number to partial
components only applies if the partial components do not have their own samples, and it only goes one level deep.
Examples
Get samples fot MT=1 >>> endf6 = sandy.get_endf6_file(‘jeff_33’, ‘xs’, 10010) >>> smps1 = endf6.get_perturbations(1, njoy_kws=dict(err=1, chi=False, mubar=False, nubar=False, errorr33_kws=dict(mt=1)))[33]
Copy samples each time to a redundant or partial MT >>> smps3 = sandy.Samples(smps1.data.reset_index().assign(MT=3).set_index([“MAT”, “MT”, “E”])) >>> smps18 = sandy.Samples(smps1.data.reset_index().assign(MT=18).set_index([“MAT”, “MT”, “E”])) >>> smps19 = sandy.Samples(smps1.data.reset_index().assign(MT=19).set_index([“MAT”, “MT”, “E”])) >>> smps27 = sandy.Samples(smps1.data.reset_index().assign(MT=27).set_index([“MAT”, “MT”, “E”])) >>> smps4 = sandy.Samples(smps1.data.reset_index().assign(MT=4).set_index([“MAT”, “MT”, “E”])) >>> smps51 = sandy.Samples(smps1.data.reset_index().assign(MT=51).set_index([“MAT”, “MT”, “E”])) >>> smps101 = sandy.Samples(smps1.data.reset_index().assign(MT=101).set_index([“MAT”, “MT”, “E”])) >>> smps452 = sandy.Samples(smps1.data.reset_index().assign(MT=452).set_index([“MAT”, “MT”, “E”]))
Check that samples are passed correctly to daughter MTs (only one level deep) >>> expected = pd.MultiIndex.from_product([[125], [51]], names=[“MAT”, “MT”]) >>> assert next(smps51.iterate_xs_samples())[1].columns.equals(expected)
>>> expected = pd.MultiIndex.from_product([[125], [4] + list(sandy.redundant_xs[4])], names=["MAT", "MT"]) >>> assert next(smps4.iterate_xs_samples())[1].columns.equals(expected)
>>> expected = pd.MultiIndex.from_product([[125], [1] + list(sandy.redundant_xs[1])], names=["MAT", "MT"]) >>> assert next(smps1.iterate_xs_samples())[1].columns.equals(expected)
>>> expected = pd.MultiIndex.from_product([[125], [3] + list(sandy.redundant_xs[3])], names=["MAT", "MT"]) >>> assert next(smps3.iterate_xs_samples())[1].columns.equals(expected)
>>> expected = pd.MultiIndex.from_product([[125], [1] + list(sandy.redundant_xs[1])], names=["MAT", "MT"]) >>> assert next(smps1.iterate_xs_samples())[1].columns.equals(expected)
>>> expected = pd.MultiIndex.from_product([[125], [18] + list(sandy.redundant_xs[18])], names=["MAT", "MT"]) >>> assert next(smps18.iterate_xs_samples())[1].columns.equals(expected)
>>> expected = pd.MultiIndex.from_product([[125], [27] + list(sandy.redundant_xs[27])], names=["MAT", "MT"]) >>> assert next(smps27.iterate_xs_samples())[1].columns.equals(expected)
>>> expected = pd.MultiIndex.from_product([[125], [101] + list(sandy.redundant_xs[101])], names=["MAT", "MT"]) >>> assert next(smps101.iterate_xs_samples())[1].columns.equals(expected)
>>> expected = pd.MultiIndex.from_product([[125], [452] + list(sandy.redundant_xs[452])], names=["MAT", "MT"]) >>> assert next(smps452.iterate_xs_samples())[1].columns.equals(expected)
In this example the original covariance contains data for MT=1 and MT=51. >>> endf6 = sandy.get_endf6_file(‘jeff_33’, ‘xs’, 942400) >>> smps = endf6.get_perturbations(1, njoy_kws=dict(err=1, chi=False, mubar=False, nubar=False, errorr33_kws=dict(mt=[1, 51])))[33]
Then, since MT=1 is redundant, samples are passed to its partial components (MT=2 and MT=3). >>> expected = pd.MultiIndex.from_product([[9440], [1, 51] + list(sandy.redundant_xs[1])], names=[“MAT”, “MT”]) >>> assert next(smps.iterate_xs_samples())[1].columns.equals(expected)
If case one of the partial components already has samples, i.e., MT=2… >>> endf6 = sandy.get_endf6_file(‘jeff_33’, ‘xs’, 942400) >>> smps = endf6.get_perturbations(1, njoy_kws=dict(err=1, chi=False, mubar=False, nubar=False, errorr33_kws=dict(mt=[1, 2, 51])))[33]
Then the MT=1 samples are not passed to the partial components, which in this case it means that MT=2 is not changed and MT=3 is not created. >>> expected = pd.MultiIndex.from_product([[9440], [1, 2, 51]], names=[“MAT”, “MT”]) >>> assert next(smps.iterate_xs_samples())[1].columns.equals(expected)
- test_shapiro(size=None, pdf='normal')
Perform the Shapiro-Wilk test for normality on the samples. The test can be performed also for a lognormal distribution by testing for normality the logarithm of the samples.
The Shapiro-Wilk test tests the null hypothesis that the data was drawn from a normal distribution.
- Parameters:
- sizeint, optional
number of samples (starting from the first) that need to be considered for the test. The default is None, i.e., all samples.
- pdfstr, optional
the pdf used to test the samples. Either “normal” or “lognormal”. The default is “normal”.
- Returns:
- pd.DataFrame
Dataframe with Shapriro-Wilk results (statistic and pvalue) for each variable considered in the
Samples()
instance.
Examples
Generate 5000 xs samples normally, log-normally and uniform distributed >>> tape = sandy.get_endf6_file(“jeff_33”, “xs”, 10010) >>> njoy_kws = dict(err=1, errorr33_kws=dict(mt=102)) >>> nsmp = 5000 >>> seed = 5 >>> >>> smp_norm = tape.get_perturbations(nsmp, njoy_kws=njoy_kws, smp_kws=dict(seed33=seed, pdf=”normal”))[33] >>> smp_lognorm = tape.get_perturbations(nsmp, njoy_kws=njoy_kws, smp_kws=dict(seed33=seed, pdf=”lognormal”))[33] >>> smp_uniform = tape.get_perturbations(nsmp, njoy_kws=njoy_kws, smp_kws=dict(seed33=seed, pdf=”uniform”))[33]
- In this example we defined the following arbitrary convergence criteria:
if the p value is larger than 0.05 we fail to reject the null-hypothesis and we accept the results
if the first condition is accepted, we confirm the pdf if the statistics is larger than 0.95
>>> threshold = 0.95 >>> pthreshold = 0.05 >>> def test(smps): ... data = [] ... for n in [10, 50, 100, 500, 1000, 5000]: ... for pdf in ("normal", "lognormal"): ... df = smps.test_shapiro(pdf=pdf, size=n) ... idx = df.statistic.idxmin() ... w = df.loc[idx] ... t = "reject" if w.pvalue < pthreshold else (pdf if w.statistic > threshold else "reject") ... data.append({"PDF": pdf, "test":t, "# SMP": n}) ... df = pd.DataFrame(data).pivot_table(index="# SMP", columns="PDF", values="test", aggfunc=lambda x: ' '.join(x)) ... return df
The Shapiro-Wilks test proves wrong the normal samples because of the tail truncation. # >>> print(test(smp_norm)) PDF lognormal normal # SMP 10 reject reject 50 reject reject 100 reject reject 500 reject reject 1000 reject reject 5000 reject reject
The Shapiro-Wilks test proves right for the lognormal samples and the lognormal distribution. # >>> print(test(smp_lognorm)) PDF lognormal normal # SMP 10 lognormal reject 50 lognormal reject 100 lognormal reject 500 lognormal reject 1000 lognormal reject 5000 lognormal reject
The Shapiro-Wilks gives too low p-values for the uniform samples. # >>> print(test(smp_uniform)) PDF lognormal normal # SMP 10 reject reject 50 reject reject 100 reject reject 500 reject reject 1000 reject reject 5000 reject reject