LECA.prep.direct_sample_arrhenius

LECA.prep.direct_sample_arrhenius(data: DataFrame, feature_list: str | List[str], objective: str = 'conductivity', max_error: float | None = None, inverse_temp: str = 'inverse temperature', min_samples: int = 5, beta_0: float | None = None, n_fits: int = 50, random_state: int | None = None, save_loc: str | bool = False, return_corr_plot_data: bool | None = False) → Tuple[float, List[str], List[str], DataFrame]

Transform the objective function into an Arrhenius-fitted surrogate model (\(\log_{10}(\sigma) \rightarrow S_0, S_1, S_2\)). This function expects repeated measurements of the objective function in a DataFrame with columns such as “inverse temperature”, “X” and “conductivity”, where “X” can be an arbitrary feature set.

For each unique composition (read: identical “X” values):

The data should contain several repeated measurements of the objective function at varying inverse temperatures.
For each inverse temperature, this function randomly selects a single measurement of that composition and applies the Arrhenius fit to the selected points.
This is repeated n_fits times (50 by default). The averaged coefficient values, their standard deviations and metrics for the quality of the fit on all measurements of this composition are returned.
In addition, the activation energy of the sample, \(E = S_1 \cdot R\) [mJ/mol] where \(R\) is the gas constant, is included.
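A minimal sketch of this per-composition resampling loop, assuming NumPy, a quadratic least-squares fit via `np.polyfit`, and \(R = 8.314\) J/(mol K). The helper `resample_fit` and its return keys are hypothetical and illustrative only, not the LECA API; the activation energy simply applies the \(E = S_1 \cdot R\) formula stated above.

```python
import numpy as np
import pandas as pd

R = 8.314  # gas constant, J/(mol K); assumed value

def resample_fit(group, beta_0, objective="conductivity",
                 inverse_temp="inverse temperature", n_fits=50, random_state=None):
    """Hypothetical helper: fit one composition n_fits times, each time drawing
    a single random replicate per inverse temperature."""
    rng = np.random.default_rng(random_state)
    betas = group[inverse_temp].to_numpy(dtype=float)
    log_y = np.log10(group[objective].to_numpy(dtype=float))
    levels = np.unique(betas)  # distinct inverse temperatures (needs >= 3)
    coefs = []
    for _ in range(n_fits):
        # Pick one row index at random for every distinct inverse temperature
        idx = np.array([rng.choice(np.flatnonzero(betas == b)) for b in levels])
        d = betas[idx] - beta_0
        # np.polyfit returns highest degree first: log_y = c2*d**2 + c1*d + c0
        c2, c1, c0 = np.polyfit(d, log_y[idx], 2)
        coefs.append((c0, -c1, -c2))  # map to S0, S1, S2 of the surrogate model
    coefs = np.asarray(coefs)
    mean, std = coefs.mean(axis=0), coefs.std(axis=0)
    out = {f"S{i}": mean[i] for i in range(3)}
    out.update({f"S{i}_std": std[i] for i in range(3)})
    out["Activation Energy"] = mean[1] * R  # E = S1 * R, as stated above
    return out

# One composition: two replicates (+/- 0.01 in log10) at five inverse temperatures
beta = np.repeat([2.9, 3.1, 3.3, 3.5, 3.7], 2)
noise = np.tile([0.01, -0.01], 5)
df = pd.DataFrame({"inverse temperature": beta,
                   "conductivity": 10 ** (-2.0 - 1.5 * (beta - 3.354) + noise)})
res = resample_fit(df, beta_0=3.354, n_fits=50, random_state=0)
```

Because each refit sees a different combination of replicates, the spread of the refitted coefficients (`S1_std`, etc.) reflects the scatter between repeated measurements.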

The Arrhenius surrogate model has the form:

\[\log_{10}(\text{objective}) = S_0 - S_1 (\beta - \beta_0) - S_2 (\beta - \beta_0)^2\]
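To make the parameterization concrete, the sketch below evaluates the surrogate and recovers \(S_0, S_1, S_2\) by ordinary least squares with `np.linalg.lstsq`. The helper names `arrhenius_log10` and `fit_arrhenius` are hypothetical, and plain least squares on \(\log_{10}\) values is an assumption about the fitting step. Note the sign convention: \(S_1\) and \(S_2\) enter with a minus sign, so the design-matrix columns carry that sign.

```python
import numpy as np

def arrhenius_log10(beta, s0, s1, s2, beta_0):
    """Evaluate log10(objective) = S0 - S1*(beta-beta_0) - S2*(beta-beta_0)**2."""
    d = np.asarray(beta, dtype=float) - beta_0
    return s0 - s1 * d - s2 * d**2

def fit_arrhenius(beta, objective, beta_0):
    """Hypothetical helper: least-squares (S0, S1, S2) from linear-scale values."""
    d = np.asarray(beta, dtype=float) - beta_0
    y = np.log10(np.asarray(objective, dtype=float))
    # Columns carry the model's minus signs, so the solution is (S0, S1, S2)
    X = np.column_stack([np.ones_like(d), -d, -d**2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return tuple(coef)

# Noise-free synthetic data on the 1000/T[K] scale (about 97 °C down to -23 °C)
beta = np.linspace(2.7, 4.0, 8)
sigma = 10 ** arrhenius_log10(beta, -2.0, 1.5, 0.2, beta_0=3.354)
s0, s1, s2 = fit_arrhenius(beta, sigma, beta_0=3.354)  # ≈ (-2.0, 1.5, 0.2)
```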

\(\beta\) values should be on the 1000/T[K] scale, and \(\beta_0\) can be freely chosen. If it is not user-defined, the function searches for the \(\beta_0\) value between 1000/(273.15-50) and 1000/(273.15+100) that minimizes the correlation between the \(S_0\) and \(S_1\) coefficients. The returned DataFrame includes the surrogate objective functions \(S_0\), \(S_1\) and \(S_2\) as well as their standard deviations (S0_std, etc.). The results DataFrame also includes metrics for the quality of the Arrhenius fits: ‘R2’, ‘Mean Absolute (Relative) Error’, ‘Mean Squared (Relative) Error’ and ‘log(MAE arrh fit)’.
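The automatic \(\beta_0\) search can be sketched as a scan over candidate temperatures in 5 °C steps. How LECA actually measures the \(S_0\)/\(S_1\) correlation is not stated here; this sketch assumes the correlation implied by the least-squares parameter covariance \((X^\top X)^{-1}\), which depends only on the measured \(\beta\) values. `find_beta_0` is a hypothetical name.

```python
import numpy as np

def find_beta_0(beta, t_min=-50, t_max=100, step=5):
    """Hypothetical helper: return the candidate beta_0 (1000/T[K]) minimizing
    |corr(S0, S1)|, plus the candidate temperatures (°C) and their correlations."""
    beta = np.asarray(beta, dtype=float)
    temps_c = np.arange(t_min, t_max + step, step)  # -50, -45, ..., 100 °C
    candidates = 1000.0 / (273.15 + temps_c)
    corrs = []
    for b0 in candidates:
        d = beta - b0
        X = np.column_stack([np.ones_like(d), -d, -d**2])
        cov = np.linalg.inv(X.T @ X)  # parameter covariance up to a scale factor
        corrs.append(cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1]))
    corrs = np.asarray(corrs)
    best = np.argmin(np.abs(corrs))
    return candidates[best], temps_c, corrs

# Measurements placed symmetrically (in beta) around 25 °C
beta = 1000 / 298.15 + np.array([-0.3, -0.15, 0.0, 0.15, 0.3])
beta_0, temps_c, corrs = find_beta_0(beta)
print(round(1000 / beta_0 - 273.15, 6))  # 25.0
```

For a design symmetric about a candidate \(\beta_0\), the odd moments of \(\beta - \beta_0\) vanish, so the \(S_0\)/\(S_1\) covariance is exactly zero there.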

If no \(\beta_0\) value is provided, a plot of the \(S_0\):\(S_1\) correlation from -50 °C to 100 °C is generated by default.

In addition, an overview of the Arrhenius fit quality is provided: the compositions in the top 10% by log(MAE) (i.e. the worst fits) are displayed, together with a histogram of log(MAE) for the whole dataset.
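Selecting the worst-fitting compositions can be sketched with a pandas quantile cut on the ‘log(MAE arrh fit)’ column. Using `Series.quantile` for the 10% threshold is an assumption about how "top 10%" is defined, `worst_fits` is a hypothetical helper, and the error values below are made up. A histogram of the same column (for example via pandas' `Series.plot.hist`, which requires matplotlib) gives the whole-dataset overview.

```python
import numpy as np
import pandas as pd

def worst_fits(results, col="log(MAE arrh fit)", top_frac=0.10):
    """Hypothetical helper: rows whose fit error lies in the top `top_frac` fraction."""
    cutoff = results[col].quantile(1 - top_frac)
    return results[results[col] >= cutoff].sort_values(col, ascending=False)

# Illustrative per-composition results with made-up error values
results = pd.DataFrame({"X": [f"comp{i}" for i in range(20)],
                        "log(MAE arrh fit)": np.linspace(-3.0, -1.1, 20)})
print(worst_fits(results)["X"].tolist())  # ['comp19', 'comp18']
```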

Disambiguation: this function should be used to process a DataFrame with an individual row for each repeated measurement (e.g. 5 rows for 5 measurements of the conductivity of an electrolyte with composition “X” at inverse temperature \(\beta\)).
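For illustration, the sketch below builds a DataFrame in this long format with pandas, converts temperatures from °C to the 1000/T[K] scale expected for `inverse_temp`, and drops compositions measured at fewer than `min_samples` distinct temperatures. The column names follow the defaults documented below; counting distinct temperatures (rather than rows) for `min_samples` is an assumption based on its description, `filter_min_samples` is a hypothetical helper, and the conductivity values are made up.

```python
import pandas as pd

def filter_min_samples(data, feature_list, inverse_temp="inverse temperature",
                       min_samples=5):
    """Hypothetical helper: keep compositions measured at >= min_samples
    distinct inverse temperatures."""
    if isinstance(feature_list, str):
        feature_list = [feature_list]
    n_temps = data.groupby(feature_list)[inverse_temp].transform("nunique")
    return data[n_temps >= min_samples]

# One row per repeated measurement: composition A at 5 temperatures (2 replicates
# each), composition B at only 3 temperatures.
rows = [("A", t, 1e-3 * (1 + t / 100) + r * 1e-5)
        for t in (-20, 0, 20, 40, 60) for r in (0, 1)]
rows += [("B", t, 2e-3) for t in (0, 20, 40)]
data = pd.DataFrame(rows, columns=["X", "temperature_C", "conductivity"])
data["inverse temperature"] = 1000.0 / (data["temperature_C"] + 273.15)

kept = filter_min_samples(data, "X")
print(sorted(set(kept["X"])))  # ['A']
```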

Parameters:

data (DataFrame) – DataFrame of experimental measurements.
feature_list (Union[str, List[str]]) – Feature or list of features that define a composition (the “X” columns).
objective (str) –
data column label of objective function to use for the Arrhenius fits (only supports single objective function).

Default value conductivity.
inverse_temp (str) –
data column label of the inverse temperature (in 1000/T[K] scale) for the measured values. The data DataFrame should be properly prepared to have this information before calling this function.

Default value inverse temperature.
min_samples (int) –
Minimum number of measurements for the feature set excluding inverse temperature (i.e. how many different temperatures were measured for a given formulation). Compositions below this threshold are discarded.

Default value 5.
beta_0 (Optional[float]) –
This value corresponds to the inverse temperature (1000/T[K]) at which S_0 and S_1 become uncorrelated. If no value is provided, the function automatically searches for the temperature with the lowest S_0/S_1 correlation between -50 °C and 100 °C (in 5 °C steps).

Default value None.
n_fits (int) –
Number of times to resample the measurements (one randomly selected measurement per inverse temperature for each composition) and refit to estimate coefficient uncertainty.

Default value 50.
random_state (Optional[int]) –
Sets a numpy random seed for reproducibility.

Default value None.
save_loc (Union[bool, str]) –
Filename prefix for saving the plot (if desired). If False, the plot is only shown, not saved.

Saving filename convention is: save_loc + ‘onset_temp_plot.pdf’

Returns:

A 4-tuple of:

float (beta_0),
List[str] (feature list),
List[str] (S0, S1, S2 objective function labels),
DataFrame with added S0, S0_std, S1, S1_std, S2, S2_std, ‘R2’, ‘Mean Absolute (Relative) Error’, ‘Mean Squared (Relative) Error’, ‘log(MAE arrh fit)’ and ‘Activation Energy’ columns

Return type:

Tuple[float, List[str], List[str], pd.DataFrame]
