LECA.prep.direct_sample_arrhenius

LECA.prep.direct_sample_arrhenius(data: DataFrame, feature_list: str | List[str], objective: str = 'conductivity', max_error: float | None = None, inverse_temp: str = 'inverse temperature', min_samples: int = 5, beta_0: float | None = None, n_fits: int = 50, random_state: int | None = None, save_loc: str | bool = False) → Tuple[float, List[str], List[str], DataFrame]

Transform the objective function into an Arrhenius-fitted surrogate model (\(\log(\sigma) \rightarrow S_0, S_1, S_2\)). This function expects repeated measurements of the objective function in the form of a DataFrame with columns such as “inverse temperature”, “X”, and “conductivity”, where “X” can be an arbitrary feature set.

For each unique composition (read: identical “X” values):
  • The data should contain several repeated measurements of the objective function at varying inverse temperatures.

  • This function randomly selects a single measurement for that composition for each inverse temperature and applies the Arrhenius fit.

  • This is repeated n_fits times (50 by default). The function returns the resulting average coefficient values, their standard deviations, and metrics for the quality of the fit on all measurements with this composition.

  • In addition, the activation energy for the sample, \(E = S_1 \cdot R\) [mJ/mol], where R is the gas constant, is included.
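The resampling procedure described in the list above can be sketched as follows. This is a minimal illustration, not the LECA implementation: the helper name `resampled_arrhenius_fit` is hypothetical, the quadratic fit uses `numpy.polyfit`, and the sketch assumes the objective column holds raw (not yet log-transformed) values for a single composition.

```python
import numpy as np
import pandas as pd


def resampled_arrhenius_fit(group, beta_col="inverse temperature",
                            obj_col="conductivity", beta_0=3.0,
                            n_fits=50, seed=0):
    """Hypothetical helper: repeated Arrhenius fits for ONE composition.

    Fits log10(obj) = S0 - S1*(beta - beta_0) - S2*(beta - beta_0)**2
    on n_fits random resamples and returns (mean, std) of (S0, S1, S2).
    """
    rng = np.random.default_rng(seed)
    coefs = []
    for _ in range(n_fits):
        # Pick one randomly chosen measurement per inverse temperature.
        picked = group.groupby(beta_col)[obj_col].apply(
            lambda s: s.iloc[rng.integers(len(s))]
        )
        x = picked.index.to_numpy(dtype=float) - beta_0
        # Assumption: the column holds raw values, so log10 is taken here.
        y = np.log10(picked.to_numpy(dtype=float))
        # polyfit returns the highest power first: c2*x**2 + c1*x + c0.
        c2, c1, c0 = np.polyfit(x, y, deg=2)
        # Flip signs to match the surrogate model's sign convention.
        coefs.append((c0, -c1, -c2))
    coefs = np.asarray(coefs)
    return coefs.mean(axis=0), coefs.std(axis=0)
```

The standard deviation over the `n_fits` refits is what serves as the coefficient uncertainty in this sketch; with noiseless repeated measurements it collapses to zero.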

The Arrhenius surrogate model has the form:

\[\log_{10}(\text{objective}) = S_0 - S_1 (\beta - \beta_0) - S_2 (\beta - \beta_0)^2\]

\(\beta\) values should be on the 1000/T[K] scale, and \(\beta_0\) can be freely chosen. If \(\beta_0\) is not user defined, the function searches between 1000/(273.15-50) and 1000/(273.15+100) for the \(\beta_0\) value that results in minimal correlation between the \(S_0\) and \(S_1\) coefficients. The returned DataFrame includes the surrogate \(S_{0,1,2}\) objective functions as well as their standard deviations (S0_std, etc.). The results DataFrame also includes metrics for the quality of the Arrhenius fits: ‘Mean Absolute (Relative) Error’, ‘Mean Squared (Relative) Error’, and ‘log(MAE arrh fit)’.
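As an illustration of the \(\beta_0\) search, the sketch below scans candidate temperatures from -50 °C to 100 °C in 5 °C steps, refits every composition at each candidate, and keeps the candidate with the smallest absolute correlation between \(S_0\) and \(S_1\). The function names are hypothetical, and measuring the correlation as a Pearson coefficient across compositions is an assumption; the library may compute it differently.

```python
import numpy as np


def beta_from_celsius(t_celsius):
    """Convert a temperature in degrees Celsius to the 1000/T[K] scale."""
    return 1000.0 / (273.15 + np.asarray(t_celsius, dtype=float))


def least_correlated_beta_0(betas, log_values, temps_celsius=None):
    """Hypothetical helper: candidate beta_0 with the smallest |corr(S0, S1)|.

    betas, log_values: sequences of 1-D arrays, one pair per composition,
    where log_values already holds log10 of the objective.
    """
    if temps_celsius is None:
        temps_celsius = np.arange(-50, 101, 5)  # -50 C ... 100 C, 5 C steps
    best_beta_0, best_corr = None, np.inf
    for beta_0 in beta_from_celsius(temps_celsius):
        s0, s1 = [], []
        for b, y in zip(betas, log_values):
            # Quadratic fit around the candidate beta_0 (highest power first).
            c2, c1, c0 = np.polyfit(np.asarray(b, dtype=float) - beta_0, y, deg=2)
            s0.append(c0)
            s1.append(-c1)  # sign convention of the surrogate model
        # Assumption: Pearson correlation of S0 and S1 across compositions.
        corr = abs(np.corrcoef(s0, s1)[0, 1])
        if corr < best_corr:
            best_beta_0, best_corr = float(beta_0), corr
    return best_beta_0, best_corr
```

Shifting \(\beta_0\) changes \(S_0\) and \(S_1\) but not the fitted curve, which is why the choice can be made purely on how independent the two coefficients are across the dataset.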

If no \(\beta_0\) value is provided, a plot of the \(S_0 : S_1\) correlation from -50 °C to 100 °C is generated by default.

In addition, an overview of the Arrhenius fit quality is provided by displaying the compositions with the top 10% log(MAE) as well as a histogram overview of the log(MAE) for the whole dataset.

Disambiguation: This function should be used to process a DataFrame with individual rows for each repeated measurement (e.g. 5 rows for 5 measurements of the conductivity of an electrolyte with composition “X” and inverse temperature \(\beta\)).
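To make the expected row-per-measurement layout concrete, here is a small sketch. All values are made up for illustration, and a single column named "X" stands in for an arbitrary feature set.

```python
import pandas as pd

# Hypothetical long-format data: one row per individual measurement.
data = pd.DataFrame({
    "X": [0.1] * 4 + [0.2] * 4,
    "inverse temperature": [3.2, 3.2, 3.6, 3.6] * 2,  # 1000/T[K] scale
    "conductivity": [9.1, 9.3, 4.0, 4.2, 7.5, 7.4, 3.1, 3.3],  # made-up values
})

# Number of distinct inverse temperatures measured per composition.
n_temps = data.groupby("X")["inverse temperature"].nunique()

# Number of repeated measurements per (composition, inverse temperature).
n_repeats = data.groupby(["X", "inverse temperature"]).size()
```

A check like `n_temps` is a quick way to see how many temperatures each composition covers before calling the function.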

Parameters:
  • data (DataFrame) – Dataframe of experimental measurements.

  • feature_list (Union[str, List[str]]) – Feature or list of features.

  • objective (str) –

    data column label of objective function to use for the Arrhenius fits (only supports single objective function).

    Default value conductivity.

  • inverse_temp (str) –

    data column label of the inverse temperature (in 1000/T[K] scale) for the measured values. The data DataFrame should be properly prepared to have this information before calling this function.

    Default value inverse temperature.

  • min_samples (int) –

    Minimum number of measurements for the feature set excluding inverse temperature (i.e. how many different temperatures were measured for a given formulation). Compositions below this threshold are discarded.

    Default value 5.

  • beta_0 (Optional[float]) –

    This value corresponds to the temperature where S_0 and S_1 become uncorrelated. If no value is provided, the function automatically searches for the temperature with the lowest correlation between -50 °C and 100 °C (in intervals of 5 °C).

    Default value None.

  • n_fits (int) –

    Number of times to resample the measurements (one randomly selected measurement per inverse temperature) and refit to estimate coefficient uncertainty.

    Default value 50.

  • random_state (Optional[int]) –

    Sets a numpy random seed for reproducibility.

    Default value None.

  • save_loc (Union[bool, str]) –

    Name under which to save the plot (if desired). If False, the plot is only shown, not saved.

    Saving filename convention is: save_loc + ‘onset_temp_plot.pdf’

Returns:

4-tuple of:

  • float (beta_0),

  • List[str] (feature list),

  • List[str] (S0 S1 S2 objective functions list),

  • DataFrame with added S0, S0_std, S1, S1_std, S2, S2_std, and ‘R2’, ‘Mean Absolute (Relative) Error’, ‘Mean Squared (Relative) Error’, ‘log(MAE arrh fit)’, and ‘Activation Energy’ columns

Return type:

Tuple[float, List[str], List[str], pd.DataFrame]