LECA.prep.arrhenius

LECA.prep.arrhenius(data: DataFrame, feature_list: str | List[str], objective: str = 'conductivity', inverse_temp: str = 'inverse temperature', min_samples: int = 5, beta_0: float | None = None, n_fits: int = 50, random_state: int | None = None, save_loc: str | bool = False) → Tuple[float, List[str], List[str], DataFrame]

Transform objective function into Arrhenius fitted surrogate model (\(log(\sigma) \rightarrow S_0, S_1, S_2\)). This function expects the objective function along with standard deviations in the form of a DataFrame with columns: e.g. “inverse temperature”, “X”, “conductivity”, “conductivity_std”, where “X” can be some arbitrary feature set. This transformation groups all data with identical “X” features into a single datapoint with the objective functions \(S_0, S_1, S_2\), and makes the inverse temperature implicit.

For each unique composition (read: identical “X” values):
  • The data should contain a single row with the average value and deviations of repeated measurements of the objective function at each inverse temperature.

  • This function randomly perturbs the average measured value for that composition for each inverse temperature in the form \(log(perturbed) = log(objective) \cdot (1 + random\_normal(std))\) and applies the Arrhenius fit.

  • This is repeated n_fits times (50 by default). The function returns the resulting average coefficient values, their standard deviations, and metrics for the quality of the fit on all measurements with this composition.
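
The perturb-and-refit procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not LECA's actual implementation: the helper names `fit_arrhenius` and `perturbed_fits` are hypothetical, the fit uses `numpy.polyfit`, and `rel_std` is assumed to be the scale of a zero-mean normal draw applied to the log-scale objective.

```python
import numpy as np


def fit_arrhenius(beta, log_sigma, beta_0):
    """Fit log10(sigma) = S0 - S1*(beta - beta_0) - S2*(beta - beta_0)**2."""
    x = np.asarray(beta, dtype=float) - beta_0
    # np.polyfit returns coefficients from highest to lowest degree.
    c2, c1, c0 = np.polyfit(x, log_sigma, deg=2)
    return c0, -c1, -c2  # S0, S1, S2 (signs follow the model above)


def perturbed_fits(beta, log_sigma, rel_std, beta_0, n_fits=50, random_state=None):
    """Refit n_fits times on randomly perturbed data (hypothetical helper).

    Assumption: each draw is log(perturbed) = log(objective) * (1 + N(0, rel_std)).
    Returns the mean and standard deviation of (S0, S1, S2) over all fits.
    """
    rng = np.random.default_rng(random_state)
    log_sigma = np.asarray(log_sigma, dtype=float)
    coeffs = np.empty((n_fits, 3))
    for i in range(n_fits):
        noise = rng.normal(0.0, rel_std)  # one draw per temperature point
        coeffs[i] = fit_arrhenius(beta, log_sigma * (1.0 + noise), beta_0)
    return coeffs.mean(axis=0), coeffs.std(axis=0)
```

The spread of the refitted coefficients serves as the uncertainty estimate for each composition, which is the role the S0_std, S1_std, and S2_std columns play in the returned DataFrame.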

The Arrhenius surrogate model has the form:

\[log_{10}(objective) = S_0 - S_1 (\beta - \beta_0) - S_2 (\beta - \beta_0)^2\]

\(\beta\) values should be on the 1000/T[K] scale, and \(\beta_0\) can be freely chosen. If it is not user defined, the function searches between 1000/(273.15-50) and 1000/(273.15+100) for the \(\beta_0\) value that minimizes the correlation between the \(S_0\) and \(S_1\) coefficients. The returned DataFrame includes the surrogate \(S_{0,1,2}\) objective functions as well as their deviations (S0_std, etc.). The results DataFrame also includes metrics for the quality of the Arrhenius fits: ‘R2’, ‘Mean Absolute (Relative) Error’, ‘Mean Squared (Relative) Error’, ‘log(MAE arrh fit)’.
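
To make the \(\beta_0\) search concrete, the sketch below scans candidate temperatures from -50 °C to 100 °C in 5 °C steps and picks the one where \(S_0\) and \(S_1\) are least correlated. It rests on assumptions not confirmed by this page: the correlation is taken to be the Pearson correlation across compositions, and the helper names `celsius_to_beta`, `s0_s1_correlation`, and `search_beta_0` are hypothetical.

```python
import numpy as np


def celsius_to_beta(t_celsius):
    """Convert temperature in degrees Celsius to the 1000/T[K] scale."""
    return 1000.0 / (np.asarray(t_celsius, dtype=float) + 273.15)


def s0_s1_correlation(curves, beta_0):
    """Pearson correlation of S0 and S1 across compositions (assumed metric).

    curves: list of (beta, log_sigma) array pairs, one pair per composition.
    """
    s0, s1 = [], []
    for beta, log_sigma in curves:
        x = np.asarray(beta, dtype=float) - beta_0
        _c2, c1, c0 = np.polyfit(x, log_sigma, deg=2)
        s0.append(c0)
        s1.append(-c1)
    return np.corrcoef(s0, s1)[0, 1]


def search_beta_0(curves, t_min=-50.0, t_max=100.0, step=5.0):
    """Return the candidate beta_0 with the smallest |corr(S0, S1)|."""
    candidates = celsius_to_beta(np.arange(t_min, t_max + step, step))
    corr = np.array([abs(s0_s1_correlation(curves, b0)) for b0 in candidates])
    return candidates[np.argmin(corr)], corr
```

Shifting \(\beta_0\) changes \(S_0\) (the fitted value at the reference temperature) while leaving the curves themselves unchanged, which is why a reference point with weakly correlated coefficients can exist.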

If no \(\beta_0\) value is provided, by default a plot of the \(S_0 : S_1\) correlation from -50 °C to 100 °C is generated.

In addition, an overview of the Arrhenius fit quality is provided by displaying the compositions with the top 10% log(MAE) as well as a histogram overview of the log(MAE) for the whole dataset.

Disambiguation: This function should be used to process a DataFrame with combined rows (-> mean value, deviations) for repeated measurement (i.e. 1 row for 5 measurements of the conductivity of an electrolyte with composition “X” and inverse temperature \(\beta\) -> “conductivity”, “conductivity_std”)
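
As an illustration of the expected input layout, repeated measurements can be combined into one row per composition and inverse temperature with a pandas group-by. This is only a sketch: the column name “X” and the values are made up, and using the sample standard deviation (pandas' default for `"std"`) is an assumption about how the deviations should be computed.

```python
import pandas as pd

# Hypothetical raw data: repeated conductivity measurements per composition
# ("X") and inverse temperature (1000/T[K]).
raw = pd.DataFrame({
    "X": [0.1, 0.1, 0.1, 0.1, 0.2, 0.2],
    "inverse temperature": [3.2, 3.2, 3.4, 3.4, 3.2, 3.2],
    "conductivity": [1.0, 1.2, 0.8, 0.6, 2.0, 2.4],
})

# One row per (composition, inverse temperature) with mean and deviation.
combined = (
    raw.groupby(["X", "inverse temperature"], as_index=False)
       .agg(conductivity=("conductivity", "mean"),
            conductivity_std=("conductivity", "std"))
)
```

The resulting `combined` frame has the columns “X”, “inverse temperature”, “conductivity”, and “conductivity_std”, matching the layout described above.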

Parameters:
  • data (DataFrame) – Dataframe of experimental measurements.

  • feature_list (Union[str, List[str]]) – Feature or list of features

  • objective (str) –

    data column label of objective function to use for the Arrhenius fits (only supports single objective function).

    Default value conductivity.

  • inverse_temp (str) –

    data column label of the inverse temperature (in 1000/T[K] scale) for the measured values. The data DataFrame should be properly prepared to have this information before calling this function.

    Default value inverse temperature.

  • min_samples (int) –

    Minimum number of measurements for the feature set excluding inverse temperature (i.e. how many different temperatures were measured for a given formulation). Compositions below this threshold are discarded.

    Default value 5.

  • beta_0 (Optional[float]) –

    This value corresponds to the temperature where S_0 and S_1 become uncorrelated. If no value is provided, the function automatically searches from -50 °C to 100 °C (in intervals of 5 °C) for the temperature at which S_0 and S_1 are least correlated.

    Default value None.

  • n_fits (int) –

    Number of times to vary the objective function (+- random normal perturbation based on the standard deviations of the measurement) and refit to estimate coefficient uncertainty.

    Default value 50.

  • random_state (Optional[int]) –

    Sets a numpy random seed for reproducibility.

    Default value None.

  • save_loc (Union[bool, str]) –

    Name under which to save the plot (if desired). If False, the plot is only shown, not saved.

    Saving filename convention is: save_loc + ‘onset_temp_plot.pdf’

Returns:

4-tuple of:

  • float (beta_0),

  • List[str] (feature list),

  • List[str] (S0 S1 S2 objective functions list),

  • DataFrame with added S0, S0_std, S1, S1_std, S2, S2_std, and ‘R2’, ‘Mean Absolute (Relative) Error’, ‘Mean Squared (Relative) Error’, ‘log(MAE arrh fit)’, ‘Activation Energy’ columns

Return type:

Tuple[float, List[str], List[str], pd.DataFrame]