LECA - The Liquid Electrolyte Composition Analysis Package
The Liquid Electrolyte Composition Analysis (LECA) package provides a simplified Jupyter Notebook [1] workflow for applying scikit-learn [2] machine learning regression models to predict liquid electrolyte behavior from composition.
Requirements
Python 3.7+, Jupyter Notebook 6.4.11+
- With the following Python libraries:
Scikit-learn 1.3.1+
NumPy 1.22.3+
Matplotlib 3.5.1+
Pandas 1.4.2+
SciPy 1.8.1+
Uncertainties 3.1.7+
MAPIE 0.5.6
HDBSCAN 0.8.28+
Seaborn 0.11.2+
GPyOpt 1.2.6+
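To check an existing environment against the minimum versions listed above, something like the following sketch may help. It uses only the standard library (`importlib.metadata`, which needs Python 3.8 or newer). The distribution names in `MINIMUM_VERSIONS` are assumptions about how each library is published, and `parse_version` is a deliberately simple helper written for this illustration, not part of LECA.

```python
import re
from importlib import metadata

# Minimum versions copied from the list above.
# The distribution names (dict keys) are assumptions for illustration.
MINIMUM_VERSIONS = {
    "scikit-learn": "1.3.1",
    "numpy": "1.22.3",
    "matplotlib": "3.5.1",
    "pandas": "1.4.2",
    "scipy": "1.8.1",
    "uncertainties": "3.1.7",
    "MAPIE": "0.5.6",  # listed without "+"; treated here as a minimum (assumption)
    "hdbscan": "0.8.28",
    "seaborn": "0.11.2",
    "GPyOpt": "1.2.6",
}


def parse_version(text):
    """Turn '1.22.3' (or '1.22.3rc1') into a comparable tuple of ints."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def check_requirements(minimums):
    """Return a dict mapping each distribution name to a short status string."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if parse_version(installed) >= parse_version(minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for name, status in check_requirements(MINIMUM_VERSIONS).items():
        print(f"{name:15s} {status}")
```

Comparing versions as integer tuples avoids the string-comparison trap where `"0.5.6"` would sort after `"0.11.2"`.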
Installation
The LECA package can be installed directly from source with:
pip install git+https://github.com/Harrison-Teeg/LECA.git
Envisaged LECA Workflow
Data Import and Feature Engineering
Import Data from JSON or CSV files
Visualize dataset with feature overview and interactive data visualizer
Identify and filter outlier values using HDBSCAN [3]
Manually filter nonsense values with user-defined explicit boundaries
Combine repeated measurements and record statistical behavior (measurement noise)
Generate surrogate models for Arrhenius behavior or other user-defined values
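Two of the steps above, combining repeated measurements and fitting an Arrhenius surrogate, can be illustrated with plain pandas and NumPy. This is a minimal sketch and does not use LECA's own API. The column names (`x_salt`, `temperature_K`, `conductivity`), the synthetic data, and the simple Arrhenius form ln(sigma) = ln(A) - E_a / (k_B T) are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

# Synthetic stand-in data: three repeated measurements at each temperature.
rng = np.random.default_rng(0)
temperatures = np.repeat([263.15, 283.15, 303.15, 323.15], 3)
true_ln_a, true_ea = 5.0, 0.15  # hypothetical ln(prefactor) and E_a in eV
ln_sigma = (
    true_ln_a
    - true_ea / (K_B * temperatures)
    + rng.normal(0.0, 0.01, temperatures.size)  # measurement noise
)
raw = pd.DataFrame(
    {
        "x_salt": 0.1,  # hypothetical composition feature
        "temperature_K": temperatures,
        "conductivity": np.exp(ln_sigma),
    }
)

# Combine repeats: one row per (composition, temperature) with mean and spread.
combined = raw.groupby(["x_salt", "temperature_K"], as_index=False).agg(
    conductivity_mean=("conductivity", "mean"),
    conductivity_std=("conductivity", "std"),
    n_repeats=("conductivity", "count"),
)

# Arrhenius fit: ln(sigma) is linear in 1/T with slope -E_a / k_B.
inverse_t = 1.0 / combined["temperature_K"].to_numpy()
log_sigma = np.log(combined["conductivity_mean"].to_numpy())
slope, intercept = np.polyfit(inverse_t, log_sigma, deg=1)
activation_energy = -slope * K_B  # in eV
ln_prefactor = intercept

print(combined)
print(f"E_a = {activation_energy:.3f} eV, ln(A) = {ln_prefactor:.2f}")
```

The fitted parameters (here `activation_energy` and `ln_prefactor`) can then serve as temperature-independent regression targets for each composition, while `conductivity_std` records the measurement noise.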
Initialize Regression Models and Compare Results
Data splitting and scaling handled automatically
- Declare regression models to implement (supports N-dimensional feature/objective space)
Linear Regression (LR)
Gaussian Process Regression (GPR) (supports Isotropic/Anisotropic RBF, Matern, RQ, Custom kernel)
Neural Network (NN)
Random Forest (RF)
Hyperparameter Optimization for NN and RF with GPyOpt [4]
Customized Polynomial selection for LR [5]
Cross-validated scoring of models, with visualization to provide a simple overview of comparative model performance
Ensemble-based uncertainty estimation for LR / NN / RF models using MAPIE [6]
Validate performance of models on unseen validation data
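The model comparison described above can be sketched with scikit-learn alone. The example below is not LECA's interface. It assumes a synthetic two-feature dataset and arbitrarily chosen model settings, and shows cross-validated scoring on the training split followed by a check on held-out validation data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for a composition dataset (two features, one objective).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(120, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.05, 120)

# Hold back validation data that the models never see during cross-validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scaling lives inside each pipeline so it is refit on every CV fold.
models = {
    "LR (degree-2 polynomial)": make_pipeline(
        StandardScaler(), PolynomialFeatures(degree=2), LinearRegression()
    ),
    "GPR (anisotropic RBF)": make_pipeline(
        StandardScaler(),
        GaussianProcessRegressor(
            kernel=RBF(length_scale=[1.0, 1.0]) + WhiteKernel(noise_level=0.1),
            normalize_y=True,
            random_state=0,
        ),
    ),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = {}
validation_scores = {}
for name, model in models.items():
    # Mean R^2 across the five folds of the training split.
    cv_scores[name] = cross_val_score(
        model, X_train, y_train, cv=cv, scoring="r2"
    ).mean()
    # Refit on the full training split, then score on unseen validation data.
    validation_scores[name] = model.fit(X_train, y_train).score(X_val, y_val)
    print(f"{name:26s} CV R2 = {cv_scores[name]:.3f}  "
          f"validation R2 = {validation_scores[name]:.3f}")
```

Keeping the scaler inside the pipeline prevents information from the held-out fold leaking into the preprocessing step.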
Analyze Objective Function for Compositions
Interactive widgets to visualize objective function and model uncertainty for various compositions
Return the optimal composition that maximizes or minimizes the objective function
Ranked Batch Mode Active Learning module based on the RBMAL approach of Cardoso et al. [7]
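The idea behind ranked batch-mode selection can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not LECA's implementation. It assumes the commonly described score `alpha * (1 - similarity) + (1 - alpha) * uncertainty`, with `alpha` set to the fraction of points still unlabeled and similarity taken as `1 / (1 + distance to the nearest known point)`. The function name `ranked_batch` and the toy data are hypothetical.

```python
import numpy as np


def ranked_batch(X_labeled, X_pool, uncertainty, batch_size):
    """Return pool indices ranked by a diversity/uncertainty trade-off."""
    X_known = np.asarray(X_labeled, dtype=float)
    X_pool = np.asarray(X_pool, dtype=float)

    # Rescale uncertainties to [0, 1] so both score terms are comparable.
    u = np.asarray(uncertainty, dtype=float)
    span = u.max() - u.min()
    u = (u - u.min()) / span if span > 0 else np.zeros_like(u)

    remaining = list(range(len(X_pool)))
    selected = []
    for _ in range(min(batch_size, len(remaining))):
        candidates = X_pool[remaining]
        # Distance from each candidate to its nearest already-known point.
        distances = np.linalg.norm(
            candidates[:, None, :] - X_known[None, :, :], axis=2
        ).min(axis=1)
        similarity = 1.0 / (1.0 + distances)  # assumed similarity measure
        alpha = len(remaining) / (len(remaining) + len(X_known))
        scores = alpha * (1.0 - similarity) + (1.0 - alpha) * u[remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        # Treat the pick as known so later picks avoid its neighborhood.
        X_known = np.vstack([X_known, X_pool[best]])
        remaining.remove(best)
    return selected


# Toy example: pool points 1 and 2 are near-duplicates with high uncertainty.
X_labeled = [[0.0, 0.0]]
X_pool = [[0.1, 0.0], [1.0, 1.0], [1.05, 1.0], [0.0, 1.0]]
model_uncertainty = [0.1, 0.9, 0.9, 0.5]  # e.g. predictive standard deviations
batch = ranked_batch(X_labeled, X_pool, model_uncertainty, batch_size=2)
print(batch)  # -> [2, 3]
```

In the toy example the second pick skips point 1, even though it is highly uncertain, because it sits next to the already selected point 2. This is the behavior that distinguishes ranked batch selection from simply taking the top-k most uncertain candidates.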
Areas of Further Development
Multi-Objective Optimization: Identifying Pareto fronts across multiple objectives for an electrolyte composition (e.g. electrochemical stability, conductivity, etc.)
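For readers unfamiliar with the term, a Pareto front is the set of candidates that no other candidate beats on every objective at once. The sketch below is a generic NumPy illustration of that definition, not a planned LECA feature. It assumes all objectives are to be maximized, and the function name `pareto_mask` and the example values are hypothetical.

```python
import numpy as np


def pareto_mask(objectives):
    """Boolean mask of non-dominated rows, assuming every column is maximized."""
    values = np.asarray(objectives, dtype=float)
    mask = np.ones(len(values), dtype=bool)
    for i, point in enumerate(values):
        # Another row dominates `point` if it is at least as good in every
        # objective and strictly better in at least one.
        dominates = np.all(values >= point, axis=1) & np.any(values > point, axis=1)
        mask[i] = not dominates.any()
    return mask


# Hypothetical predictions: columns could be e.g. conductivity and stability.
predicted = np.array(
    [
        [1.0, 5.0],
        [2.0, 4.0],
        [1.5, 3.0],  # dominated by [2.0, 4.0]
        [3.0, 1.0],
        [0.5, 0.5],  # dominated by every other row
    ]
)
front = pareto_mask(predicted)
print(predicted[front])
```

An objective that should be minimized can be handled by negating its column before calling the function.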
References
Acknowledgments
This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreements No 957189 (BIG-MAP) and No 957213 (BATTERY2030+).