I just published a paper entitled “Statistical Learning for Accurate and Interpretable Battery Lifetime Prediction” with Kristen Severson and Jeremy Witmer. This work grew out of my class project for CS229, where we explored ideas that build on Severson et al., and it answers some of the questions I still had about our original paper after it was published.

Some of the ideas and results presented in this work:

  1. “Capacity matrices” (Figure 1) are a compact, easy-to-visualize, machine-learning-ready format for storing battery cycling data
  2. Downsampling (Figure 2) can make battery cycling datasets much more compact (could be useful for very large datasets)
  3. Univariate models (Figure 3) can perform very well on this dataset; furthermore, the interquartile range statistic slightly outperforms variance (gasp!)
  4. Simple multivariate models (Figures 5–6) perform pretty well and reveal some interesting trends in the voltage curves (I’m confident they mean something, but not sure quite what yet)
  5. Deep learning models perform similarly to the simpler statistical learning models (Figure 7), despite the fact that the vast majority of follow-up papers to Severson et al. use deep learning
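
To make items 1 to 3 a bit more concrete, here is a minimal sketch on synthetic data. It is not the code from the paper. The array layout (voltage points along rows, cycle numbers along columns), the toy capacity curve, the choice of cycles 100 and 10, and the helper names `downsample`, `delta_q`, `variance_feature`, and `iqr_feature` are all assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: a "capacity matrix" is a 2-D array of discharge capacity with
# shape (n_voltage_points, n_cycles). All values below are synthetic.
n_voltage, n_cycles = 1000, 100
voltage = np.linspace(3.5, 2.0, n_voltage)      # discharge: high to low voltage
cycles = np.arange(1, n_cycles + 1)

# Toy Q(V) curve (capacity rises as voltage drops) that fades slowly with cycling.
base = 1.1 / (1.0 + np.exp((voltage - 3.0) / 0.05))   # shape (n_voltage,)
fade = 1.0 - 0.0005 * cycles                           # shape (n_cycles,)
noise = rng.normal(0.0, 1e-4, size=(n_voltage, n_cycles))
capacity_matrix = base[:, None] * fade[None, :] + noise


def downsample(matrix, voltage_step, cycle_step):
    """Keep every `voltage_step`-th row and every `cycle_step`-th column."""
    return matrix[::voltage_step, ::cycle_step]


def delta_q(matrix, late_idx, early_idx):
    """Difference between two columns (two cycles) of the capacity matrix."""
    return matrix[:, late_idx] - matrix[:, early_idx]


def variance_feature(curve):
    """Variance of a 1-D curve."""
    return np.var(curve)


def iqr_feature(curve):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q75, q25 = np.percentile(curve, [75, 25])
    return q75 - q25


# Simple decimation shrinks the matrix by 100x here (10x along each axis).
small = downsample(capacity_matrix, voltage_step=10, cycle_step=10)
print(capacity_matrix.shape, "->", small.shape)

# Two candidate univariate features from one cell's difference curve
# (cycle 100 minus cycle 10, i.e. column indices 99 and 9).
curve = delta_q(capacity_matrix, late_idx=99, early_idx=9)
print("variance:", variance_feature(curve))
print("IQR:     ", iqr_feature(curve))
```

In a univariate model, a single number like one of these (computed per cell) would be the only input to a regression on cycle life. The IQR depends only on two percentiles, so it is less sensitive to a few extreme points than the variance is.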

At the very least, I hope this paper inspires others to consider simple models before jumping to deep learning. Starting simple, after all, is good scientific practice. However, if deep learning models are used, training a statistical learning model in parallel is worthwhile to benchmark the performance of the complex model.

Unfortunately, the header formatting got messed up in the proof stage (many headers got demoted to subheaders); hopefully the intended logic is easy to follow.

Lastly, many students have contacted me about the slow response time when requesting an academic license for the code in Severson et al. While I’m unfortunately unable to do much to address that issue, this paper does not face this licensing limitation. The data and code used to generate the figures are available here; this notebook contains nearly all of the code used to generate the figures. The paper is also open-access.