SBMR Model for something something
A simple and interpretable sparse binary matrix representation (SBMR) is presented for predicting molecular properties, specifically HOMO-LUMO energy gaps, of functionalized organic molecules using a convolutional neural network (CNN). Each molecule is encoded as a sparse binary matrix that captures the identity and position of substituents on a fixed molecular backbone. The model was evaluated on four molecular families (n-butane, i-butane, cyclobutadiene, and quinone), achieving a combined RMSE of 4.4 kcal mol−1 relative to DFT-computed reference values, with over 80% of predictions falling within ±5% of the reference. To contextualize this performance, the model was compared against three established featurization methods trained on DFT-optimized geometries: the Coulomb Matrix, the Smooth Overlap of Atomic Positions (SOAP), and a Message-Passing Neural Network (MPNN). The SBMR-CNN model is competitive in accuracy: it significantly outperforms the Coulomb Matrix benchmark and approaches the performance of the more computationally intensive SOAP and MPNN approaches, while using a dramatically smaller feature space and imposing less stringent input data requirements. The resulting model combines high accuracy with direct interpretability, qualities that are particularly valuable to experimental and synthetic chemists. Compared to more abstract molecular representations, the sparse binary matrix approach offers a transparent and customizable framework for property prediction, with potential applications in molecular screening and rational design.
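As an illustration of the encoding and the two reported metrics, the following is a minimal sketch rather than the implementation used in this work. The substituent vocabulary `SUBSTITUENTS`, the number of backbone sites `N_POSITIONS`, the row/column layout, and all function names are assumptions made for the example. The relative error is taken with respect to the reference value, which is one plausible reading of the ±5% criterion.

```python
import math

# Hypothetical substituent vocabulary and site count (assumptions for illustration only).
SUBSTITUENTS = ["H", "F", "Cl", "OH", "NH2", "CH3"]
N_POSITIONS = 4  # e.g. four substitution sites on a fixed backbone


def encode_molecule(site_to_group):
    """Build a binary matrix: rows are substituent types, columns are backbone sites."""
    matrix = [[0] * N_POSITIONS for _ in SUBSTITUENTS]
    for site, group in site_to_group.items():
        if not 0 <= site < N_POSITIONS:
            raise ValueError(f"site {site} is outside the backbone")
        # list.index raises ValueError for a substituent missing from the vocabulary
        matrix[SUBSTITUENTS.index(group)][site] = 1
    return matrix


def rmse(predicted, reference):
    """Root-mean-square error between predictions and reference values."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def fraction_within(predicted, reference, tolerance=0.05):
    """Fraction of predictions whose relative error is at most `tolerance`."""
    hits = [abs(p - r) / abs(r) <= tolerance for p, r in zip(predicted, reference)]
    return sum(hits) / len(hits)


# Example: F at site 0, Cl at site 2, H at the remaining sites.
example = encode_molecule({0: "F", 1: "H", 2: "Cl", 3: "H"})
```

Each column of `example` contains exactly one nonzero entry, so the matrix can be read directly as "which group sits at which site". A nested list of this form could be converted to an array and passed to a CNN as a single-channel image.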
