The Chinese Journal of Process Engineering ›› 2024, Vol. 24 ›› Issue (7): 833-842. DOI: 10.12034/j.issn.1009-606X.223308

• Research Paper •

Catalyst and reaction rate constant prediction methods of coupling reaction based on convolutional neural network

Ting YANG, Yachao DONG*, Jian DU

  1. School of Chemical Engineering, Dalian University of Technology, Dalian, Liaoning 116086, China
  • Received: 2023-11-13 Revised: 2024-01-04 Online: 2024-07-28 Published: 2024-07-24
  • Contact: Yachao DONG yachaodong@dlut.edu.cn
  • Supported by:
    General Program of the National Natural Science Foundation of China; Fundamental Research Funds for the Central Universities

Abstract: Cross-coupling reactions are among the most effective methods for forming carbon-carbon bonds in modern organic synthesis, and effective screening and optimization of catalysts plays an important role in improving the efficiency of drug and fine chemical development. In this work, convolutional neural network (CNN) models and associated methods based on an organic reaction database are developed for Suzuki-Miyaura and Buchwald-Hartwig cross-coupling reactions to predict suitable catalysts (including ligands) and rate constants. Comparison models are also built with the random forest algorithm. The CNN-based catalyst prediction model recommends catalysts correctly, with a top-3 accuracy of 85% on the Suzuki-Miyaura dataset and 92% on the Buchwald-Hartwig dataset. Once a catalyst has been recommended, the reactions are clustered by the structural characteristics of the catalyst using ECFP4 molecular fingerprints and the K-Means algorithm, and the rate constant is predicted on that basis. The catalyst text is converted into a random numeric label and concatenated with the ECFP4 fingerprints of the reactants and products, giving a reaction fingerprint that describes the whole reaction and serves as the model input. Rate constant prediction models are built separately for the dataset divided into three clusters and for the original dataset, and the two are compared. On both cross-coupling datasets, the models that use clustering perform significantly better, indicating that clustering reactions by catalyst structure improves rate constant prediction. The CNN-based methods for catalyst and rate constant prediction are expected to extend to other organic synthesis reactions, and the resulting models could further be used for reaction condition control and optimization.

Keywords: cross-coupling reaction, catalyst and ligand prediction, reaction rate constant prediction, convolutional neural network
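
The top-3 accuracy quoted in the abstract counts a prediction as correct when the catalyst recorded for a reaction is among the three highest-ranked candidates. The Python sketch below illustrates that metric only; it is not the authors' code, and the function name `top_k_accuracy`, the catalyst labels, and all scores are hypothetical values chosen for the example.

```python
from typing import Dict, List, Sequence


def top_k_accuracy(
    scores: Sequence[Dict[str, float]],
    true_labels: Sequence[str],
    k: int = 3,
) -> float:
    """Fraction of reactions whose recorded catalyst is among the k top-scored candidates."""
    if len(scores) != len(true_labels):
        raise ValueError("scores and true_labels must have the same length")
    if not scores:
        raise ValueError("at least one reaction is required")

    hits = 0
    for candidate_scores, truth in zip(scores, true_labels):
        # Rank candidate catalysts from highest to lowest score, then keep the first k.
        ranked = sorted(candidate_scores, key=candidate_scores.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Hypothetical model outputs: one score per candidate catalyst, for four reactions.
example_scores: List[Dict[str, float]] = [
    {"cat_A": 0.60, "cat_B": 0.25, "cat_C": 0.10, "cat_D": 0.05},
    {"cat_A": 0.10, "cat_B": 0.20, "cat_C": 0.30, "cat_D": 0.40},
    {"cat_A": 0.50, "cat_B": 0.30, "cat_C": 0.15, "cat_D": 0.05},
    {"cat_A": 0.05, "cat_B": 0.15, "cat_C": 0.45, "cat_D": 0.35},
]
# Hypothetical catalysts recorded for the same four reactions.
example_truth: List[str] = ["cat_A", "cat_B", "cat_D", "cat_D"]

top3 = top_k_accuracy(example_scores, example_truth, k=3)
top1 = top_k_accuracy(example_scores, example_truth, k=1)
print(f"top-3 accuracy: {top3:.2f}, top-1 accuracy: {top1:.2f}")
```

In this toy data the recorded catalyst is ranked first for only one reaction but falls within the top three for three of the four, which shows why a top-3 figure is a more lenient measure than top-1 when several catalysts can plausibly suit the same reaction.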