Exploring isofunctional molecules: Design of a benchmark and evaluation of prediction performance
Abstract
Identifying novel chemotypes whose biological activity is similar to that of a known active molecule, a task known as ‘scaffold hopping’, is an important challenge in drug discovery. Small‐, medium‐, and large‐step scaffold hopping may lead to increasing degrees of structural novelty relative to the parent compound. In the present paper, we focus on large‐step scaffold hopping. We assembled a high‐quality, well‐characterized dataset of scaffold hopping examples that consists of pairs of active molecules and covers a variety of protein targets. We used this dataset to build a benchmark that mirrors real‐life applications: one active molecule is known, and the second active must be retrieved from a set of decoys chosen to avoid statistical bias. This benchmark allowed us to evaluate how well computational methods solve large‐step scaffold hopping problems. In particular, we assessed the difficulty of these problems, especially for classical 2D and 3D ligand‐based methods. We also showed that a machine‐learning chemogenomic algorithm outperforms these classical methods, and we provide some useful hints for future improvements.
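The benchmark setting described above (one known active, and a second active to be retrieved among decoys) can be illustrated with a minimal sketch. It assumes a classical 2D similarity search in which each molecule is represented by a hypothetical toy fingerprint, given as a set of "on" bit indices, and compared with the Tanimoto coefficient. It reports the rank of the true partner among the candidates. This is not the authors' evaluation protocol. The fingerprints, the function names `tanimoto` and `rank_of_partner`, and the pessimistic tie-handling rule are all illustrative assumptions.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def rank_of_partner(query: set[int], partner: set[int], decoys: list[set[int]]) -> int:
    """1-based rank of the true partner among partner + decoys, by descending similarity.

    Ties are counted pessimistically: decoys scoring equal to the partner are
    ranked ahead of it (an assumption made for this sketch).
    """
    partner_score = tanimoto(query, partner)
    better_or_equal = sum(1 for d in decoys if tanimoto(query, d) >= partner_score)
    return 1 + better_or_equal


# Hypothetical toy fingerprints: the true partner shares few bits with the query
# (a large-step hop), while one decoy is structurally close to the query.
query = {1, 4, 7, 9, 12}
partner = {1, 4, 20, 31}
decoys = [{1, 4, 7, 9}, {2, 3, 5}, {4, 40, 41}, {50, 51}]

print(rank_of_partner(query, partner, decoys))  # prints 2
```

In this toy case, a structurally similar decoy outranks the true partner, which illustrates why purely similarity-based 2D methods can struggle when the two actives share little of their chemical structure.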