We first summarize the results of the graph theory (archetype) analysis, then those of the atomic property (instance) analysis, and finally discuss the relationship between the two. In the graph theory analysis, there are 1179 different frameworks among the 5120 compounds analyzed. Of these frameworks, 783 (66%) are unique, i.e., they are found in only one drug molecule. Chart 1 shows graph frameworks for compounds in the CMC database as classified by connectivity triangles. Only frameworks that occur in at least 20 drugs are shown. This set of 32 frameworks accounts for 50% of the 5120 total drug molecules. The six-membered ring is clearly the most commonly used framework for these drugs. Acyclic molecules (those with no framework) account for 306 (6%) of the molecules we examined.
Our second method of analysis uses topological torsions (ref 11) for classification. Several atom properties (atom type, hybridization, and bond order) are considered. Somewhat more diversity is seen; there are 2506 different frameworks among the 5120 compounds in the database. Again, a large majority of these frameworks (1908, or 76%) are unique. Chart 2 shows atomic property-based drug frameworks (drug instances) that occur in the CMC at least 10 times. Naturally, because this classification scheme accounts for hybridization and bond order, one would expect a more diverse set of frameworks to be required to represent the drug database. Even so, this set of 41 frameworks accounts for 1235 (24%) of the 5120 molecules we examined. Clearly benzene is the most commonly used framework for these drugs.
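The summary statistics reported for both analyses (number of distinct frameworks, number of unique frameworks, and the share of molecules covered by the frequently occurring frameworks) can be illustrated with a minimal sketch. It assumes each molecule has already been reduced to a framework key by one of the two classification schemes; the function names `framework_stats` and `pct`, the toy framework labels, and the use of `None` for acyclic molecules are hypothetical and are not taken from the original analysis.

```python
from collections import Counter


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


def framework_stats(frameworks, min_count):
    """Summarize framework frequencies.

    frameworks: one framework key per molecule (None marks an acyclic
                molecule, i.e. one with no framework). Hypothetical input.
    min_count:  minimum number of occurrences for a framework to count
                as "common" (20 for Chart 1, 10 for Chart 2 in the text).
    """
    counts = Counter(f for f in frameworks if f is not None)
    common = {f: c for f, c in counts.items() if c >= min_count}
    return {
        "molecules": len(frameworks),
        "acyclic": sum(1 for f in frameworks if f is None),
        "distinct_frameworks": len(counts),
        # "Unique" frameworks are those found in exactly one molecule.
        "unique_frameworks": sum(1 for c in counts.values() if c == 1),
        "common_frameworks": len(common),
        "molecules_covered": sum(common.values()),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


# Toy data, invented for illustration only (not CMC data).
toy = ["six-ring"] * 4 + ["five-ring"] * 2 + ["fused-6-6", "fused-6-5", None]
stats = framework_stats(toy, min_count=2)

# The percentages quoted in the text follow directly from the raw counts.
reported = {
    "unique graph frameworks": pct(783, 1179),
    "acyclic molecules": pct(306, 5120),
    "unique atomic-property frameworks": pct(1908, 2506),
    "coverage by 41 common instances": pct(1235, 5120),
}
```

Applied to the counts given above, `pct` reproduces the reported values of 66%, 6%, 76%, and 24%, respectively.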