New Publicly Available Chemical Query Language, CSRML, to Support Chemotype Representations for Application to Data Mining and Modeling
Chemotypes, a new representation method for chemical substructures, molecules, reaction rules, and reactions, have been developed. This new approach overcomes the limitations of current representation methods for substructures (e.g., SMARTS) or reaction transformations (e.g., SMIRKS, reaction SMILES). Chemotypes are expressed in an XML-based language and can be encoded not only with connectivity and topology, but also with properties of atoms, bonds, electronic systems, or molecules. The language has been developed in parallel with a public set of chemotypes, i.e., the ToxPrint chemotypes, which represent the chemical space relevant to various toxicity endpoints. A software application, ChemoTyper, has also been developed and made publicly available to enable chemotype searching and fingerprinting against a target structure set. The public ChemoTyper houses the ToxPrint chemotype CSRML dictionary, as well as a reference implementation, so that the query specifications may be adopted by other chemical structure knowledge systems. The full specifications of the XML standard used in chemotypes (CSRML language) are publicly available to facilitate and encourage the exchange of structural knowledge.
YANG Chihae;
TARKHOV Aleksey;
MARUSCZYK Jörg;
BIENFAIT Bruno;
GASTEIGER J;
KLEINOEDER Thomas;
MAGDZIARZ Tomasz;
SACHER Oliver;
SCHWAB C;
SCHWOEBEL Johannes;
TERFLOTH Lothar;
ARVIDSON K;
RICHARD Ann;
WORTH Andrew;
RATHMAN James;
2015-04-09
AMER CHEMICAL SOC
JRC92489
1549-9596
http://pubs.acs.org/doi/abs/10.1021/ci500667v
https://publications.jrc.ec.europa.eu/repository/handle/JRC92489
10.1021/ci500667v