Wednesday 26 March 2014

Mischievous SMARTS Queries

Last year I extended the CDK SMARTS implementation to match component groupings and stereochemistry. Specifying stereochemistry introduces some interesting logical predicates that can be tricky to handle.
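To make the logic side concrete, here is a minimal Python sketch (not the CDK code) that evaluates a SMARTS-style atom expression such as `@,Si@@` against a toy atom reduced to an element symbol and a chirality tag. It assumes the target's tag has already been re-expressed in the query's neighbour order, and it follows the Daylight precedence in which `!` binds tightest, then `&` or implicit adjacency, then `,`. The names `tokenize`, `Parser`, `matches` and the `(element, tag)` atom model are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Toy atom model (an assumption): element symbol plus chirality tag, where the
# tag is '@', '@@' or None (unspecified) and is assumed to be expressed in the
# query's neighbour order already.
Atom = Tuple[str, Optional[str]]
Pred = Callable[[Atom], bool]

# Chirality primitives; a trailing '?' means "or unspecified".
CHIRAL = {
    "@": lambda a: a[1] == "@",
    "@@": lambda a: a[1] == "@@",
    "@?": lambda a: a[1] in ("@", None),
    "@@?": lambda a: a[1] in ("@@", None),
}


def tokenize(expr: str) -> List[str]:
    """Split e.g. 'C@,Si@@' into ['C', '@', ',', 'Si', '@@']."""
    tokens, i = [], 0
    while i < len(expr):
        ch = expr[i]
        if ch in ",&!":
            tokens.append(ch)
            i += 1
        elif ch == "@":
            tok = "@@" if expr.startswith("@@", i) else "@"
            i += len(tok)
            if expr.startswith("?", i):
                tok += "?"
                i += 1
            tokens.append(tok)
        elif ch.isupper():  # element symbol such as C or Si
            j = i + 2 if expr[i + 1:i + 2].islower() else i + 1
            tokens.append(expr[i:j])
            i = j
        else:
            raise ValueError(f"unsupported character {ch!r}")
    return tokens


def both(p: Pred, q: Pred) -> Pred:
    return lambda a: p(a) and q(a)


def either(p: Pred, q: Pred) -> Pred:
    return lambda a: p(a) or q(a)


class Parser:
    """Recursive descent: ',' binds looser than '&' (explicit or implicit),
    and '!' binds tightest. The low-precedence ';' is omitted for brevity."""

    def __init__(self, expr: str):
        self.toks, self.pos = tokenize(expr), 0

    def peek(self) -> Optional[str]:
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def take(self) -> str:
        self.pos += 1
        return self.toks[self.pos - 1]

    def or_(self) -> Pred:
        left = self.and_()
        while self.peek() == ",":
            self.take()
            left = either(left, self.and_())
        return left

    def and_(self) -> Pred:
        left = self.unary()
        while self.peek() not in (None, ","):
            if self.peek() == "&":
                self.take()
            left = both(left, self.unary())
        return left

    def unary(self) -> Pred:
        tok = self.take()
        if tok == "!":
            inner = self.unary()
            return lambda a: not inner(a)
        if tok in CHIRAL:
            return CHIRAL[tok]
        return lambda a: a[0] == tok  # element symbol


def matches(expr: str, atom: Atom) -> bool:
    return Parser(expr).or_()(atom)


C_AT, C_ATAT, C_NONE = ("C", "@"), ("C", "@@"), ("C", None)
for expr in ["@,@@", "@&@@", "@,Si@@", "@@?", "!@", "@?@@?", "!@!@@"]:
    print(expr, [matches(expr, t) for t in (C_AT, C_ATAT, C_NONE)])
```

Reading `@,Si@@` as `@` or (`Si` and `@@`) is what lets it match `CC[C@H](C)O`; a naive left-to-right reading, (`@` or `Si`) and `@@`, would reject it.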

Here are some examples that I came up with for testing the correctness of query handling. They start simple before getting a little mischievous. First, recursion and component grouping.

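| query | targets | nmatch | Comment |
|---|---|---|---|
| **Component grouping (fragment)** | | | |
| `(O).(O)` | `O=O` | 0 | Example from Daylight |
| | `OCCO` | 0 | |
| | `O.CCO` | 2 | |
| **Component grouping (connected)** | | | |
| `(O.O)` | `O=O` | 2 | Example from Daylight |
| | `OCCO` | 2 | |
| | `O.CCO` | 0 | |
| **Recursion, ad infinitum** | | | |
| `[$(CC[$(CCO),$(CCN)])]` | `CCCCO` | 1 | |
| | `CCCCN` | 1 | |
| | `CCCCC` | 0 | |
| **Recursive component grouping** | | | |
| `[O;D1;$(([a,A]).([A,a]))][CH]=O` | `OC=O.c1ccccc1` | 1 | Feature/Bug #1312 |
| | `OC=O` | 0 | |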
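The component-grouping rows can be reproduced by brute force. The sketch below is a minimal Python illustration under some assumptions: a toy molecule model (element list plus bond pairs), query atoms that are not bonded to one another (true of `(O).(O)` and `(O.O)`), and counting of every atom mapping rather than unique atom sets, which is why the symmetric targets score 2. The helpers `components`, `count_matches` and `in_multi_component` are hypothetical; the last one covers only the recursive `$(([a,A]).([A,a]))` part of the #1312 query, which holds when some other atom lies in a different component.

```python
from itertools import combinations, permutations
from typing import List, Optional, Sequence, Tuple

# Hypothetical query model: one (element, group) pair per query atom, where group
# is the index of the enclosing (...) component group, or None if ungrouped.
QueryAtom = Tuple[str, Optional[int]]


def components(n: int, bonds: Sequence[Tuple[int, int]]) -> List[int]:
    """Label every atom with a representative of its connected component (union-find)."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in bonds:
        parent[find(a)] = find(b)
    return [find(i) for i in range(n)]


def count_matches(query: Sequence[QueryAtom], elems: Sequence[str], bonds) -> int:
    """Count mappings of mutually unbonded query atoms onto the target.
    Two grouped atoms must share a component iff they share a group."""
    comp = components(len(elems), bonds)
    count = 0
    for mapping in permutations(range(len(elems)), len(query)):
        if any(elems[t] != q[0] for q, t in zip(query, mapping)):
            continue
        if all(
            (qi[1] == qj[1]) == (comp[ti] == comp[tj])
            for (qi, ti), (qj, tj) in combinations(zip(query, mapping), 2)
            if qi[1] is not None and qj[1] is not None
        ):
            count += 1
    return count


def in_multi_component(atom: int, n: int, bonds) -> bool:
    """Recursive part of [O;D1;$(([a,A]).([A,a]))]: another atom is in a different component."""
    comp = components(n, bonds)
    return any(c != comp[atom] for c in comp)


TARGETS = {
    "O=O": (["O", "O"], [(0, 1)]),
    "OCCO": (["O", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)]),
    "O.CCO": (["O", "C", "C", "O"], [(1, 2), (2, 3)]),
}
FRAGMENT = [("O", 0), ("O", 1)]   # (O).(O)
CONNECTED = [("O", 0), ("O", 0)]  # (O.O)

for smi, (elems, bonds) in TARGETS.items():
    print(smi, count_matches(FRAGMENT, elems, bonds), count_matches(CONNECTED, elems, bonds))

FORMIC = [(0, 1), (1, 2)]  # OC=O, the hydroxyl O is atom 0
FORMIC_BENZENE = FORMIC + [(i, i + 1) for i in range(3, 8)] + [(8, 3)]  # OC=O.c1ccccc1
print(in_multi_component(0, 3, FORMIC), in_multi_component(0, 9, FORMIC_BENZENE))
```

It prints `0 2`, `0 2` and `2 0` for `O=O`, `OCCO` and `O.CCO`, matching the table, followed by `False True` for the hydroxyl oxygen of `OC=O` and `OC=O.c1ccccc1`.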

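Nested recursion composes naturally if each `$(...)` is treated as a predicate on a single atom: the atom passes if the sub-pattern can be matched starting from it, and atoms inside the sub-pattern may carry their own `$(...)` tests, evaluated independently of the outer match. The following minimal Python sketch assumes unbranched patterns and unbranched, single-letter targets, and ignores bond orders and aromaticity; `elem`, `any_of`, `chain` and `linear` are hypothetical helpers.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy model: a molecule is a list of element symbols plus adjacency lists.
Pred = Callable[[Sequence[str], List[List[int]], int], bool]


def elem(sym: str) -> Pred:
    return lambda elems, adj, a: elems[a] == sym


def any_of(*preds: Pred) -> Pred:
    """Comma (OR) between alternative recursive patterns."""
    return lambda elems, adj, a: any(p(elems, adj, a) for p in preds)


def chain(*steps: Pred) -> Pred:
    """Like $(XYZ...) for an unbranched pattern: true if a simple path starting
    at the atom satisfies each step predicate in turn."""

    def extend(elems, adj, path: List[int]) -> bool:
        if len(path) == len(steps):
            return True
        step = steps[len(path)]
        return any(
            nbr not in path and step(elems, adj, nbr) and extend(elems, adj, path + [nbr])
            for nbr in adj[path[-1]]
        )

    return lambda elems, adj, a: steps[0](elems, adj, a) and extend(elems, adj, [a])


def linear(atoms: str) -> Tuple[List[str], List[List[int]]]:
    """Hypothetical helper: an unbranched chain of one-letter atoms, e.g. 'CCCCO'."""
    n = len(atoms)
    return list(atoms), [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]


# [$(CC[$(CCO),$(CCN)])]: the third atom is itself a recursive query.
inner = any_of(chain(elem("C"), elem("C"), elem("O")), chain(elem("C"), elem("C"), elem("N")))
query = chain(elem("C"), elem("C"), inner)

for smi in ["CCCCO", "CCCCN", "CCCCC"]:
    elems, adj = linear(smi)
    print(smi, sum(query(elems, adj, a) for a in range(len(elems))))
```

Only the carbon at the far end from the heteroatom satisfies the query, so the counts come out as 1, 1 and 0 for `CCCCO`, `CCCCN` and `CCCCC`.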
These next ones are concerned with logic and stereochemistry.
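| query | targets | nmatch | Comment |
|---|---|---|---|
| **Ensure local stereo matching** | | | |
| `*[@](*)(*)(*)` | `O[C@](N)(C)CC` | 12 | a tetrahedron has 12 rotational symmetries |
| | `O[C@@](N)(C)CC` | 12 | |
| | `O[C](N)(C)CC` | 0 | |
| **Implicit (hydrogen or lone-pair) neighbour** | | | |
| `CC[S@](C)=O` | `CC[S@](C)=O` | 1 | |
| | `CC[S@@](C)=O` | 0 | |
| | `CC[S](C)=O` | 0 | |
| **Either (tetrahedral)** | | | |
| `CC[@,@@](C)O` | `CC[C@H](C)O` | 1 | |
| | `CC[C@@H](C)O` | 1 | |
| | `CCC(C)O` | 0 | |
| **Both (tetrahedral)** | | | |
| `CC[@&@@](C)O` | `CC[C@H](C)O` | 0 | |
| | `CC[C@@H](C)O` | 0 | |
| | `CCC(C)O` | 0 | |
| **Respect logical precedence 1** | | | |
| `CC[@,Si@@](C)O` | `CC[C@H](C)O` | 1 | |
| | `CC[C@@H](C)O` | 0 | |
| | `CCC(C)=O` | 0 | |
| **Respect logical precedence 2** | | | |
| `CC[C@,Si@@](C)O` | `CC[C@H](C)O` | 1 | |
| | `CC[C@@H](C)O` | 0 | |
| | `CCC(C)O` | 0 | |
| | `CC[Si@H](C)O` | 0 | |
| | `CC[Si@@H](C)O` | 1 | |
| | `CC[Si](C)O` | 0 | |
| **Unspecified** | | | |
| `CC[@@?](C)O` | `CC[C@H](C)O` | 0 | |
| | `CC[C@@H](C)O` | 1 | |
| | `CCC(C)O` | 1 | |
| **Negation** | | | |
| `CC[!@](C)O` | `CC[C@H](C)O` | 0 | `!@@` is also equivalent to `@?` |
| | `CC[C@@H](C)O` | 1 | |
| | `CCC(C)O` | 1 | |
| **Neither (tetrahedral) using 'or unspecified'** | | | |
| `CC[@?@@?](C)O` | `CC[C@H](C)O` | 0 | |
| | `CC[C@@H](C)O` | 0 | |
| | `CCC(C)O` | 1 | |
| **Neither (tetrahedral) using negation** | | | |
| `CC[!@!@@](C)O` | `CC[C@H](C)O` | 0 | |
| | `CC[C@@H](C)O` | 0 | |
| | `CCC(C)O` | 1 | |
| **Either (geometric)** | | | |
| `C/C=C/,\C` | `C/C=C/C` | 1 | |
| | `C/C=C\C` | 1 | |
| | `CC=CC` | 0 | |
| **Neither (geometric)** | | | |
| `C/C=C!/!\C` | `C/C=C/C` | 0 | |
| | `C/C=C\C` | 0 | |
| | `CC=CC` | 1 | |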
The last two are quite tricky (and not currently implemented), but once the atom-centric handling is correct, it's a simple reduction. It's quite fun to work out, so I'll leave that up to the reader.
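One possible reduction is sketched below in Python for anyone who wants to compare notes; it is an assumption-laden illustration rather than the CDK implementation. Tetrahedral matching re-expresses the target's tag in the query's neighbour order, where an odd permutation of neighbours swaps `@` and `@@`; counting the permutations that agree reproduces the 12, 12 and 0 of the `*[@](*)(*)(*)` rows. A double bond's `/` and `\` symbols can then be reduced to a relative "opposite" or "together" configuration that plays the role of `@` and `@@`, with an unspecified bond behaving like an unspecified centre. The helpers `parity`, `local_tag`, `count_wildcard_matches`, `double_bond_config`, `bond_dir`, `either` and `neither` are hypothetical, and the near-side bond is assumed fixed as `/`.

```python
from itertools import permutations
from typing import Callable, Optional, Sequence


def parity(perm: Sequence[int]) -> int:
    """0 for an even permutation, 1 for an odd one (inversion count mod 2)."""
    n = len(perm)
    return sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n)) % 2


def local_tag(target_tag: Optional[str], perm: Sequence[int]) -> Optional[str]:
    """Re-express the target's tag in the query's neighbour order.
    perm[i] is the target-neighbour index mapped to query neighbour i; an odd
    permutation swaps @ and @@, while unspecified stays unspecified."""
    if target_tag is None or parity(perm) == 0:
        return target_tag
    return "@@" if target_tag == "@" else "@"


def count_wildcard_matches(query_tag: str, target_tag: Optional[str]) -> int:
    # *[@](*)(*)(*): each of the 4! assignments of wildcard neighbours is a
    # candidate mapping; keep those whose local tag agrees with the query.
    return sum(local_tag(target_tag, p) == query_tag for p in permutations(range(4)))


print([count_wildcard_matches("@", t) for t in ("@", "@@", None)])  # [12, 12, 0]


def double_bond_config(first: str, second: str) -> str:
    # Same symbol on both sides (C/C=C/C) is trans: "opposite".
    # Different symbols (C/C=C\C) are cis: "together".
    return "opposite" if first == second else "together"


def bond_dir(sym: str, first: str = "/") -> Callable[[Optional[str]], bool]:
    """Far-side bond primitive, given the assumed near-side direction `first`."""
    wanted = double_bond_config(first, sym)
    return lambda config: config == wanted


def either(config: Optional[str]) -> bool:  # C/C=C/,\C
    return bond_dir("/")(config) or bond_dir("\\")(config)


def neither(config: Optional[str]) -> bool:  # C/C=C!/!\C
    return not bond_dir("/")(config) and not bond_dir("\\")(config)


TARGETS = {
    "C/C=C/C": double_bond_config("/", "/"),
    "C/C=C\\C": double_bond_config("/", "\\"),
    "CC=CC": None,  # no directional bonds, so unspecified
}
for smi, config in TARGETS.items():
    print(smi, either(config), neither(config))
```

With that reduction, `/,\` accepts any specified configuration and `!/!\` accepts only the unspecified `CC=CC`, mirroring the `@,@@` and `!@!@@` rows.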