Last year I extended the CDK SMARTS implementation to match component groupings and stereochemistry. Specifying stereochemistry gives rise to some interesting logical predicates that can be tricky to handle.
Here are some examples that I came up with for testing the correctness of query handling. They start simple before getting a little mischievous. First, recursion and component grouping.
| query | targets | nmatch | Comment | 
| Component grouping (fragment) | 
| (O).(O) | O=O | 0 | Example from Daylight | 
|  | OCCO | 0 | 
|  | O.CCO | 2 | 
| Component grouping (connected) | 
| (O.O) | O=O | 2 | Example from Daylight | 
|  | OCCO | 2 | 
|  | O.CCO | 0 | 
| Recursion, ad infinitum | 
| [$(CC[$(CCO),$(CCN)])] | CCCCO | 1 |  | 
|  | CCCCN | 1 | 
|  | CCCCC | 0 | 
| Recursive component grouping | 
| [O;D1;$(([a,A]).([A,a]))][CH]=O | OC=O.c1ccccc1 | 1 | Feature/Bug #1312 | 
|  | OC=O | 0 | 
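The component grouping rows above can be reproduced with a few lines of plain Python. This is only a minimal sketch and not the CDK implementation: the function names are made up for illustration, the query is assumed to consist of unbonded atoms only, and matches are counted as ordered (non-unique) mappings, which is what the nmatch column appears to report.

```python
from itertools import permutations

def component_labels(n_atoms, bonds):
    """Label each target atom with a connected-component id (flood fill)."""
    adjacent = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adjacent[a].add(b)
        adjacent[b].add(a)
    labels = [-1] * n_atoms
    next_label = 0
    for start in range(n_atoms):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adjacent[v]:
                if labels[w] == -1:
                    labels[w] = next_label
                    stack.append(w)
        next_label += 1
    return labels

def grouping_ok(mapping, query_groups, labels):
    """Atoms in the same query group must map into one target component,
    and different groups must map into different components.
    A group of None means the query atom is ungrouped (unconstrained)."""
    component_of_group = {}
    for query_atom, target_atom in enumerate(mapping):
        group = query_groups[query_atom]
        if group is None:
            continue
        component = labels[target_atom]
        if group in component_of_group:
            if component_of_group[group] != component:
                return False
        elif component in component_of_group.values():
            return False
        else:
            component_of_group[group] = component
    return True

def count_grouped_matches(elements, bonds, query_elements, query_groups):
    """Count ordered mappings of unbonded query atoms onto target atoms."""
    labels = component_labels(len(elements), bonds)
    count = 0
    for mapping in permutations(range(len(elements)), len(query_elements)):
        if any(elements[t] != q for q, t in zip(query_elements, mapping)):
            continue
        if grouping_ok(mapping, query_groups, labels):
            count += 1
    return count

# O.CCO as atoms O, C, C, O with bonds only inside the CCO fragment
print(count_grouped_matches(["O", "C", "C", "O"], [(1, 2), (2, 3)], ["O", "O"], [0, 1]))
```

Here `[0, 1]` stands for the query (O).(O), two separate groups, and `[0, 0]` would stand for (O.O), a single group. In a real matcher the grouping check would be applied as a filter on the mappings produced by the substructure search rather than on brute-force permutations.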
These next ones are concerned with logic and stereochemistry.
| query | targets | nmatch | Comment | 
| Ensure local stereo matching | 
| *[@](*)(*)(*) | O[C@](N)(C)CC | 12 | tetrahedrons have 12 rotation symmetries | 
|  | O[C@@](N)(C)CC | 12 | 
|  | O[C](N)(C)CC | 0 | 
| Implicit (hydrogen or lone-pair) neighbour | 
| CC[S@](C)=O | CC[S@](C)=O | 1 | 
|  | CC[S@@](C)=O | 0 | 
|  | CC[S](C)=O | 0 | 
| Either (tetrahedral) | 
| CC[@,@@](C)O | CC[C@H](C)O | 1 | 
|  | CC[C@@H](C)O | 1 | 
|  | CCC(C)O | 0 | 
| Both (tetrahedral) | 
| CC[@&@@](C)O | CC[C@H](C)O | 0 | 
|  | CC[C@@H](C)O | 0 | 
|  | CCC(C)O | 0 | 
| Respect logical precedence 1 | 
| CC[@,Si@@](C)O | CC[C@H](C)O | 1 | 
|  | CC[C@@H](C)O | 0 | 
|  | CCC(C)=O | 0 | 
| Respect logical precedence 2 | 
| CC[C@,Si@@](C)O | CC[C@H](C)O | 1 | 
|  | CC[C@@H](C)O | 0 | 
|  | CCC(C)O | 0 | 
|  | CC[Si@H](C)O | 0 | 
|  | CC[Si@@H](C)O | 1 | 
|  | CC[Si](C)O | 0 | 
| Unspecified | 
| CC[@@?](C)O | CC[C@H](C)O | 0 | 
|  | CC[C@@H](C)O | 1 | 
|  | CCC(C)O | 1 | 
| Negation | 
| CC[!@](C)O | CC[C@H](C)O | 0 | !@@ is also equivalent to @? | 
|  | CC[C@@H](C)O | 1 | 
|  | CCC(C)O | 1 | 
| Neither (tetrahedral) using 'or unspecified' | 
| CC[@?@@?](C)O | CC[C@H](C)O | 0 | 
|  | CC[C@@H](C)O | 0 | 
|  | CCC(C)O | 1 | 
| Neither (tetrahedral) using negation | 
| CC[!@!@@](C)O | CC[C@H](C)O | 0 | 
|  | CC[C@@H](C)O | 0 | 
|  | CCC(C)O | 1 | 
| Either (geometric) | 
| C/C=C/,\C | C/C=C/C | 1 | 
|  | C/C=C\C | 1 | 
|  | CC=CC | 0 | 
| Neither (geometric) | 
| C/C=C!/!\C | C/C=C/C | 0 | 
|  | C/C=C\C | 0 | 
|  | CC=CC | 1 | 
The last two are quite tricky (and not currently implemented), but once the atom-centric handling is correct it's a simple reduction. It's quite fun to work out, so I'll leave that up to the reader.
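For the atom-centric (tetrahedral) rows, the logic can be checked with a small boolean evaluator. The sketch below is an illustration in plain Python, not the CDK code. It assumes the usual SMARTS precedence (`!` binds tightest, then `&` or implicit "and", then `,`, then `;`), supports only the primitives used in the table (`C`, `Si`, `@`, `@@`, `@?`, `@@?`), and assumes the target's parity has already been expressed relative to the query's neighbour ordering. The name `matches` is hypothetical.

```python
import re

# Longest alternatives first so that '@@?' is not split into '@', '@', '?'
TOKEN = re.compile(r"@@\?|@@|@\?|@|Si|C|[!&,;]")

def tokenize(expr):
    tokens = TOKEN.findall(expr)
    if "".join(tokens) != expr:
        raise ValueError(f"unsupported expression: {expr}")
    return tokens

def matches(expr, element, parity):
    """Evaluate a bracket-atom expression against one target atom.
    parity is '@', '@@' or None (unspecified)."""
    tokens = tokenize(expr)
    pos = 0

    def primitive(tok):
        if tok in ("C", "Si"):
            return element == tok
        if tok == "@":
            return parity == "@"
        if tok == "@@":
            return parity == "@@"
        if tok == "@?":
            return parity in ("@", None)
        if tok == "@@?":
            return parity in ("@@", None)
        raise ValueError(f"unexpected token: {tok}")

    def parse_unary():
        nonlocal pos
        if tokens[pos] == "!":
            pos += 1
            return not parse_unary()
        tok = tokens[pos]
        pos += 1
        return primitive(tok)

    def parse_high_and():
        nonlocal pos
        result = parse_unary()
        while pos < len(tokens) and tokens[pos] not in (",", ";"):
            if tokens[pos] == "&":
                pos += 1
            rhs = parse_unary()  # always parse, so pos advances
            result = result and rhs
        return result

    def parse_or():
        nonlocal pos
        result = parse_high_and()
        while pos < len(tokens) and tokens[pos] == ",":
            pos += 1
            rhs = parse_high_and()
            result = result or rhs
        return result

    def parse_low_and():
        nonlocal pos
        result = parse_or()
        while pos < len(tokens) and tokens[pos] == ";":
            pos += 1
            rhs = parse_or()
            result = result and rhs
        return result

    return parse_low_and()

# 'Respect logical precedence 1': '@,Si@@' reads as '@ or (Si and @@)'
for parity in ("@", "@@", None):
    print(parity, matches("@,Si@@", "C", parity))
```

Evaluating each target parity against each expression this way reproduces the 0/1 pattern in the tetrahedral rows of the table, which makes it a handy cross-check when adding new test cases.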