Friday 13 April 2018

RDKit Reaction SMARTS

Update 19/09/22


I believe I was wrong and Daylight did allow SMIRKS to be run backwards and that the "direction" parameter controls it ("personal communication" from Roger).. The key insight is the documentation notes SMARTS patterns should only appear on atoms that don't have bond changes around them. From the man page:

"Atomic expressions may be any valid atomic SMARTS expression for nodes where the bonding (connectivity & bond order) doesn't change.1 Otherwise, the atomic expressions must be valid SMILES."

Whether this make sense or not for writing portable SMIRKS is questionable.

This is supplementary to my original grumble on ambiguous naming. I still believe RDKit should just call it SMIRKS or at least something which didn't already mean something else (e.g. "RdSmirks"). I do agree there is a lot of overlap between SMARTS and SMIRKS but conflating these terms is problematic. Shortly after I originally wrote this original post, SMIRKS Native Open Force Field (SMIRNOFF) was published. In that work SMARTS is used, but they called it SMIRKS. There is some wiggling attempted by the authors that they use "SMIRKS features" of atom maps. However it is incorrect that these were a SMIRKS feature as in what Daylight called Reaction SMARTSsee the table "Examples of Reaction SMARTS" (SMARTS Theory Page). Of course SMIRNOFF was just too good a pun to change, but it again creates more confusion.

I recently ran some comparisons/benchmarks on different SMIRKS implementations, everyone's semantics are consistently inconsistent and the community would benefit by an effort to standardise. Perhaps then I can convince RDKit that they really do have a (good) SMIRKS implementation and it would be less confusing to just call it that.