Sunday 9 August 2015

MMFF Partial Charges Improvements in CDK

Some time last year Mark Williamson brought to my attention discrepancies in CDK's MMFF partial charge calculation. Investigating further it seemed to mainly be a problem with atom typing. There were two existing classes that could assigned MMFF atom types using a combination of a decision tree and string matching hose codes. The 761 molecules from the MMFF94 Validation Suite provided by Paul Kersey were used to give a more comprehensive overview then our current tests.

The results showed reasonable precision per-atom in the validation suite but were less favourable per-molecule, the best implementation assigned types to <90% of the molecules with <16% assigned correctly.

Assigned Types
(Atoms)
Correct Types
(Atoms)
Assigned Types
(Molecules)
Correct Types
(Molecules)
ForceFieldConfigurator 15576 90.1% 12932 74.8% 678 89.1% 118 15.5%
MMFF94AtomTypeMatcher 17120 99.1% 12309 71.2% 659 86.6% 75 9.9%
MmffAtomTypeMatcher 17279 100.0% 17279 100.0% 761 100.0% 761 100.0%

I wasn't keen to hard code the atom typing procedure but was delighted to find Robert Hanson of JMol had some SMARTS patterns that could be used as a starting point. After about a month of tweaking I managed to simplify the SMARTS patterns and achieve 100% precision on the validation suite. You can find the SMARTS patterns here: /org/openscience/cdk/forcefield/mmff/MMFFSYMB.sma.

Apart from improving atom type assignments the charge assignment also needed updating to include charge sharing and bond class differences. This wasn't quite as simple as I first thought as the parameter set parsing also needed reworking. After many months of analysis paralysis I decided last week to just rewrite what was needed and delegate calls from the existing implementation.

Now the patch is finished, charge assignments are much better. Notice that in the previous version (labelled CDK 1.5.10) equivalent terminal oxygens and the nitrogens in imidazole anion have different values. The overall charge was also inconsistent with the formal charges.

Improved charge assignment

Roger Sayle noted to me this week that MMFF charges should not be affected by representation, for example, charge separated pi bonds in nitro groups or phosphates.

Charges are independant of representation

Many thanks to Mark and Alison Choy for reporting the problem and adding patches for debugging and testing.