Friday 12 September 2014

Not to scale

The latest release of the CDK (1.5.8) includes a new generator for rendering structure diagrams. A detailed introduction to configuring the new generator is available on the CDK wiki[1].

The new generator can be used as a drop in replacement in existing code. However, one aspect of rendering that I've struggled with previously was getting good sized depictions with the CDK - most notably with vector graphic output. This post will look at how we can size depictions and will provide code in an example project as a reference.

ChEBI's current entity of the month - maytansine [CHEBI:6701] will be used to demonstrate the sizing.

Parameters

Three parameters that are important in the overall sizing of depictions. These are the BondLength, Scale, and Zoom which are all registered as BasicSceneGenerator parameters. The Zoom is not needed if we allow our diagram to be fitted automatically.

The BondLength can be set by the user and has a default value of '40' whilst the Scale is set during diagram generation. BondLength units are arbitrary - for now we'll consider this as '40 px'.

Scaling

The Scale parameter is used to render molecules with different coordinate systems consistently[2,3]. The value is determined using the BondLength parameter and the bond length in the molecule. For maytansine [CHEBI:6701] the median bond length is ~0.82. Again, the units are arbitrary - this could be Angstroms (it isn't).

The Scale is therefore the ratio of the measured bond length (0.82) to the desired bond length (40 px). For this example, the scale is 48.48. The coordinates must be scaled by ~4800% such that each bond is drawn as 40 px long.

Bounds

Now we know our scale (~48.48), how big is our depiction going to be? It depends how we measure it. One way would be to check the bounding box that contains all atom coordinates (using GeometryUtil.getMinMax()). However, this does not consider the positions of adjuncts and would lead to parts of the diagram being cut off[4].

Fortunately the new generator provides a Bounds rendering element allowing us to determine the true diagrams bounds of 8.46x8.03. Since the scale is ~48.48 the final size of the depiction would be ~410x390 px. A margin is also added.


Current Rendering API

Now we have the size of our diagram we can render raster images. Unfortunately the current rendering API makes this a little tricky as the diagram is generated after the desired image size is provided by the user. To get the correct size we need to generate the diagram twice (to get the bounds) or use an intermediate representation (we'll see this later).

Code 1 - Current rendering API
// structure with coordinates
IAtomContainer container = ...; 

// create the renderer - we don't use a font manager
List<IGenerator<IAtomContainer>> generators = 
        Arrays.asList(new BasicSceneGenerator(),
                      new StandardGenerator(new Font("Verdana", Plain, 18));
AtomContainerRenderer renderer = new AtomContainerRenderer(generators, null); 

Graphics2D g2 = ...; // Graphics2D to draw raster / vector graphics
Rectangle2D bounds = ...; // need the bounds here!
renderer.paint(new AWTDrawVisitor(g2),
               bounds);

Vector graphics

To render scalable graphics we can use the VectorGraphics2D[5] implementations of the Java Graphics2D class. Vector graphics output can use varied units (e.g. pt, mm, px) - the VectorGraphics2D uses mm.

Without adjusting our scaling the render of maytansine [CHEBI:6701] would be displayed with bond lengths of 40 mm and a total size of ~410x390 mm. The output can be rescaled after rendering but the default width of 41 cm is a bit large. We therefore need to change our desired bond length.

The bond length of published structure diagrams varies between journals. A common and recommended style for wikipedia [6] is 'ACS 1996' - the style has a bond length of '5.08 mm'. Although setting the BondLength parameter to '5.08' would work, other parameters would need adjusting such as Font size (which is provided in pt!).

To render the diagram with the same proportions as the raster image we can instead resize the bounds and fit the diagram to this. Since the desired bond length is '5.08 mm' instead of '40 mm' we need rescale the diagram by 12.7 %. Our final diagram size is then ~52x50 mm. The border for ACS 1996 is '0.56 mm' which can be added to the diagram size.

Example code

To help demonstrate the above rendering I've put together a quick GitHub project johnmay/efficient-bits/scaled-renders. The code provides a convenient API and a command line utility for generating images.

Code 2 - Intermediate object
// structure with coordinates
IAtomContainer container = ...; 

// create the depiction generator
Font font = new Font("Verdana", Plain, 18);
DepictionGenerator generator = new DepictionGenerator(new BasicSceneGenerator(),
                                                      new StandardGenerator(font));

// generate the intermediate 'depiction'
Depiction depiction = generator.generate(container);

// holds on to the rendering primitives as well as the size
double w = depiction.width();
double h = depiction.height(); 

// draw at 'default' size
depiction.draw(g2, w, h);

// generate a PDF (or SVG)
String pdfContent = depiction.toPdf(); // default size
String pdfContent = depiction.toPdf(1.5); // 1.5 * default size
String pdfContent = depiction.toPdf(0.508, 0.056); // bond length, margin

The command line utility provides several options to play with and can load from molfile, SMILES, InChI, or name (using OPSIN[7]).

Code 3 - Command line
# In the project root set the following alias
$: alias render='mvn exec:java -Dexec.mainClass=Main'

# Using OPSIN to load porphyrin and generate a PDF
$: render -Dexec.args="-name porphyrin -pdf"

# Highlight one of the pyrrole in porphyrin
$: render -Dexec.args="-name porphyrin -pdf -sma n1cccc1"

# Show atom numbers
$: render -Dexec.args="-name porphyrin -pdf -atom-numbers"

# Show CIP labels
$: render -Dexec.args="-name '(2R)-butan-2-ol' -pdf -cip-labels"

# Generate a PDF / SVG for ethanol SMILES
$: render -Dexec.args="-smi CCO -pdf ethanol.pdf -svg ethanol.svg"

# Load a molfile
$: render -Dexec.args="-mol ChEBI_6701.mol -pdf chebi-6701.pdf"

You can even play with the font

$: render -Dexec.args="-name 'caffeine' -svg cc-caffeine.svg -font-family 'Cinnamon Cake' -stroke-scale 0.6 -kekule"

Links/References

  1. https://github.com/cdk/cdk/wiki/Standard-Generator
  2. http://gilleain.blogspot.co.uk/2010/09/scaling-and-text.html
  3. http://gilleain.blogspot.co.uk/2010/09/consistent-zoom-with-models-of.html
  4. http://onlinelibrary.wiley.com/doi/10.1002/minf.201200171/abstract
  5. http://trac.erichseifert.de/vectorgraphics2d/
  6. http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Chemistry/Structure_drawing
  7. http://opsin.ch.cam.ac.uk/

Thursday 11 September 2014

CDK Release 1.5.8

10.5281/zenodo.11681

CDK 1.5.8 has been released and is available from sourceforge (download here) and the EBI maven repo (XML 1).

The release notes (1.5.8-Release-Notes) summarise and detail the changes.

XML 1 - Maven POM configuration
<repository>
  <url>http://www.ebi.ac.uk/intact/maven/nexus/content/repositories/ebi-repo/</url>
</repository>
...
<dependency>
  <groupId>org.openscience.cdk</groupId>
  <artifactId>cdk-bundle</artifactId>
  <version>1.5.8</version>
</dependency>