This post isn't really about being efficient; it is more about being accurate. When using the Chemistry Development Kit (CDK), there's always been something that didn't look quite right about the drawing of chemical structures. I'd noticed this for a while, but last week I finally discovered a solution and decided to test it out.
The problem is very subtle, but if you look closely you may notice the atom symbols aren't accurately aligned. I've outlined the bounding boxes of the atom symbols, as this is the reference used to calculate the centre. The diagram below shows an unlikely molecule but it demonstrates the alignment problem. I included a silver (Ag) atom as it's one of the few atom symbols that has a character with a descender.
If bonds are rendered in front of the symbols and zoomed, the misalignment is clearly visible.
This isn't really a problem with CDK but with the way Java 2D measures fonts, or perhaps with how you would normally want to measure a font. In most situations, you want to align multiple characters uniformly. `FontMetrics` can access an array of different character widths with `FontMetrics.getWidths()` but only a single height, `FontMetrics.getHeight()`. Using an isolated example, it's clear that when rendering multiple characters this is exactly what is needed. This image shows the bounding box from `FontMetrics` around the lower and upper case symbols.
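A minimal sketch of that behaviour, assuming a logical sans-serif font and an off-screen image as the graphics source (the class and method names here are hypothetical, not taken from CDK). It asks `FontMetrics` for the bounds of a few symbols; the widths differ but the reported height is shared by all of them.

```java
import java.awt.Font;
import java.awt.FontMetrics;
import java.awt.Graphics2D;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;

final class FontMetricsBoundsSketch {

    /** Logical bounds of the text as reported by FontMetrics. */
    static Rectangle2D logicalBounds(String text, Font font) {
        // An off-screen image supplies a Graphics2D without opening a window.
        BufferedImage image = new BufferedImage(1, 1, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = image.createGraphics();
        try {
            FontMetrics metrics = g.getFontMetrics(font);
            return metrics.getStringBounds(text, g);
        } finally {
            g.dispose();
        }
    }

    public static void main(String[] args) {
        // Font choice is an assumption for illustration only.
        Font font = new Font(Font.SANS_SERIF, Font.PLAIN, 24);
        for (String symbol : new String[] {"o", "O", "Ag"}) {
            Rectangle2D b = logicalBounds(symbol, font);
            System.out.printf("%-2s width=%.2f height=%.2f%n",
                    symbol, b.getWidth(), b.getHeight());
        }
    }
}
```

The exact numbers depend on the installed fonts, but the height column should be the same for every symbol, which is why a lower case "o" ends up centred as if it were as tall as "Ag".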
I'd been pondering how to fix this for a while, but the only classes I knew of for measuring fonts were `FontMetrics` and the related `LineMetrics`. `FontMetrics` and `LineMetrics` are also the main classes discussed in the Measuring Text Tutorial. However, an earlier section of the same tutorial points to the correct class. In Text Layout Concepts it states: "To properly position, measure, and render text, you need to keep track of each individual character and the style applied to that character. Fortunately, the `TextLayout` class does this for you".
This actually doesn't look as neat when there are multiple symbols, but it demonstrates that `TextLayout` provides the true bounding box of the characters. Here's the existing code for calculating text bounds.
[Embedded code listing: existing text bounds calculation using FontMetrics]
The `TextLayout` class can easily replace `FontMetrics` and correctly align our symbols. The text origin point also needed updating, but I'll skip that as the method is pretty similar.
[Embedded code listing: text bounds calculation using TextLayout]
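As a rough illustration of the swap, and not the actual CDK patch, the sketch below measures text with `TextLayout.getBounds()` and derives a text origin that centres those tight bounds on a point. The class name, method names, and the `FontRenderContext` settings are assumptions made for the example.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.TextLayout;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;

final class TextLayoutBoundsSketch {

    /** Tight bounds of the glyphs, relative to the text origin on the baseline. */
    static Rectangle2D tightBounds(String text, Font font) {
        // A null transform means identity; anti-aliasing and fractional
        // metrics are switched on (an assumption, match your renderer).
        FontRenderContext context = new FontRenderContext(null, true, true);
        // Note: TextLayout rejects empty strings with an IllegalArgumentException.
        return new TextLayout(text, font, context).getBounds();
    }

    /** Origin at which to draw the text so its tight bounds are centred on (cx, cy). */
    static Point2D originForCentre(Rectangle2D bounds, double cx, double cy) {
        return new Point2D.Double(cx - bounds.getCenterX(), cy - bounds.getCenterY());
    }

    public static void main(String[] args) {
        Font font = new Font(Font.SANS_SERIF, Font.PLAIN, 24);
        for (String symbol : new String[] {"o", "O", "Ag"}) {
            Rectangle2D b = tightBounds(symbol, font);
            Point2D origin = originForCentre(b, 50, 50);
            System.out.printf("%-2s height=%.2f origin=(%.2f, %.2f)%n",
                    symbol, b.getHeight(), origin.getX(), origin.getY());
        }
    }
}
```

Unlike the `FontMetrics` case, the heights now differ per symbol, so each symbol gets its own origin and ends up centred on its visible glyphs.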
As well as the alignment, there were some other tweaks I thought might improve the look. Firstly, looking at the carbonyl group (C=O), the bonds are cut off at different lengths due to the square edge. Secondly, the padding is a fixed size, so for very small or very large fonts the bounds did not look aesthetically pleasing. To align the ends of bonds, a simple solution was to use a `RoundRectangle2D` (oval) instead of a `Rectangle`. The padding was a little tricky and I'll come back to that later. Below is an animation that works through the alterations from the original rendering step by step.
Conversion of the original molecule using FontMetrics to using TextLayout and a rounded boundary
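The rounded boundary can be sketched with `java.awt.geom.RoundRectangle2D`: when the arc width and height equal the full width and height, the shape is effectively an oval inscribed in the text bounds. The class and method names below are hypothetical and the bounds are made-up numbers.

```java
import java.awt.geom.Rectangle2D;
import java.awt.geom.RoundRectangle2D;

final class RoundedBoundarySketch {

    /** Rounded rectangle whose arcs span the full width and height, i.e. an oval. */
    static RoundRectangle2D ovalAround(Rectangle2D bounds) {
        return new RoundRectangle2D.Double(
                bounds.getX(), bounds.getY(),
                bounds.getWidth(), bounds.getHeight(),
                bounds.getWidth(), bounds.getHeight());
    }

    public static void main(String[] args) {
        Rectangle2D bounds = new Rectangle2D.Double(0, 0, 10, 10);
        RoundRectangle2D oval = ovalAround(bounds);
        // A point just inside the corner is covered by the square box
        // but not by the oval, so a bond arriving diagonally is cut later.
        System.out.println("rectangle covers corner: " + bounds.contains(0.01, 0.01));
        System.out.println("oval covers corner:      " + oval.contains(0.01, 0.01));
    }
}
```

With a square box, a bond entering at a corner is cut further from the symbol centre than one entering at the middle of an edge; the oval keeps those distances much closer together.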
It's a bit difficult to see the alignment change between the second and third frame, but it can also be seen when the animation loops around to the start. Here are the molecules side by side; the difference is most noticeable on the two oxygen atoms.
Unfortunately the alignment still isn't perfect, but this is due to some rounding errors in the rendering. Correcting the padding of the bounding box was a little tricky. I wanted to ensure narrow symbols (e.g. 'I') and wide labels (e.g. 'protein') were rendered nicely. It turned out the way to do this was to scale the padding with the font height.
[Embedded code listing: padding scaled with the font height]
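One way to scale the padding with the font height is sketched below. The factor of 0.25 and all names are assumptions chosen for illustration; the value used in the original patch is not known.

```java
import java.awt.geom.Rectangle2D;

final class PaddingSketch {

    /** Hypothetical factor: padding as a fraction of the font height. */
    static final double PADDING_FACTOR = 0.25;

    /** Grows the bounds on every side by fontHeight * factor. */
    static Rectangle2D pad(Rectangle2D bounds, double fontHeight, double factor) {
        double padding = fontHeight * factor;
        return new Rectangle2D.Double(
                bounds.getX() - padding,
                bounds.getY() - padding,
                bounds.getWidth() + 2 * padding,
                bounds.getHeight() + 2 * padding);
    }

    public static void main(String[] args) {
        Rectangle2D text = new Rectangle2D.Double(0, 0, 10, 20);
        // The same call gives proportionally larger padding for larger fonts.
        System.out.println(pad(text, 20, PADDING_FACTOR));
        System.out.println(pad(text, 40, PADDING_FACTOR));
    }
}
```

Because the padding is proportional rather than fixed, a symbol drawn at 8 pt and one drawn at 48 pt keep the same visual margin relative to their size.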
In addition to the padding, there needed to be a check for very wide labels. Using the ratio of width to height allows a check for 'wide' labels. If it is 'wide' then an oblong rounded rectangle is drawn instead of an oval.
[Embedded code listing: width-to-height check for wide labels]
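The width-to-height check might look something like the following sketch. The threshold of 2.0, the choice of corner arcs for wide labels, and the names are all assumptions; the bounds are assumed to have a positive height.

```java
import java.awt.geom.Rectangle2D;
import java.awt.geom.RoundRectangle2D;

final class LabelShapeSketch {

    /** Hypothetical threshold: labels more than twice as wide as tall count as wide. */
    static final double WIDE_RATIO = 2.0;

    /** True when the width-to-height ratio exceeds the threshold. */
    static boolean isWide(Rectangle2D bounds) {
        return bounds.getWidth() / bounds.getHeight() > WIDE_RATIO;
    }

    /** Oval for narrow symbols, oblong rounded rectangle for wide labels. */
    static RoundRectangle2D shapeFor(Rectangle2D bounds) {
        // Wide labels get semicircular ends (arc width = height);
        // narrow ones get arcs spanning the whole box, i.e. an oval.
        double arcWidth = isWide(bounds) ? bounds.getHeight() : bounds.getWidth();
        return new RoundRectangle2D.Double(
                bounds.getX(), bounds.getY(),
                bounds.getWidth(), bounds.getHeight(),
                arcWidth, bounds.getHeight());
    }

    public static void main(String[] args) {
        Rectangle2D iodine = new Rectangle2D.Double(0, 0, 4, 12);
        Rectangle2D protein = new Rectangle2D.Double(0, 0, 60, 12);
        System.out.println("I wide?       " + isWide(iodine));
        System.out.println("protein wide? " + isWide(protein));
    }
}
```

Drawing a full oval around a label such as "protein" would either clip the text or leave large gaps above and below it, which is why the wide case switches to an oblong shape.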
As you can see on this rather odd structure, the bounding box around the iodine atom and the protein pseudo atom fits well.
I mentioned earlier that there was still some misalignment due to rounding errors. These are amplified when rendering smaller images. Looking a bit more at the code, it seems there was some explicit loss of precision which could be removed. Avoiding the conversion can minimise the difference. In this case, it required replacing the `int[] transformPoint(x, y)` method with one which returns `double` values.
[Embedded code listing: transformPoint returning double values]
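To show why the return type matters, here is a small sketch using `AffineTransform`. The lossy variant is only a guess at the shape of the old method (a cast to `int`), and both method bodies are written for this example rather than copied from CDK.

```java
import java.awt.geom.AffineTransform;

final class TransformPrecisionSketch {

    /** Keeps full precision: model coordinates to screen coordinates as doubles. */
    static double[] transformPoint(AffineTransform transform, double x, double y) {
        double[] src = {x, y};
        double[] dst = new double[2];
        transform.transform(src, 0, dst, 0, 1);
        return dst;
    }

    /** Assumed lossy variant: the fractional part is discarded by the int cast. */
    static int[] transformPointToInt(AffineTransform transform, double x, double y) {
        double[] dst = transformPoint(transform, x, y);
        return new int[] {(int) dst[0], (int) dst[1]};
    }

    public static void main(String[] args) {
        // Scaling down, as when rendering a small image.
        AffineTransform half = AffineTransform.getScaleInstance(0.5, 0.5);
        double[] precise = transformPoint(half, 3, 5);
        int[] truncated = transformPointToInt(half, 3, 5);
        System.out.println(precise[0] + ", " + precise[1]);
        System.out.println(truncated[0] + ", " + truncated[1]);
    }
}
```

Half a pixel is hard to spot in a large depiction, but at small sizes it is a noticeable fraction of the symbol itself, which matches the observation that the errors are amplified in smaller images.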
When rendering a smaller image it makes a noticeable difference - keep an eye on the iodine symbol. I should note that the bounding shapes seem to be skewed but without the outline this would not be noticeable.
As a final note, Ralf Stephan has been doing some great work with JChemPaint. He has recently looked at two-pass rendering, which means that the renderer can work out where the atoms are before it draws the bonds. It shouldn't be too difficult to incorporate `TextLayout` and `RoundRectangle2D`. It would be really useful, particularly when you have a background colour which changes, such as selecting a row in an interactive table.
Summary
- Use `TextLayout` instead of `FontMetrics` if you want the exact bounding box of the font
- Losing precision when dealing with graphics can lead to visible differences
John, this is brilliant work! This has not been converted in patches yet, I understand? How did you make those images then?
So I did make a patch but it wasn't very clean. The trouble is the CDK rendering code doesn't place charge very well, so I also had to try to patch that. I then found the JChemPaint rendering code handles charge in a group of text and thus the placement is easier. It also places charge correctly (e.g. superscript). I might try and port the JChemPaint atom generator in the new year, in which case it would be easier to include the TextBounds.