Wednesday, 19 December 2012

These go up to eleven

You're on ten ... Where can you go? Where? ... Nowhere. Exactly. What we do is, if we need that extra push over the cliff, you know what we do? [pause] Put it up to eleven.- Nigel Tufnel, This is Spinal Tap

Today, I was reminded of this great movie scene when testing a new piece of code. The problem I was testing was to find the shortest paths from one vertex in a graph to all other connected vertices. Listing one path is easy. Listing all alternate shortest paths is a bit more tricky. The current implementation only provided a single shortest path and there was a bug associated with this problem.

I'm going to skim over the details here but will show some of the API. The patch is current on this Sourceforge tracker if you wish to see the complete implementation. To start, some simple cases such as cubane were tested.

The 6 shortest paths between oposite vertices in a cubane molecule (CHEBI:33014)

With the correct result confirmed, it was time to turn up the complexity.  C60 Fullerene seemed a good choice. Unfortunately, after running it, I was disappointed to find there is at maximum only six possible paths to an opposite vertex. This one is a bit tricky to verify but luckily there is a ball and stick model in the office which made it a little easier. I also generated some spiro compounds but I was looking for a more difficult molecule.

C60 fullerene - CHEBI:33128


I was pretty much done with testing until this morning I stumbled across the FULLERENE program and database which provides a download for a C720 fullerene. Now that is a challenge!


C720 fullerene


Time to turn it up to eleven!

Here is the existing code to reconstruct the path from each atom to every other atom. Note we only do the the comparison one way as it is quicker to reverse the path to get both from i to j and from j to i.

Code 1 - All pairs shortest paths
for(int i = 0, n = fullerene.getAtomCount(); i < n; i++){
    for(int j = i + 1; j < n; j++){
        List path = PathTools.getShortestPath(fullerene,
                                                     fullerene.getAtom(i),
                                                     fullerene.getAtom(j));
    }
}

I only managed to run it once and as it took 3481106 ms (~58 min) to complete. Using the new utility we can do the equivalent in well under a second (~200 ms).

Code 2 - All pairs shortest paths new API
AllShortestPaths asp = new AllShortestPaths(fullerene);
 
for(int i = 0, n = fullerene.getAtomCount(); i < n; i++){
    for(int j = i + 1; j < n; j++){  
        IAtom[] path = asp.from(i).atomsTo(j);
    }
}

The current code is actually just a wrapper around several searches from each atom. Including the path reconstruction in the Floyd-Weshall algorithm (implemented as TopologicalMatrix) is also possible but for simplicity I left the AllShortestPaths as it a simple helper. The atomsTo method can be replaced with the pathTo method which will run even quicker. We can also count the number of shortest paths between each vertex.

Code 3 - Counting all pairs paths
sum += asp.from(i).nPathsTo(j);

From this we find a total of 39,086,040 (double that if you include j to i). We can get the actual paths but this takes significantly longer - around 8000 ms. Now thats pretty slow but we can check the theoretical maximum speed by filling an array the size of each shortest paths set.

Code 4 - storing all pairs paths
int[][] paths = new int[asp.from(i).nPathsTo(j)][asp.from(i).distanceTo(j) + 1];
for(int k = 0; k < paths.length; k++){
    for(int l = 0; l < paths[k].length; l++) {
        paths[k][l] = k + l;
    }
}


This will take 3000+ ms on it's own. It may be possible to get the time close to that but it is likely that most of the time difference is spent in array copies and traversing the recursive method call. Currently, this isn't worth optimising further. As the API provides a way to check how many shortest paths there are, it is better to only reconstruct all the paths if there is sufficiently few. After all, if you could reconstruct 39 million paths, you've still got to do something else with them afterwards.


4 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. That's annoying, it ate my comment. Try again...

    Anyway, you visited my blog recently: http://blog.sigfpe.com/2009/06/hashing-molecules.html
    I've been generating these fractals using the Huckel/tight-binding method for carbon nanostructures in a magnetic field: https://plus.google.com/107913314994758123748/posts
    I'd like to try fullerenes. Do you have an easy way to generate a list of vertices, edges and faces (chemists don't usually care about faces, but I care about the magnetic flux through them) for things like buckyballs?

    ReplyDelete
    Replies
    1. Hi Dan,

      As John says, I can provide you with either embeddings (face lists) or a description of the algorithm I used, if you were wanting to code it in Haskell or something. Or the java code, of course, which is on GitHub.

      gilleain

      Delete
  3. Not directly I'm afraid. There is a couple of programs which can generate coordinates from which you could then determine the faces. I don't know the exact algorithm for faces but it may be possibly by adapting the union of the minimum cycle bases (Vismara P. 1997).

    Also, you might want to ask Gilleain Torrance - he has been working on generating plane embedded fullerenes and their equivalent classes.

    FULLERENE - Fullerene Generation and Analysis Program

    ReplyDelete