Thursday 27 June 2013

Implementing Hanser's Path Graph Reduction - All Simple Cycles

In the previous post (AllRingsFinder, Sport Edition) I presented runtime results for an algorithm to find all simple cycles. This post explains that implementation in detail - there's a fair bit of code but I've tried to keep it short. Hanser's algorithm is remarkably elegant and was a joy to program.

The Algorithm

The set of all cycles can be found in a number of ways. We could trace the cycles by walking every simple path (very inefficient) or perhaps combine cycles from a cycle basis (which requires computing the basis first). Hanser et al., 1996 describe an alternative approach - it allows us to find all the simple cycles directly.

The process is based on progressive reduction (collapsing) of a path graph. As opposed to ordinary edges (bonds), which consist of just their two endpoints, the edges of a path graph describe a walk between the endpoints. To begin, every edge is simply its two endpoints; then, as each vertex is removed, its edges are concatenated into path edges describing a walk. When an edge forms a loop (both endpoints are the same vertex) a cycle has been found.

The animation below demonstrates the process of finding the three cycles in a naphthalene like graph. The vertex to be removed is first highlighted - and then the edges that will be combined. It's a little hard to follow when f is reached but there are three edges and three possible ways of combining two of them - from {1,2,3} we can have {1,2}, {2,3} and {1,3}. At the end the three cycles have been found as the paths of the loops.
Removing vertices in a path graph

Implementation

The pseudocode below describes the removal of a vertex x from the set of vertices V and edges E. First, find all edges which contain x, then for every possible pair (E²) check that the only intersection is x - this avoids creating non-simple paths. If the two edges intersect only at x, concatenate them into a new reduced edge and add it to the edge set E. Once all new edges have been created, remove x and its old edges from V and E.

REMOVE (x, V, E)
  for each pair of paths (Pxy, Pxz) ∈ E²
      if Pxy ∩ Pxz = {x} then
        Pyz ← Pxy ⊕ Pxz
        E ← E ∪ {Pyz}
  for each path Pxy ∈ E containing x
    E ← E - {Pxy}
  V ← V - {x}


The exact implementation is not discussed in the paper, which notes only the requirement for dynamic lists (List) of the edges and binary sets (BitSet) for the labels on the edges.

As expected, AllRingsFinder.remove() uses dynamic lists to store the vertices (atoms). The intersection does not use binary sets and instead takes quadratic time, O(n²) - Path.getIntersectionSize(). An alternate implementation (HanserRingFinder) reduces the number of operations for each removal - PathGraph.remove(). The intersection check (PathEdge.isRealPath()) was also improved but is still quadratic.

Observations

So, can we do better and how?

We are actually going to do the opposite of the previous implementations: we will use binary sets and will not use dynamic lists.

One key realisation is that the path graph is no different from any other graph. We can use different representations of the path graph depending on what operations we need to perform (see The Right Representation for the Job). As we only reduce edges incident to x, we can use an adjacency/incidence list to store the path graph. A summary of these and other changes is shown below:
  • Use binary sets - intersection check is now constant time, O(1).
  • Use an incidence list data structure and index edges by their endpoints. With a modification discussed later, we can actually make this constant time.
  • Only check that edges are disjoint except for their endpoints. This is faster than checking that the intersection is the singleton {x}. As the edges are now indexed by x, all the edges will have x in common; if the other endpoint is also common then the two edges will form a loop (cycle) when combined, which is handled by moving the check for new cycles to after concatenation.
  • Check the degree of a vertex before removal. The number of new edges is bounded by the degree of x, so this provides a fail-fast approach and avoids the inconsistencies of a timeout.
  • Use a recursive data structure and store undirected path edges. This avoids the reversing and copying of the dynamic lists on each concatenation; the directed path can be reconstructed when needed - only when a cycle is found. This has minimal influence on speed for small molecules but does make the implementation more concise.
With these observations in mind, here is what the data structures look like.

ArrayBuilder

To begin, a helper class is needed for the path reconstruction. This class simply allows us to sequentially fill up an int[] array by appending items to the end. It also provides access to the previous element and returns a self-reference on append, which allows us to chain method calls.
Code 1 - ArrayBuilder
class ArrayBuilder {
    
    int[] xs;
    int   i = 0;
 
    ArrayBuilder(int n) {
        xs = new int[n];
    }
 
    ArrayBuilder append(int x) {
        xs[i++] = x;
        return this;
    }
 
    int prev() {
        return xs[i - 1];
    }
}

PathEdge

There are two types of PathEdge: one is a simple non-reduced edge and the other a reduced edge (a path). We can actually share most of the operations between these, so we first define the API in an abstract superclass.

Unsurprisingly the PathEdge API is very similar to a normal undirected edge in that there are either() and other(int) methods for querying the endpoints. Unlike a normal edge, the bits of the 'xs' member variable store which other vertices in the graph lie on the path between the endpoints. We'll come back to the ordering in a bit. Binary set intersection is checked using the '&' operator - for two edges to be disjoint the intersection should be empty. The path() method provides the ordered walk through the edge between the endpoints - we need to start from either endpoint and then reconstruct the path (see the subclasses). It should be noted we use a BitSet for graphs with more than 64 vertices.
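The embedded snippet for the superclass is missing here, so the following is a sketch of what it might look like, inferred from the description above and the subclasses below - the field layout and method bodies are assumptions rather than the original code. The ArrayBuilder from Code 1 is repeated at the end so the sketch stands alone.

```java
// A possible sketch of the abstract superclass (inferred, not the original).
abstract class PathEdge {

    final int  u, v;  // endpoints
    final long xs;    // reduced vertices as a binary set - a BitSet would
                      // be needed for graphs with more than 64 vertices

    PathEdge(int u, int v, long xs) {
        this.u  = u;
        this.v  = v;
        this.xs = xs;
    }

    // query the endpoints, as for a normal undirected edge
    int either()     { return u; }
    int other(int w) { return w == u ? v : u; }

    // an edge whose endpoints coincide is a loop - a cycle has been found
    boolean loop() { return u == v; }

    // binary set intersection with '&' - constant time, O(1)
    boolean disjoint(PathEdge f) { return (xs & f.xs) == 0; }

    // ordered walk between the endpoints - start at one endpoint and let
    // the subclasses reconstruct the rest
    int[] path() {
        return reconstruct(new ArrayBuilder(len()).append(either())).xs;
    }

    abstract ArrayBuilder reconstruct(ArrayBuilder ab);

    abstract int len();
}

// ArrayBuilder as in Code 1, repeated so this sketch compiles on its own.
class ArrayBuilder {
    int[] xs;
    int   i = 0;
    ArrayBuilder(int n)        { xs = new int[n]; }
    ArrayBuilder append(int x) { xs[i++] = x; return this; }
    int prev()                 { return xs[i - 1]; }
}
```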
The first implementation, the simple edge, is just the two endpoints. The length is two and there are no reduced vertices, so we can always pass an empty set to the superclass. The path through this edge is just the endpoints: the previous vertex in the ArrayBuilder is used to query and append the other endpoint, giving a path of length two.
Code 2 - SimpleEdge
final class SimpleEdge extends PathEdge {
    
    // Two endpoints and no reduced vertices (empty set).
    SimpleEdge(int u, int v) {
        super(u, v, 0);
    }
    
    // Given the previous vertex, add the other endpoint.
    ArrayBuilder reconstruct(ArrayBuilder ab) {
        return ab.append(other(ab.prev()));
    }
    
    // we only have the endpoints.
    int len() {
        return 2;
    }
}
The reduced edge is a little more complicated. It is created by concatenating two existing edges which share a common vertex. The new endpoints are the other() endpoint of each edge with respect to x. The set of reduced vertices is the union of the two edges' reduced vertices and the new common vertex, x. The length of the path through this edge is the two endpoints plus the number of reduced vertices - stored as bits, so the bit count is added. Depending on the endpoint, the path reconstruction proceeds through one edge and then the other.
Code 3 - ReducedEdge
final class ReducedEdge extends PathEdge {
    
    final PathEdge e, f;
    
    //Two edges with a common endpoint (x). The reduced set 'xs' is the union of
    // the two edge's 'xs' sets and the common endpoint 'x'.
    ReducedEdge(PathEdge e, PathEdge f, int x) {
        super(e.other(x), f.other(x), e.xs | f.xs | 1L << x);
        this.e = e;
        this.f = f;
    }
    
    // Reconstruction delegates to reduced edges.
    ArrayBuilder reconstruct(ArrayBuilder ab) {
        // append the paths of the two edges, order depends on previous vertex 
        return u == ab.prev() ? f.reconstruct(e.reconstruct(ab))
                              : e.reconstruct(f.reconstruct(ab));
    }
    
    // Two endpoints and the number of reduced vertices.
    int len() {
        return 2 + Long.bitCount(xs); // or: e.len() + f.len() - 1
    }
}

PathGraph

With our path edges defined we now need a graph to store them in. As noted, storing the edges in an incidence list allows quick lookup. The add(PathEdge) method includes a new edge in the graph and remove(int) removes and returns all edges incident to x. The reduce() method does the combination of the edges which are disjoint. Finally the remove(int, List) method gets the incident edges, reduces them and then either adds the new edges back to the graph or stores the newly discovered cycles. I've omitted the degree() method for brevity - it is just the size of each list.

Code 4 - PathGraph
class PathGraph {
   
    // edges indexed by their endpoints
    List[] g;
    
    PathGraph(int n) {        
        // initialise 'g'
    }
    
    // adds an edge to the graph.
    void add(PathEdge e) {
        int u = e.either();
        int v = e.other(u);
        g[u].add(e);
        g[v].add(e);
    }
    
    // Remove (and get) all edges incident to 'x'.
    List remove(int x) {
        List es = g[x];
        for (PathEdge e : es)
            g[e.other(x)].remove(e);
        g[x] = Collections.emptyList();
        return es;
    }
    
    // Combine all edges 'es' which are disjoint and share the
    // endpoint 'x'.
    List reduce(int x, List es) {
        List reduced = new ArrayList();
        
        for (int i = 0; i < es.size(); i++) {
            for (int j = i + 1; j < es.size(); j++) {
                
                PathEdge e = es.get(i);
                PathEdge f = es.get(j);
                
                if (e.disjoint(f)) 
                    reduced.add(new ReducedEdge(e, f, x));
            }
        }
        return reduced;
    }
    
    // Remove 'x' from the graph by reducing all incident edges. Any
    // newly discovered cycles are added to 'cs'.
    void remove(int x, List cs) {
        
        List es      = remove(x);
        List reduced = reduce(x, es);
        
        // for each new edge, either store the cycle or
        // add it back into the graph
        for (PathEdge e : reduced) {
            if(e.loop()) {
                cs.add(e.path());
            } else {
                add(e);
            }
        } 
    }    
}

Although the edge access is more efficient, as the graph becomes more dense (during reduction) we need to modify more of the incidence lists on each removal. We can avoid this by imposing a predefined order in which vertices are removed. As recommended in the original publication, processing by degree is a good approach. With the ordering predefined, edges need only be indexed by a single endpoint.
Code 5 - Ranked removal
    // indicates the order which vertices will be removed
    int[] rank;
    
    // adds an edge to the graph on the least rank vertex.
    void add(PathEdge e) {
        int u = e.either();
        int v = e.other(u);
        if (rank[u] < rank[v])
            g[u].add(e);
        else 
            g[v].add(e);
    }
    
    // Remove (and get) all edges incident to 'x'.
    List remove(int x) {
        List es = g[x];
        g[x] = Collections.emptyList();
        return es;
    }

All these changes contribute (some more than others) to the performance improvements seen in the previous post (AllRingsFinder, Sport Edition). To run the algorithm we simply build the path graph and then successively remove all the vertices.
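To make that concrete, here is a condensed, self-contained composition of the sketches above - a minimal sketch, not the CDK implementation: no ranking, no degree check, and graphs limited to fewer than 64 vertices.

```java
import java.util.*;

// Condensed end-to-end sketch: build the path graph from the simple edges
// of an adjacency list, then remove every vertex, collecting loops as cycles.
class AllSimpleCycles {

    static List<int[]> find(int[][] graph) {
        int n = graph.length;
        List<PathEdge>[] g = new List[n];
        for (int v = 0; v < n; v++)
            g[v] = new ArrayList<>();

        // every edge (bond) starts as a simple edge - added once, u < v
        for (int u = 0; u < n; u++)
            for (int v : graph[u])
                if (u < v)
                    add(g, new SimpleEdge(u, v));

        List<int[]> cycles = new ArrayList<>();
        for (int x = 0; x < n; x++) {
            // remove (and get) all edges incident to x
            List<PathEdge> es = g[x];
            g[x] = Collections.emptyList();
            for (PathEdge e : es)
                g[e.other(x)].remove(e);
            // combine each disjoint pair - loops are newly found cycles
            for (int i = 0; i < es.size(); i++)
                for (int j = i + 1; j < es.size(); j++) {
                    PathEdge e = es.get(i), f = es.get(j);
                    if (e.disjoint(f)) {
                        PathEdge r = new ReducedEdge(e, f, x);
                        if (r.loop()) cycles.add(r.path());
                        else          add(g, r);
                    }
                }
        }
        return cycles;
    }

    static void add(List<PathEdge>[] g, PathEdge e) {
        g[e.either()].add(e);
        g[e.other(e.either())].add(e);
    }

    static abstract class PathEdge {
        final int u, v; final long xs;
        PathEdge(int u, int v, long xs) { this.u = u; this.v = v; this.xs = xs; }
        int either()     { return u; }
        int other(int w) { return w == u ? v : u; }
        boolean loop()   { return u == v; }
        boolean disjoint(PathEdge f) { return (xs & f.xs) == 0; }
        int[] path() { return reconstruct(new ArrayBuilder(len()).append(u)).xs; }
        abstract ArrayBuilder reconstruct(ArrayBuilder ab);
        abstract int len();
    }

    static final class SimpleEdge extends PathEdge {
        SimpleEdge(int u, int v) { super(u, v, 0L); }
        ArrayBuilder reconstruct(ArrayBuilder ab) { return ab.append(other(ab.prev())); }
        int len() { return 2; }
    }

    static final class ReducedEdge extends PathEdge {
        final PathEdge e, f;
        ReducedEdge(PathEdge e, PathEdge f, int x) {
            super(e.other(x), f.other(x), e.xs | f.xs | 1L << x);
            this.e = e; this.f = f;
        }
        ArrayBuilder reconstruct(ArrayBuilder ab) {
            return u == ab.prev() ? f.reconstruct(e.reconstruct(ab))
                                  : e.reconstruct(f.reconstruct(ab));
        }
        int len() { return 2 + Long.bitCount(xs); }
    }

    static final class ArrayBuilder {
        int[] xs; int i = 0;
        ArrayBuilder(int n)        { xs = new int[n]; }
        ArrayBuilder append(int x) { xs[i++] = x; return this; }
        int prev()                 { return xs[i - 1]; }
    }
}
```

Running find() on a triangle yields a single closed walk of length four; on K4 it yields seven cycles.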

What's really nice about this implementation is that PathEdge is a persistent data structure and can easily be implemented in a purely functional language. I actually managed to write a version in Scala and Haskell - one issue is that the PathGraph also needs to be persistent. There are of course good persistent random access arrays but they're simply not as fast. Switching back to the edge list representation actually allows you to use cons-lists (singly linked lists) but at the cost of performance.

Wednesday 26 June 2013

AllRingsFinder, Sport Edition

Recently I've been looking at efficient perception of cycles in graphs. The goal is to improve and unify ring perception methods in the Chemistry Development Kit (CDK). Slowly working through, optimising and adding algorithms there was one I was dreading. The AllRingsFinder in CDK provides all simple cycles (every possible ring) in a graph by progressively collapsing edges (Hanser et al, 1996). The AllRingsFinder is used (sometimes incorrectly) throughout the library in many different modules.

To perceive all simple cycles can take a long time and I was pessimistic about making any improvement to its speed. Fortunately, I was pleasantly surprised.

I'll go through the technical aspects of the implementation in the next post. This post will primarily focus on the performance. It should be noted that the algorithm is exactly the same and only the data structures and operations were modified.

Preliminaries

We will be testing the new implementation named AllCycles against the existing AllRingsFinder and HanserRingFinder. The HanserRingFinder was written by Rich Apodaca (depth-first) and is included in the CDK SMSD module. The AllCycles takes as input an adjacency list graph with the vertices described by integers. This is standard for graph algorithms - the time to convert the CDK molecule (IAtomContainer) is included in all results.

The benchmarks were run with Java 7 (update 21) on a 2.66 GHz Intel Core i7.

Complete Graphs Benchmark

When each vertex in the graph is adjacent to every other vertex we have the worst case for the algorithm. The number of simple cycles in these graphs grows very rapidly. The graphs are known as complete graphs and are named Kn, n being the number of vertices. The table below summarises the results of each implementation in milliseconds.

AllRingsFinder' is a modified version of AllRingsFinder that does not convert the paths to IRing objects. This conversion is currently inefficient and can impact the performance when there is a large number of cycles. The AllCycles and HanserRingFinder implementations already return the paths as a sequence of integers and atoms respectively.

Edit - I realise now the reason for the difference here is that I accidentally used the default and not the silent domain objects. The default objects fire events on updates which can cause massive performance hits - particularly when adding atoms. Using the silent objects (no events) yields similar times to AllRingsFinder' without the IRing conversion. In short - don't use the default objects, ever... you don't need them, trust me. I'll leave the timings as a reminder not to do this - the dataset results presented were already using the silent objects.

Graph   Number of Cycles   Implementation     Time (ms)
K3      1                  AllCycles               0.00
                           HanserRingFinder        0.00
                           AllRingsFinder          0.15
                           AllRingsFinder'         0.00
K4      7                  AllCycles               0.00
                           HanserRingFinder        0.00
                           AllRingsFinder          0.95
                           AllRingsFinder'         0.00
K5      37                 AllCycles               0.00
                           HanserRingFinder        0.15
                           AllRingsFinder          2.20
                           AllRingsFinder'         0.75
K6      197                AllCycles               0.10
                           HanserRingFinder        0.85
                           AllRingsFinder          7.80
                           AllRingsFinder'         2.00
K7      1172               AllCycles               0.55
                           HanserRingFinder       26.10
                           AllRingsFinder        193.80
                           AllRingsFinder'        39.80
K8      8018               AllCycles              12.00
                           HanserRingFinder      817.00
                           AllRingsFinder      10078.75
                           AllRingsFinder'      1360.75
K9      62814              AllCycles             994.70
                           HanserRingFinder    54524.85
                           AllRingsFinder     389597.00
                           AllRingsFinder'     81699.53

Performance of each implementation on complete graphs - log scale
It's quite amazing that using the same algorithm and simply changing the data structures (and not firing object events) can produce an almost 400-fold improvement in performance (K9).
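As a sanity check on the cycle counts above, the number of simple cycles in Kn can be computed directly: choosing k of the n vertices gives C(n,k) · (k-1)!/2 distinct cycles of length k, summed for k = 3..n. A small sketch (not part of the benchmark code):

```java
// Count the simple cycles of the complete graph Kn combinatorially.
class CompleteGraphCycles {
    static long cyclesInKn(int n) {
        long total = 0;
        for (int k = 3; k <= n; k++) {
            long choose = 1;                        // C(n,k)
            for (int i = 0; i < k; i++)
                choose = choose * (n - i) / (i + 1);
            long cyclic = 1;                        // (k-1)!/2 = 3*4*...*(k-1)
            for (int i = 3; i < k; i++)
                cyclic *= i;
            total += choose * cyclic;
        }
        return total;
    }
}
```

For example, cyclesInKn(5) gives the 37 cycles listed for K5.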

Dataset Benchmark

Examining the worst case gives us a concise view of how the implementation scales. To judge the real-world performance several datasets were used: ChEBI, HMDB and a subset of PubChem Compound (CID 0 - 25000).

There are molecules in the datasets with far more cycles than we can reasonably handle. Such molecules are considered impractical - exactly which ones depends on some predefined escape clause.
  • AllRingsFinder measures the time when the algorithm starts and terminates if a set time limit is exceeded.
  • AllCycles terminates when a preset maximum degree (connections) on a vertex is exceeded.
  • HanserRingFinder does not have a timeout and so was not tested on the datasets.
The table below summarises the results for each dataset. To allow a reasonable number of repeats the timeout for AllRingsFinder was reduced from the default 5000 ms to 50 ms. Running on ChEBI with the default timeout took 389597 ms (29 impractical) and clogged the memory so much the timing was not stable. The maximum degree for AllCycles was arbitrarily chosen as 500. Additionally, findAllRingsInIsolatedRingSystem() was used instead of findAllRings(). This allowed nearly identical preprocessing for both implementations using the fast RingSearch. Invoking findAllRings() directly will yield slower results than those listed below.


Data                      Molecules   Implementation   Time (ms)      Impractical   Cycles
ChEBI                     29954       AllCycles          256 ± 56     36            155476
                                      AllRingsFinder   10324 ± 189    86.86         131124.72
HMDB                      37495       AllCycles          390 ± 48     2             93148
                                      AllRingsFinder    5410 ± 180    40.83         86230.38
PubChem Compound Subset   22676       AllCycles          195 ± 55     1             61229
                                      AllRingsFinder    1942 ± 173    5.05          60587.58

We can see that for ChEBI we were able to perceive 20000+ more cycles in ~2% of the time. As mentioned, the times include the preprocessing and conversion - were the molecules loaded directly as adjacency lists, the time for the PubChem Compound subset (195 ms) would be even lower! The counts for AllRingsFinder are fractional due to fluctuations between repeats.

Choosing a Maximum Degree Limit

The performance of the new implementation depends on the choice of the maximum degree. The limit doesn't really have a meaning and may be chosen incorrectly. Set the limit too low and we lose information; too high and we may generate more information than we can handle.

Running the implementation on all of PubChem Compound (Dec '12) with a limit of 5000 we can retrospectively calculate the limit for a given percentile. In PubChem Compound there were 17842848 ring systems with more than one ring,  987 (0.005%) of which exceeded our hard maximum degree of 5000.

Plotting the cumulative percentage (T) against the maximum degree reached indicates that 99% of ring systems can be completed with a limit of just 9. Due to the sheer number of systems, if we did choose that value there would still be 166131 systems which were considered impractical and did not finish.


To reduce the number of impractical molecules we need to go to fractions of a percent.

Maximum Degree   Percent (%)   Completed (ring systems)   Impractical (ring systems)
72               99.95         17834013                   8835
84               99.96         17835876                   6972
126              99.97         17837692                   5156
216              99.98         17839293                   3555
684              99.99         17841065                   1783
882              99.991        17841342                   1506
1062             99.992        17841429                   1419
1440             99.993        17841602                   1246
3072             99.994        17841789                   1059

This breakdown provides meaningful insight into an appropriate value. I haven't decided yet whether to include a Run Forever option - due to the combinatorial explosion there will always be molecules which we can't compute. I'll leave you with this entertaining video which explains the amazing power of combinatorial explosion.


