Recently a functional cheminformatics toolkit, ChemF, was published. It demonstrates an impressively light SMILES parser, you can check it out on github. The authors choose to use a coordinate list representation for their molecule which naturally lends it's self to immutability. Now I'm not going to rant too much about why immutability is great but it got me thinking, how could one write an efficient immutable adjacency list representation? There is already an immutable version in the Graph for Scala library but I wanted to have a lightweight implementation for Java. I've been using an adjacency list a lot recently (see previous posts) and writing an immutable graph would definitely be useful. As a reminder here is a graph and it's representation as an adjacency list and matrix.
Graph API
First of, let us define our API. Wikipedia provides a nice summary of one Graph - Abstract Data Type which we can use. I'm making use of Java 8 - Virtual Extensions here so we only need implement the 'add' and 'neighbors' operations. It should be noted we can't use the primitives for the collections here as we can't make a read only array. As we're not doing arithmetic and thanks to integer caching there is only a tiny performance hit.
Code 1 - Graph API
public interface Graph { /** neighbors of v */ public Collection<Integer> neighbors(int v); /** add a connection between u and v */ public Graph add(int u, int v); /** determine whether u and v are adjacent */ public default boolean adjacent(int u, int v) { return neighbors(u).contains(v); } /** degree of vertex v */ public default int degree(int v) { return neighbors(v).size(); } }
Mutable implementation
To begin, we define a mutable implementation which simply stores a list of lists. We will base our other implementations on this. For conciseness invalid inputs are not handled.
Code 2 - MutableAdjacencyList
public final class MutableAdjacencyList implements Graph { private final List<List<Integer>> vs; public MutableAdjacencyList(List<List<Integer>> vs) { this.vs = vs; } public Collection<Integer> neighbors(int v) { return vs.get(v); } public MutableAdjacencyList add(int u, int v) { // ensure capacity while (u >= vs.size() || v >= vs.size()) vs.add(new ArrayList<>(1)); // associate u with v and v with u vs.get(u).add(v); vs.get(v).add(u); return vs; } }
This will work perfectly well however we have not encapsulated our fields and one could manipulate the internals without going through our defined API. Here we can add a directed edge to an undirected graph.
Code 3 - MutableAdjacencyList usage
Graph g = new MutableAdjacencyList(...).add(0,1) .add(1,3) .add(2,3); // state: {{1},{0,3},{3},{2,1}} // we can change the internal state without going // through the API g.neighbours(0).add(20); // new state: {{1,20},{0,3},{3},{2,1}}
One obvious fix would be wrap the list when the neighbors are requested. This wrapping provides a read only view of the data.
Code 4 - Encapsulated
public Collectionneighbors(int v) { return Collections.unmodifiableList(vs.get(v)); }
This successfully stops unmanaged changes to the internal state but adds a penalty (albeit small) for access operations.
Defensive implementation
Generally we're going to want to create our graph once and then perform multiple access operations. Therefore we should avoid creating a new wrapper for each access to a vertices neighbors. Instead of adding the modifiable wrapper on access we can do it just once. As shown below we can make a few simple modifications to the mutable implementation.
Code 5 - DefensiveAdjacencyList
public final class DefensiveAdjacencyList implements Graph { private final List<List<Integer>> vs; public DefensiveAdjacencyList(List<List<Integer>> vs) { List<List<Integer>> tmp = new ArrayList<>(vs); // defensive copy for (int v = 0; v < tmp.size(); v++) tmp.set(v, Collections.unmodifiableList(new ArrayList<>(tmp.get(v)))); this.vs = Collections.unmodifiableList(vs); } public Collection<Integer> neighbors(int v) { return vs.get(v); } public DefensiveAdjacencyList add(int u, int v) { List<List<Integer>> tmp = new ArrayList<>(vs); while (u >= tmp.size() || v >= tmp.size()) tmp.add(new ArrayList<>(1)); tmp.set(u, new ArrayList<>(tmp.get(u))); tmp.set(v, new ArrayList<>(tmp.get(v))); tmp.get(u).add(v); tmp.get(v).add(u); return new DefensiveAdjacencyList(tmp); } }
We have modified the add operation and as we're allowing input in the constructor we perform a defensive copy. Note in the constructor we have to copy each neighbor list as the wrapper only provides a read only view and does not protect against modification to the underlying list. The follow snippet demonstrates that if we are not careful one can still change the unmodifiable list b by changing a.
Code 6 - Read only view
Lista = new ArrayList (); a.add("how many items?"); List b = Collections.unmodifiableList(a); System.out.println("there are " + b.size() + " items"); // 1 a.add("are you sure?"); System.out.println("there are now " + b.size() + " items"); // 2
This is a very simple immutable implementation however it is hideously slow to add new neighbors. If we remove the public constructor and force all update operations through our own methods we have complete control of the lists and can do a lot better.
Update only implementation
This next implementation has a private constructor. We now do not need to copy every neighbor list as only we have access to the constructor and can cut down on our copy operations. Granted we could also do this above but the point is we really want to fully encapsulate the operations on our graph.
Code 7 - UpdateOnlyAdjacencyList
public final class UpdateOnlyAdjacencyList implements Graph { private final List<List<Integer>> vs; // private constructor - we have complete control over vs private UpdateOnlyAdjacencyList(List<List<Integer>> vs) { this.vs = vs; } public Collection<Integer> neighbors(int v) { return vs.get(v); } public UpdateOnlyAdjacencyList add(int u, int v) { List<List<Integer>> tmp = new ArrayList<>(vs); while (u >= tmp.size() || v >= tmp.size()) tmp.add(Collections.emptyList()); tmp.set(u, include(tmp.get(u), v)); tmp.set(v, include(tmp.get(v), u)); return new UpdateOnlyAdjacencyList(tmp); } // helper method for adding a value to an immutable list private static List<Integer> include(List<Integer> list, int x) { List<Integer> mutable = new ArrayList<>(list); mutable.add(x); return Collections.unmodifiableList(mutable); } }
Performance
To benchmark how each implementation performed I incrementally generated 1000 random graphs with between 10 and 100 vertices. Each vertex had 0.08 probability that it is connected to another vertex. The plot below shows the average time taken by each implementation to generate a each graph (50 repeats).The update only implementation performs much better than the full defensive copy. Admittedly an extreme case but it demonstrates if we were to open up the implementation to allow mutable input (as we did in the constructor) the cost of doing those copies can be very expensive.
We can extrapolate out the timings and determine roughly how long it would take each implementation to construct 50,000 graphs. If we plot on a continuous scale we really only see a noticeable a difference between the mutable and update after |E| = 100. At an |E| of 200 the defensive copy now takes upwards of 80 seconds whilst the other two both fall below 2.5 seconds.
Although we see closer performance with the update only implementation as the requirements grow we still see divergence. Now, I deliberately started with a poor implementation (DefensiveCopy) just see what we could do wrong. What you really want to do to tune the performance is to combined the update only with a builder. Similar to how the Java library provides a StringBuilder for assembly a large number immutable strings we can define a GraphBuilder for building the immutable graphs. This allows to have a buffer before we fix the state of our graph and in some cases can actually offer a small performance improvement do to fewer resize operations.
Fully Immutable
Finally, it should be noted that the Google Guava library provides truly immutable collections (opposed to the read only wrappers) however there doesn't seem to be one which provides update operations, that is, create a new immutable list with a single item changed. In theory you could get very fast immutability if you only update the part that was actually changing. That is not simple.Links
[1] - The value of values. http://www.infoq.com/presentations/Value-Values
[2] - Scala Graph https://www.assembla.com/spaces/scala-graph/wiki
[3] - Wikipedia Graph ADT Article - http://en.wikipedia.org/wiki/Graph_(abstract_data_type)
[4] - Guava Immutable Collections - http://code.google.com/p/guava-libraries/wiki/ImmutableCollectionsExplained
Appendix
Scala does provide an immutable Vector offering constant time update operations. Here is what our implementation would look like.
Code 8 - Immutable graph in Scala
class AdjacencyList(vs: Vector[List[Int]]) extends Graph { def adjacent(u: Int, v: Int) = vs(u).contains(v) def neighbors(v: Int) = vs(v) def add(u: Int, v: Int) = { // ensure capacity val n = 1 + (u max v) - size() val vs = this.vs ++ Vector.fill(n)(List[Int]()) // update u, v neighbor lists new AdjacencyList(vs.updated(u, v :: vs(u)) .updated(v, u :: vs(v))) } }
No comments:
Post a Comment
Note: only a member of this blog may post a comment.