Wednesday 30 January 2013

Making the mutable, immutable

There are many advantages to immutability. Immutability makes it impossible to the change of state and establish an invariant in your software. Naturally this lends it's self to parallelisation. Since last year I have been playing around with Scala and the power really started to click. There's a great talk by Rich Hickey on The Value of Values which is also very thought provoking. Even Java, an imperative language, makes extensive use of immutability on it's value classes: String, Integer, Double. Often it is recommended that you minimise the mutability of value classes (Effective Java - Item 15). Unfortunately these are often forgotten when writing higher level data types as they are more easily expressed as mutable. This often leads to defensive copying and over use of cloning. If you can make the data type immutable not only are copies/clones irrelevant but updates can be made more efficient than a full defensive copy followed by an update.


Recently a functional cheminformatics toolkit, ChemF, was published. It demonstrates an impressively light SMILES parser, you can check it out on github. The authors choose to use a coordinate list representation for their molecule which naturally lends it's self to immutability. Now I'm not going to rant too much about why immutability is great but it got me thinking, how could one write an efficient immutable adjacency list representation? There is already an immutable version in the Graph for Scala library but I wanted to have a lightweight implementation for Java. I've been using an adjacency list a lot recently (see previous posts) and writing an immutable graph would definitely be useful. As a reminder here is a graph and it's representation as an adjacency list and matrix.

Graph API

First of, let us define our API.  Wikipedia provides a nice summary of one Graph - Abstract Data Type which we can use. I'm making use of Java 8 - Virtual Extensions here so we only need implement the 'add' and 'neighbors' operations. It should be noted we can't use the primitives for the collections here as we can't make a read only array. As we're not doing arithmetic and thanks to integer caching there is only a tiny performance hit.

Code 1 - Graph API
public interface Graph {
 
    /** neighbors of v */
    public Collection<Integer> neighbors(int v);
    
    /** add a connection between u and v */
    public Graph add(int u, int v);
    
    /** determine whether u and v are adjacent */
    public default boolean adjacent(int u, int v) {
        return neighbors(u).contains(v);
    }
    
    /** degree of vertex v */
    public default int degree(int v) {
        return neighbors(v).size();
    }    
}

Mutable implementation 

To begin, we define a mutable implementation which simply stores a list of lists. We will base our other implementations on this. For conciseness invalid inputs are not handled.

Code 2 - MutableAdjacencyList
public final class MutableAdjacencyList implements Graph {
 
    private final List<List<Integer>> vs;
 
    public MutableAdjacencyList(List<List<Integer>> vs) {
        this.vs = vs;
    }  
 
    public Collection<Integer> neighbors(int v) {
        return vs.get(v);
    }
 
    public MutableAdjacencyList add(int u, int v) {
 
        // ensure capacity
        while (u >= vs.size() || v >= vs.size())
            vs.add(new ArrayList<>(1));
 
        // associate u with v and v with u
        vs.get(u).add(v);
        vs.get(v).add(u);
 
        return vs;        
    } 
}

This will work perfectly well however we have not encapsulated our fields and one could manipulate the internals without going through our defined API. Here we can add a directed edge to an undirected graph.

Code 3 - MutableAdjacencyList usage
Graph g = new MutableAdjacencyList(...).add(0,1)
                                       .add(1,3)
                                       .add(2,3);
// state: {{1},{0,3},{3},{2,1}}
// we can change the internal state without going
// through the API
g.neighbours(0).add(20); 
 
// new state: {{1,20},{0,3},{3},{2,1}}

One obvious fix would be wrap the list when the neighbors are requested. This wrapping provides a read only view of the data.

Code 4 - Encapsulated
public Collection neighbors(int v) {
    return Collections.unmodifiableList(vs.get(v));
}

This successfully stops unmanaged changes to the internal state but adds a penalty (albeit small) for access operations.

Defensive implementation

Generally we're going to want to create our graph once and then perform multiple access operations. Therefore we should avoid creating a new wrapper for each access to a vertices neighbors. Instead of adding the modifiable wrapper on access we can do it just once. As shown below we can make a few simple modifications to the mutable implementation.

Code 5 - DefensiveAdjacencyList
public final class DefensiveAdjacencyList implements Graph {
 
    private final List<List<Integer>> vs;
 
    public DefensiveAdjacencyList(List<List<Integer>> vs) {
        List<List<Integer>> tmp = new ArrayList<>(vs);
        // defensive copy
        for (int v = 0; v < tmp.size(); v++)
            tmp.set(v, Collections.unmodifiableList(new ArrayList<>(tmp.get(v))));
        this.vs = Collections.unmodifiableList(vs);
    }
 
    public Collection<Integer> neighbors(int v) {
        return vs.get(v);
    }
 
    public DefensiveAdjacencyList add(int u, int v) {
 
        List<List<Integer>> tmp = new ArrayList<>(vs);
 
        while (u >= tmp.size() || v >= tmp.size())
            tmp.add(new ArrayList<>(1));
 
        tmp.set(u, new ArrayList<>(tmp.get(u)));
        tmp.set(v, new ArrayList<>(tmp.get(v)));
 
        tmp.get(u).add(v);
        tmp.get(v).add(u);
 
        return new DefensiveAdjacencyList(tmp); 
    } 
}

We have modified the add operation and as we're allowing input in the constructor we perform a defensive copy. Note in the constructor we have to copy each neighbor list as the wrapper only provides a read only view and does not protect against modification to the underlying list. The follow snippet demonstrates that if we are not careful one can still change the unmodifiable list b by changing a.

Code 6 - Read only view
List a = new ArrayList();
a.add("how many items?");
List b = Collections.unmodifiableList(a);
 
System.out.println("there are " + b.size() + " items"); // 1
a.add("are you sure?");
System.out.println("there are now " + b.size() + " items"); // 2

This is a very simple immutable implementation however it is hideously slow to add new neighbors. If we remove the public constructor and force all update operations through our own methods we have complete control of the lists and can do a lot better.

Update only implementation

This next implementation has a private constructor. We now do not need to copy every neighbor list as only we have access to the constructor and can cut down on our copy operations. Granted we could also do this above but the point is we really want to fully encapsulate the operations on our graph.

Code 7 - UpdateOnlyAdjacencyList
public final class UpdateOnlyAdjacencyList implements Graph {
 
    private final List<List<Integer>> vs;
 
    // private constructor - we have complete control over vs
    private UpdateOnlyAdjacencyList(List<List<Integer>> vs) {        
        this.vs = vs;
    }
 
    public Collection<Integer> neighbors(int v) {
        return vs.get(v);
    }
 
    public UpdateOnlyAdjacencyList add(int u, int v) {
        
        List<List<Integer>> tmp = new ArrayList<>(vs);
 
        while (u >= tmp.size() || v >= tmp.size())
            tmp.add(Collections.emptyList());
 
        tmp.set(u, include(tmp.get(u), v));
        tmp.set(v, include(tmp.get(v), u));
 
        return new UpdateOnlyAdjacencyList(tmp);        
    }
  
    // helper method for adding a value to an immutable list
    private static List<Integer> include(List<Integer> list, int x) {
        List<Integer> mutable = new ArrayList<>(list);
        mutable.add(x);
        return Collections.unmodifiableList(mutable);
    } 
}

Performance

To benchmark how each implementation performed I incrementally generated 1000 random graphs with between 10 and 100 vertices. Each vertex had 0.08 probability that it is connected to another vertex. The plot below shows the average time taken by each implementation to generate a each graph (50 repeats).

The update only implementation performs much better than the full defensive copy. Admittedly an extreme case but it demonstrates if we were to open up the implementation to allow mutable input (as we did in the constructor) the cost of doing those copies can be very expensive.

We can extrapolate out the timings and determine roughly how long it would take each implementation to construct 50,000 graphs. If we plot on a continuous scale we really only see a noticeable a difference between the mutable and update after |E| = 100. At an |E| of 200 the defensive copy now takes upwards of 80 seconds whilst the other two both fall below 2.5 seconds.

Although we see closer performance with the update only implementation as the requirements grow we still see divergence. Now, I deliberately started with a poor implementation (DefensiveCopy) just see what we could do wrong. What you really want to do to tune the performance is to combined the update only with a builder. Similar to how the Java library provides a StringBuilder for assembly a large number immutable strings we can define a GraphBuilder for building the immutable graphs. This allows to have a buffer before we fix the state of our graph and in some cases can actually offer a small performance improvement do to fewer resize operations.

Fully Immutable

Finally, it should be noted that the Google Guava library provides truly immutable collections (opposed to the read only wrappers) however there doesn't seem to be one which provides update operations, that is, create a new immutable list with a single item changed. In theory you could get very fast immutability if you only update the part that was actually changing. That is not simple.

Links

[1] - The value of values. http://www.infoq.com/presentations/Value-Values
[2] - Scala Graph https://www.assembla.com/spaces/scala-graph/wiki
[3] - Wikipedia Graph ADT Article - http://en.wikipedia.org/wiki/Graph_(abstract_data_type)
[4] - Guava Immutable Collections - http://code.google.com/p/guava-libraries/wiki/ImmutableCollectionsExplained

Appendix

Scala does provide an immutable Vector offering constant time update operations. Here is what our implementation would look like.


Code 8 - Immutable graph in Scala
class AdjacencyList(vs: Vector[List[Int]]) extends Graph {
 
  def adjacent(u: Int, v: Int) = vs(u).contains(v)
 
  def neighbors(v: Int) = vs(v)
 
  def add(u: Int, v: Int) = {    
    // ensure capacity
    val n  = 1 + (u max v) - size()
    val vs = this.vs ++ Vector.fill(n)(List[Int]())
    
    // update u, v neighbor lists
    new AdjacencyList(vs.updated(u, v :: vs(u))
                        .updated(v, u :: vs(v)))
  }
 
}