A taste of graph theory

Postcard of buildings by a river. Caption reads ‘Königsberg Schlossteich’

I’ve recently been working with graph databases, which give us a powerful idiom for modelling and reasoning with highly interconnected data. Before I share some of my experiences, I would like to set the scene with a basic introduction to graph theory.

Simple Graphs

Graph theory is a relatively young branch of mathematics, traced back to 1736 and the work of Leonhard Euler.

A simple graph is defined as a non-empty set of vertices, e.g. V = {1,2,3}, and a set of edges, each of which is a 2-member subset of the vertex set, e.g. E = {{1,2},{1,3},{2,3}}.

We can visualise a graph by drawing a diagram:

simple three-vertex graph drawn as a triangle

It’s tempting, particularly for the etymologically minded, to think of a graph as drawing. It’s certainly easier to think about graphs by visualising them, but drawings as representations of graphs, have the potential to be misleading, so we need to exercise caution.

For example, the same graph can be drawn like this:

simple three-vertex graph drawn with curved edges and four crossings

Notice how this diagram shows four crossings, but is in fact equal to the above graph, which shows none.

Because simple graphs are defined in terms of sets, we can note some key characteristics:

  • Each vertex only appears once. V = {1,1,2,3} is not a valid set because a=a.
  • An edge cannot join a vertex to itself. {1,1} is not a valid set, so it cannot be a valid element in the edge set.
  • There can only be one edge between two particular vertices. E = {{1,2},{1,2},{2,3},{1,3}} is not a valid set because {1,2} = {1,2}.
  • The edges have no direction. E = {{1,2},{2,1},{2,3},{1,3}} is not a valid set because {1,2} = {b,a}.

Useful variations

These restrictions are great for pure mathematics, but somewhat restrict our ability to model real-world situations. For this reason, a typical graph database relaxes the rules in various ways:

Digraphs

It introduces a notion of direction in the edges. This means we are now dealing with directed graphs or digraphs.

We can now model a graph V = {1,2,3}, E = {(1,2),(1,3),(2,3)}:

three-vertex digraph drawn as triangle

Multigraphs

We can allow skeins: multiple edges between vertices. If we replace any edge of a graph with a skein, then we have a multigraph, and our edge set becomes a multiset, as it may contain duplicated elements.

Here we expand the basic graph V = {1,2}, E = {{1,2}} by replacing the edge {1,2} with a skein of three edges, giving V = {1,2}, E = {{1,2},{1,2},{1,2}}:

two-vertex simple graph drawn as line

becomes

two-vertex multigraph with three edges

We can also model loops by allowing single-member sets as elements of EV = {1}, E = {{1}}:

one-vertex loop

Quivers

We can also allow directed multigraphs, also known as multidigraphs or quivers.

Degrees of vertices

The degree of a vertex, deg v, is the number of edges attached to it. In a simple graph deg v = |{e : e ∈ E, ve}|. Visually you can find the degree of a vertex by counting the edges that connect to it.

The indegree of a vertex, deg– v, is the number of edges leaving it, and the outdegree of a vertex, deg+ v, is the number of edges reaching it. If we model the directed edges as tuples, then deg v = |{e : e ∈ E, ∃v e = (v,y)}| and degv = |{e : e ∈ E, ∃x e = (x,v)}|. Visually you can find the indegree of a vertex by counting the arrows that leave it, and the outdegree by counting the arrows that point to it.

Walks

The power of graphs to model connected data arises when we start walking our graphs.

Mathematically, a walk is a sequence of vertices (v1,v… vn-1,vn) where each vertex vx is a member of the graph’s vertex set, and each pair of vertices (vx,vx+1) in the sequence is a member of the graph’s edge set.

Visually, a walk is found by placing a pencil on one of the dots on a graph diagram, and tracing along a line to another dot, then repeating the process.

When we come to look at graph databases, we will focus on ‘traversing’ them by walking along their edges.

Adjacency

As well as a diagram, we can represent a graph with an adjacency matrix. Consider the graph V = {1,2,3,4}, E = {{1,2},{1,3},{1,4},{2,3},{3,4}}. We can draw this graph like this:

four-vertex simple graph drawn as a square with one diagonal edge

We can also create an adjacency matrix where Aij is 1 if {i,j} ∈ E and 0 otherwise.

    ⎛0 1 1 1⎞
A = ⎜1 0 0 1⎟
    ⎜1 0 0 1⎟
    ⎝1 1 1 0⎠

The top row shows how many edges there are between 1 and each vertex: none to itself, one to 2, one to 3 and one to 4.

We can easily find the deg v by taking the sum of the corresponding row or column. We can see at a glance that the deg 1 = 3.

By multiplying an adjacency matrix by itself, we can find how many two-edge walks exist between any two vertices:

     ⎛0 1 1 1⎞   ⎛0 1 1 1⎞   ⎛3 1 1 2⎞
A² = ⎜1 0 0 1⎟ x ⎜1 0 0 1⎟ = ⎜1 2 2 1⎟
     ⎜1 0 0 1⎟   ⎜1 0 0 1⎟   ⎜1 2 2 1⎟
     ⎝1 1 1 0⎠   ⎝1 1 1 0⎠   ⎝2 1 1 3⎠

We can check that there are three two-edge walks between 1 and 1 {(1,2,1),(1,3,1),(1,4,1)}, one between 1 and 2 {(1,4,2)}, one between 1 and 3 {(1,4,3)} and two between 1 and 4 {(1,2,4),(1,3,4)}.

We can continue this trick for three-edge walks:

     ⎛0 1 1 1⎞   ⎛0 1 1 1⎞   ⎛0 1 1 1⎞   ⎛4 5 5 5⎞
A³ = ⎜1 0 0 1⎟ x ⎜1 0 0 1⎟ x ⎜1 0 0 1⎟ = ⎜5 2 2 5⎟
     ⎜1 0 0 1⎟   ⎜1 0 0 1⎟   ⎜1 0 0 1⎟   ⎜5 2 2 5⎟
     ⎝1 1 1 0⎠   ⎝1 1 1 0⎠   ⎝1 1 1 0⎠   ⎝5 5 5 4⎠

Again we can check that there are four three-edge walks between 1 and 1: {(1,2,4,1),(1,3,4,1),(1,4,2,1),(1,4,3,1)}, five between 1 and 2: {(1,2,1,2),(1,2,4,2),(1,3,1,2),(1,3,4,2),(1,4,1,2)}, five between 1 and 3: {(1,2,1,3),(1,2,4,3),(1,3,1,3),(1,3,4,3),(1,4,1,3)} and five between 1 and 4: {(1,2,1,4),(1,3,1,4),(1,4,1,4),(1,4,2,4),(1,4,3,4)}.

In general the matrix Aⁿ shows us how many n-edge walks there are between each pair of vertices.

We can perform the same trick for multigraphs and digraphs:

Here is the quiver V = {1,2,3,4}, = {(1,2),(1,3),(1,3),(1,4),(2,4),(2,4),(3,4),(4,4)}:

four-vertex quiver drawn roughly as a square

Here is its adjacency matrix:

    ⎛0 1 2 1⎞
A = ⎜0 0 0 2⎟
    ⎜0 0 0 1⎟
    ⎝0 0 0 1⎠

Here are the next two n-edge walk matrices:

     ⎛0 0 0 5⎞
A² = ⎜0 0 0 2⎟
     ⎜0 0 0 1⎟
     ⎝0 0 0 1⎠
     ⎛0 0 0 5⎞
A³ = ⎜0 0 0 2⎟
     ⎜0 0 0 1⎟
     ⎝0 0 0 1⎠

We can also add these matrices:

              ⎛0 1 2 11⎞
A + A² + A³ = ⎜0 0 0  6⎟
              ⎜0 0 0  3⎟
              ⎝0 0 0  3⎠

This matrix tells us that from 1 to 4 there are eleven walks of no more than three edges.

Adjacency matrices can give us a useful way to reason about graphs without having to traverse every walk.

Conclusion

These concepts give us the basic tools for working with graph databases. In future posts I will look at how we can put them to work to model domains.

Oboe concerto teaser: II

This is part 2 of a series on my forthcoming oboe concerto. This is a commission for the Handel Collection, and will be performed on 5 July 2011, 13:0014:00 at St Stephen Walbrook (warning: site plays music), 39 Walbrook, London, EC4N 8BN. Please come along if you can!

I mentioned last time that the Sarabande is built on a meta-canon. Here’s how it works.

This movement is 156 bars long which is equal to 12 × 13 or 2 × (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12). This second formula is salient, because it’s based on a triangular number, and I’m rather fond of triangular numbers.

The oboe and bassoon parts are each divided into 12 sections, each of a different length 2n where n is a number from 1 to 12. The oboe’s sections are ordered n = 12, 9, 6, 3, 2, 5, 8, 11, 10, 7, 4, 1, and those of the bassoon n = 2, 4, 6, 8, 10, 12, 11, 9, 7, 5, 3, 1, so they both similar types of pattern.

As the sections are of different lengths, the beginnings of sections rarely coincide in the two parts, and only once on the same length of section: n = 1 in the final 2 bars. Other coincidences of note are that oboe 2 and bassoon 12 start together (mirroring the beginning of the movement, where oboe 12 starts with bassoon 2), bassoon 11 and oboe 11 overlap by 16 bars, and bassoon 7 and oboe 7 by 6 bars.

This structure then forms the basis of the meta-canon: the material for each value of n is the same in both voices, wherever they occur. This means that in sections 11 and 7 the voices really are in canon, and in section 1 they are in rhythmic unison. All of the other sections are not self-contiguous, so the voices perform a canon-at-a-distance. In addition to this basic canonic principle, odd-numbered sections are inverted between oboe and bassoon, while even-numbered sections are performed recte.

There’s not really much in the way of musical example I can post until I’ve explained the derivation of the material (and finished writing it all!), so in the mean time, here’s a colourful illustration of the structure of the canon: