I’ve recently been working with graph databases, which give us a powerful idiom for modelling and reasoning with highly interconnected data. Before I share some of my experiences, I would like to set the scene with a basic introduction to graph theory.

## Simple Graphs

Graph theory is a relatively young branch of mathematics, traced back to 1736 and the work of Leonhard Euler.

A simple graph is defined as a non-empty set of vertices, e.g. *V* = {*1*,*2*,*3*}, and a set of edges, each of which is a 2-member subset of the vertex set, e.g. *E* = {{*1*,*2*},{*1*,*3*},{*2*,*3*}}.

We can visualise a graph by drawing a diagram:

It’s tempting, particularly for the etymologically minded, to think of a graph as drawing. It’s certainly easier to think about graphs by visualising them, but drawings as representations of graphs, have the potential to be misleading, so we need to exercise caution.

For example, the same graph can be drawn like this:

Notice how this diagram shows four crossings, but is in fact equal to the above graph, which shows none.

Because simple graphs are defined in terms of sets, we can note some key characteristics:

- Each vertex only appears once.
*V* = {*1*,*1*,*2*,*3*} is not a valid set because a=a.
- An edge cannot join a vertex to itself. {
*1*,*1*} is not a valid set, so it cannot be a valid element in the edge set.
- There can only be one edge between two particular vertices.
*E* = {{*1*,*2*},{*1*,*2*},{*2*,*3*},{*1*,*3*}} is not a valid set because {*1*,*2*} = {*1*,*2*}.
- The edges have no direction.
*E* = {{*1*,*2*},{*2*,*1*},{*2*,*3*},{*1*,*3*}} is not a valid set because {*1*,*2*} = {*b,a*}.

## Useful variations

These restrictions are great for pure mathematics, but somewhat restrict our ability to model real-world situations. For this reason, a typical graph database relaxes the rules in various ways:

### Digraphs

It introduces a notion of direction in the edges. This means we are now dealing with *directed graphs* or *digraphs*.

We can now model a graph *V* = {*1*,*2*,*3*}, *E* = {(*1*,*2*),(*1*,*3*),(*2*,*3*)}:

### Multigraphs

We can allow **skeins**: multiple edges between vertices. If we replace any edge of a graph with a skein, then we have a **multigraph**, and our edge set becomes a **multiset**, as it may contain duplicated elements.

Here we expand the basic graph *V* = {*1*,*2*}, *E* = {{*1*,*2*}} by replacing the edge {*1*,*2*} with a skein of three edges, giving *V* = {*1*,*2*}, *E* = {{*1*,*2*},{*1*,*2*},{*1*,*2*}}:

becomes

We can also model loops by allowing single-member sets as elements of *E*, *V* = {*1*}, *E = *{{*1*}}:

### Quivers

We can also allow directed multigraphs, also known as multidigraphs or quivers.

## Degrees of vertices

The **degree** of a vertex, deg *v*, is the number of edges attached to it. In a simple graph deg *v* = |{*e* : *e* ∈ *E,* *v* ∈ *e*}|. Visually you can find the degree of a vertex by counting the edges that connect to it.

The **indegree** of a vertex, deg^{– }*v*, is the number of edges leaving it, and the **outdegree** of a vertex, deg^{+} *v*, is the number of edges reaching it. If we model the directed edges as tuples, then deg^{–} *v* = |{*e* : *e* ∈ *E,* ∃*v e* = (*v*,*y*)}| and deg^{+ }*v* = |{*e* : *e* ∈ *E,* ∃*x e =* (*x*,*v*)}|. Visually you can find the indegree of a vertex by counting the arrows that leave it, and the outdegree by counting the arrows that point to it.

## Walks

The power of graphs to model connected data arises when we start *walking* our graphs.

Mathematically, a walk is a sequence of vertices (*v*_{1},*v*_{2 }… *v*_{n-1},*v*_{n}) where each vertex *v*_{x} is a member of the graph’s vertex set, and each pair of vertices (*v*_{x},*v*_{x+1}) in the sequence is a member of the graph’s edge set.

Visually, a walk is found by placing a pencil on one of the dots on a graph diagram, and tracing along a line to another dot, then repeating the process.

When we come to look at graph databases, we will focus on ‘traversing’ them by walking along their edges.

## Adjacency

As well as a diagram, we can represent a graph with an adjacency matrix. Consider the graph *V* = {*1*,*2*,*3*,*4*}, E = {{*1*,*2*},{*1*,*3*},{*1*,*4*},{*2*,*3*},{*3*,*4*}}. We can draw this graph like this:

We can also create an adjacency matrix where *A*_{ij} is 1 if {*i*,*j*} ∈ *E* and 0 otherwise.

⎛0 1 1 1⎞
A = ⎜1 0 0 1⎟
⎜1 0 0 1⎟
⎝1 1 1 0⎠

The top row shows how many edges there are between *1* and each vertex: none to itself, one to *2*, one to *3* and one to *4*.

We can easily find the deg *v* by taking the sum of the corresponding row or column. We can see at a glance that the deg *1* = 3.

By multiplying an adjacency matrix by itself, we can find how many two-edge walks exist between any two vertices:

⎛0 1 1 1⎞ ⎛0 1 1 1⎞ ⎛3 1 1 2⎞
A² = ⎜1 0 0 1⎟ x ⎜1 0 0 1⎟ = ⎜1 2 2 1⎟
⎜1 0 0 1⎟ ⎜1 0 0 1⎟ ⎜1 2 2 1⎟
⎝1 1 1 0⎠ ⎝1 1 1 0⎠ ⎝2 1 1 3⎠

We can check that there are three two-edge walks between *1* and *1* {(*1*,*2*,*1*),(*1*,*3*,*1*),(*1*,*4*,*1*)}, one between *1* and *2* {(*1*,*4*,*2*)}, one between *1* and *3* {(*1*,*4*,*3*)} and two between *1* and *4* {(*1*,*2*,*4*),(*1*,*3*,*4*)}.

We can continue this trick for three-edge walks:

⎛0 1 1 1⎞ ⎛0 1 1 1⎞ ⎛0 1 1 1⎞ ⎛4 5 5 5⎞
A³ = ⎜1 0 0 1⎟ x ⎜1 0 0 1⎟ x ⎜1 0 0 1⎟ = ⎜5 2 2 5⎟
⎜1 0 0 1⎟ ⎜1 0 0 1⎟ ⎜1 0 0 1⎟ ⎜5 2 2 5⎟
⎝1 1 1 0⎠ ⎝1 1 1 0⎠ ⎝1 1 1 0⎠ ⎝5 5 5 4⎠

Again we can check that there are four three-edge walks between *1* and *1*: {(*1*,*2*,*4*,*1*),(*1*,*3*,*4*,*1*),(*1*,*4*,*2*,*1*),(*1*,*4*,*3*,*1*)}, five between *1* and *2*: {(*1*,*2*,*1*,*2*),(*1*,*2*,*4*,*2*),(*1*,*3*,*1*,*2*),(*1*,*3*,*4*,*2*),(*1*,*4*,*1*,*2*)}, five between *1* and *3*: {(*1*,*2*,*1*,*3*),(*1*,*2*,*4*,*3*),(*1*,*3*,*1*,*3*),(*1*,*3*,*4*,*3*),(*1*,*4*,*1*,*3*)} and five between *1* and *4*: {(*1*,*2*,*1*,*4*),(*1*,*3*,*1*,*4*),(*1*,*4*,*1*,*4*),(*1*,*4*,*2*,*4*),(*1*,*4*,*3*,*4*)}.

In general the matrix *Aⁿ* shows us how many n-edge walks there are between each pair of vertices.

We can perform the same trick for multigraphs and digraphs:

Here is the quiver *V* = {*1*,*2*,*3*,*4*}, *E *= {(*1*,*2*),(*1*,*3*),(*1*,*3*),(*1*,*4*),(*2*,*4*),(*2*,*4*),(*3*,*4*),(*4*,*4*)}:

Here is its adjacency matrix:

⎛0 1 2 1⎞
A = ⎜0 0 0 2⎟
⎜0 0 0 1⎟
⎝0 0 0 1⎠

Here are the next two *n*-edge walk matrices:

⎛0 0 0 5⎞
A² = ⎜0 0 0 2⎟
⎜0 0 0 1⎟
⎝0 0 0 1⎠

⎛0 0 0 5⎞
A³ = ⎜0 0 0 2⎟
⎜0 0 0 1⎟
⎝0 0 0 1⎠

We can also add these matrices:

⎛0 1 2 11⎞
A + A² + A³ = ⎜0 0 0 6⎟
⎜0 0 0 3⎟
⎝0 0 0 3⎠

This matrix tells us that from *1* to *4* there are eleven walks of no more than three edges.

Adjacency matrices can give us a useful way to reason about graphs without having to traverse every walk.

## Conclusion

These concepts give us the basic tools for working with graph databases. In future posts I will look at how we can put them to work to model domains.