A taste of graph theory

Postcard of buildings by a river. Caption reads ‘Königsberg Schlossteich’

I’ve recently been working with graph databases, which give us a powerful idiom for modelling and reasoning with highly interconnected data. Before I share some of my experiences, I would like to set the scene with a basic introduction to graph theory.

Simple Graphs

Graph theory is a relatively young branch of mathematics, traced back to 1736 and the work of Leonhard Euler.

A simple graph is defined as a non-empty set of vertices, e.g. V = {1,2,3}, and a set of edges, each of which is a 2-member subset of the vertex set, e.g. E = {{1,2},{1,3},{2,3}}.

We can visualise a graph by drawing a diagram:

simple three-vertex graph drawn as a triangle

It’s tempting, particularly for the etymologically minded, to think of a graph as drawing. It’s certainly easier to think about graphs by visualising them, but drawings as representations of graphs, have the potential to be misleading, so we need to exercise caution.

For example, the same graph can be drawn like this:

simple three-vertex graph drawn with curved edges and four crossings

Notice how this diagram shows four crossings, but is in fact equal to the above graph, which shows none.

Because simple graphs are defined in terms of sets, we can note some key characteristics:

  • Each vertex only appears once. V = {1,1,2,3} is not a valid set because a=a.
  • An edge cannot join a vertex to itself. {1,1} is not a valid set, so it cannot be a valid element in the edge set.
  • There can only be one edge between two particular vertices. E = {{1,2},{1,2},{2,3},{1,3}} is not a valid set because {1,2} = {1,2}.
  • The edges have no direction. E = {{1,2},{2,1},{2,3},{1,3}} is not a valid set because {1,2} = {b,a}.

Useful variations

These restrictions are great for pure mathematics, but somewhat restrict our ability to model real-world situations. For this reason, a typical graph database relaxes the rules in various ways:

Digraphs

It introduces a notion of direction in the edges. This means we are now dealing with directed graphs or digraphs.

We can now model a graph V = {1,2,3}, E = {(1,2),(1,3),(2,3)}:

three-vertex digraph drawn as triangle

Multigraphs

We can allow skeins: multiple edges between vertices. If we replace any edge of a graph with a skein, then we have a multigraph, and our edge set becomes a multiset, as it may contain duplicated elements.

Here we expand the basic graph V = {1,2}, E = {{1,2}} by replacing the edge {1,2} with a skein of three edges, giving V = {1,2}, E = {{1,2},{1,2},{1,2}}:

two-vertex simple graph drawn as line

becomes

two-vertex multigraph with three edges

We can also model loops by allowing single-member sets as elements of EV = {1}, E = {{1}}:

one-vertex loop

Quivers

We can also allow directed multigraphs, also known as multidigraphs or quivers.

Degrees of vertices

The degree of a vertex, deg v, is the number of edges attached to it. In a simple graph deg v = |{e : e ∈ E, ve}|. Visually you can find the degree of a vertex by counting the edges that connect to it.

The indegree of a vertex, deg– v, is the number of edges leaving it, and the outdegree of a vertex, deg+ v, is the number of edges reaching it. If we model the directed edges as tuples, then deg v = |{e : e ∈ E, ∃v e = (v,y)}| and degv = |{e : e ∈ E, ∃x e = (x,v)}|. Visually you can find the indegree of a vertex by counting the arrows that leave it, and the outdegree by counting the arrows that point to it.

Walks

The power of graphs to model connected data arises when we start walking our graphs.

Mathematically, a walk is a sequence of vertices (v1,v… vn-1,vn) where each vertex vx is a member of the graph’s vertex set, and each pair of vertices (vx,vx+1) in the sequence is a member of the graph’s edge set.

Visually, a walk is found by placing a pencil on one of the dots on a graph diagram, and tracing along a line to another dot, then repeating the process.

When we come to look at graph databases, we will focus on ‘traversing’ them by walking along their edges.

Adjacency

As well as a diagram, we can represent a graph with an adjacency matrix. Consider the graph V = {1,2,3,4}, E = {{1,2},{1,3},{1,4},{2,3},{3,4}}. We can draw this graph like this:

four-vertex simple graph drawn as a square with one diagonal edge

We can also create an adjacency matrix where Aij is 1 if {i,j} ∈ E and 0 otherwise.

    ⎛0 1 1 1⎞
A = ⎜1 0 0 1⎟
    ⎜1 0 0 1⎟
    ⎝1 1 1 0⎠

The top row shows how many edges there are between 1 and each vertex: none to itself, one to 2, one to 3 and one to 4.

We can easily find the deg v by taking the sum of the corresponding row or column. We can see at a glance that the deg 1 = 3.

By multiplying an adjacency matrix by itself, we can find how many two-edge walks exist between any two vertices:

     ⎛0 1 1 1⎞   ⎛0 1 1 1⎞   ⎛3 1 1 2⎞
A² = ⎜1 0 0 1⎟ x ⎜1 0 0 1⎟ = ⎜1 2 2 1⎟
     ⎜1 0 0 1⎟   ⎜1 0 0 1⎟   ⎜1 2 2 1⎟
     ⎝1 1 1 0⎠   ⎝1 1 1 0⎠   ⎝2 1 1 3⎠

We can check that there are three two-edge walks between 1 and 1 {(1,2,1),(1,3,1),(1,4,1)}, one between 1 and 2 {(1,4,2)}, one between 1 and 3 {(1,4,3)} and two between 1 and 4 {(1,2,4),(1,3,4)}.

We can continue this trick for three-edge walks:

     ⎛0 1 1 1⎞   ⎛0 1 1 1⎞   ⎛0 1 1 1⎞   ⎛4 5 5 5⎞
A³ = ⎜1 0 0 1⎟ x ⎜1 0 0 1⎟ x ⎜1 0 0 1⎟ = ⎜5 2 2 5⎟
     ⎜1 0 0 1⎟   ⎜1 0 0 1⎟   ⎜1 0 0 1⎟   ⎜5 2 2 5⎟
     ⎝1 1 1 0⎠   ⎝1 1 1 0⎠   ⎝1 1 1 0⎠   ⎝5 5 5 4⎠

Again we can check that there are four three-edge walks between 1 and 1: {(1,2,4,1),(1,3,4,1),(1,4,2,1),(1,4,3,1)}, five between 1 and 2: {(1,2,1,2),(1,2,4,2),(1,3,1,2),(1,3,4,2),(1,4,1,2)}, five between 1 and 3: {(1,2,1,3),(1,2,4,3),(1,3,1,3),(1,3,4,3),(1,4,1,3)} and five between 1 and 4: {(1,2,1,4),(1,3,1,4),(1,4,1,4),(1,4,2,4),(1,4,3,4)}.

In general the matrix Aⁿ shows us how many n-edge walks there are between each pair of vertices.

We can perform the same trick for multigraphs and digraphs:

Here is the quiver V = {1,2,3,4}, = {(1,2),(1,3),(1,3),(1,4),(2,4),(2,4),(3,4),(4,4)}:

four-vertex quiver drawn roughly as a square

Here is its adjacency matrix:

    ⎛0 1 2 1⎞
A = ⎜0 0 0 2⎟
    ⎜0 0 0 1⎟
    ⎝0 0 0 1⎠

Here are the next two n-edge walk matrices:

     ⎛0 0 0 5⎞
A² = ⎜0 0 0 2⎟
     ⎜0 0 0 1⎟
     ⎝0 0 0 1⎠
     ⎛0 0 0 5⎞
A³ = ⎜0 0 0 2⎟
     ⎜0 0 0 1⎟
     ⎝0 0 0 1⎠

We can also add these matrices:

              ⎛0 1 2 11⎞
A + A² + A³ = ⎜0 0 0  6⎟
              ⎜0 0 0  3⎟
              ⎝0 0 0  3⎠

This matrix tells us that from 1 to 4 there are eleven walks of no more than three edges.

Adjacency matrices can give us a useful way to reason about graphs without having to traverse every walk.

Conclusion

These concepts give us the basic tools for working with graph databases. In future posts I will look at how we can put them to work to model domains.

On Creation

The Creation of Adam

I wrote recently about creative thinking, but my discussions with friends and colleagues often strayed beyond creative thinking to discuss creativity and creation in general. Here are some thoughts on the nature of creation*.

Creation defined

We can say:

An act is an act of creation if something exists after the act that did not exist before it, and would not exist if the act had not occurred.

Destruction defined

We can also define destruction:

An act is an act of destruction if something does not exist after the act that did exist before it, and would continue to exist if the act had not occurred.

Equivalence of creation and destruction

We can notice how creation sometimes involves taking something away:

  • Digging a hole is an act of creation, just as making a pile of earth.
  • Carving a bowl from a block of wood is an act of creation, just as forming a bowl as a coil pot.
  • Resist printing uses wax to prevent dye taking in certain areas. Discharge printing uses bleach to remove colour from previously dyed fabrics. Applying either of these is an act of creation, just as direct printing, where the pattern comes from the application of the dye.

If we define the notion of absence:

There is an absence of something if that thing does not exist. If that thing exists, there is no absence.

Then we can draw an equivalence between the two, as a act of destruction creates an absence, and an act of creation destroys an absence.

Entropic bias

If acts of creation and destruction are logically equivalent, why do we tend to make a distinction? Why do we focus on the creative aspect of bowl carving, rather than its destructive aspect?

We seem have a bias to see acts that decrease entropy as creative, and acts that increase entropy as destructive.

This bias goes back far beyond the formulation of the Second Law of Thermodynamics. Here, for example, is the beginning of the Judaeo-Christian creation story:

בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָרֶץ. וְהָאָרֶץ הָיְתָה תֹהוּ וָבֹהוּ וְחֹשֶׁךְ עַל פְּנֵי תְהוֹם, וְרוּחַ אֱלֹהִים מְרַחֶפֶת עַל פְּנֵי הַמָּיִם. וַיֹּאמֶר אֱלֹהִים: “יְהִי אוֹר”, וַיְהִי אוֹר.

In the beginning God created the heaven and the earth. And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters. And God said, Let there be light: and there was light.

That phrase ‘תֹהוּ וָבֹהוּ’, ‘ṯōhū wāḇōhū’ describes a high entropy system (‘תֹהוּ’ translates as ‘waste, emptiness, vanity’, ‘בֹהוּ’ is an emphatic reduplication) and the next six days’ action is the process of decreasing its entropy.

(It’s interesting to note how the creation of order entails the reduction of entropy. The very notion of entropy inverts our bias towards order.)

Creation and Creativity

We’ve characterised acts of creation and observed their logical, if not psychological equivalence. Are all acts of creation creative acts?

It seems to me that we can place acts of creation on a continuum, from those that we would seldom characterise as ‘creative’ to those we would unhesitatingly characterise that way.

If we revisit our earlier examples, then it’s hard to see how digging a hole could be considered to be a creative act, although it is indeed an act of creation. Carving a bowl and textile printing seem to be further along the scale, though I suspect most of us would choose our words depending on whether we saw this act of creation as something purely mechanistic, or as bringing something additional to the act.

It seems to me that this something additional may be what we look for when we distinguish a creative act from a simple act of creation. I wonder whether this is an act of conceptual creation, a reduction in conceptual entropy, and whether this is what we mean when we speak of creative thinking. This is something for me to explore further.


* I was a hopeless philosophy student, so these thoughts come from a position of wide-eyed ignorance. I’m confident my arguments are both invalid and unsound, and that whatever value I have touched on here has been expressed with more precision and insight elsewhere. If you notice my ignorance, please tell me, so I can be wiser.

Nevertheless, there is a certain pleasure in casting a wide net over my synapses and displaying the catch, whatever worth it may have.

Creativity in Software Development

I shared yesterday’s post with some friends, who were keen to explore what we mean when we talk about creativity in software development.

Alastair made an interesting comment:

…it made me reconsider software dev as a creative endeavour, but I think I came to the conclusion that it is. For me, I think there is a gap between a creative art like writing, especially one which has an expressive mirror like acting, and a purely creative activity like, e.g., whittling a stick or constructing a building.

I think there is value in disentangling our concepts of creativity, and I find Alastair’s distinction between the creative arts and simpler forms of creation very useful.

There’s also an ambiguity in the word ‘create’, as it can refer simply to making things, as well as to the creative endeavours we would like to characterise.

So rather than ask ‘Is software development a creative activity?’, I tend to consider a narrower question: ‘Is there a place for creative thinking in software development?’

As the most basic level, I see creative thinking as making new links between concepts. Once you have made the link, you can engage other thought processes, for example deductive thinking, to explore the consequences and implications of that link.

But because the link isn’t already there, you can’t find it by rational thought; you need a leap of imagination to reach it.

There are some sorts of problem that I can tackle best once I’ve slept. On a few lucky occasions I’ve been able to take an afternoon nap, and woken up with a new idea to investigate, but this usually means taking the idea home with me and letting it brew overnight.

Here are a few examples of problems in software development that can be tackled with creative thinking:

  • How should we name this element?
  • What is the appropriate metaphor for this system?
  • Has a similar problem already been solved? Is there a pattern we can apply here?
  • What test should we write first? What test should we write next?
  • What is the best way to split this system into smaller parts?

And of course, because software development in an organisation is a social activity, the need for creative thinking extends far beyond the design of the software.

Bring your Shadow to Work

Handrail and shadow

There‘s an idea in certain circles that we should be able to bring our whole selves to work.

There are aspects of this notion that I find unproblematically wonderful. For those of us who are invisible members of minority groups, the ability to drop the mask and be open about our identities can help us find a level of safety and inclusion at work.

There are areas that I find problematic, particularly questions about how permeable the work/life barrier should be. These are questions for another day.

The point I’m thinking about today is that there are aspects of all of us that are unpleasant. We think dark thoughts and entertain transgressive fantasies. These are parts of our whole self, but do we really want to bring them to work?

Carl Jung characterises this aspect of us as our Shadow, and sees our encounters with our Shadow as part of the way we Individuate ourselves. Repression and denial of the Shadow can lead to dysfunction: we can become overwhelmed by it and start Acting Out our fantasies, rather than enjoying them in the privacy of our mind.

So I find a tension here: being a psychologically healthy person requires us to have a healthy relationship with the dark areas of our minds, to admit these areas into our whole selves. But if we are to bring our whole selves to work, then we need to bring these dark areas to work as well.

This question becomes more pressing if we accept that there is a close relationship between our Shadow and our creativity. If we hope to do creative work, then we need to be able to dip our bucket into this dark well.

In this context, I read something interesting in an interview with Phoebe Waller-Bridge:

More than anything, she says, as a writer she wants to show women indulging their appetites and venting their grievances. “We sexualise women all the time in drama and TV. They are objectified. But an exploration of one woman’s creative desire is really exciting. She can be a nice person, but the darker corners of her mind are unusual and fucked up, because everyone’s are.” Has she always been able to say the unsayable? “Yes. As long as it feels truthful, as long as it’s pointing at the elephant, it is always exciting.” [Emphasis mine]

And this got me thinking again: Waller-Bridge is making quite a name for herself by bringing the darker corners of the mind, the Shadow, into her work. As a screenwriter this may be rather more straightforward than for a software developer, for example. But is there a way we can openly and honestly bring these aspects of ourselves to work? Is a truly psychologically safe workplace one where we can invite our Shadows?

Test code needn’t be defensive

In a code review I encountered some test code that looked a bit like this:

 var result = await _controller.Resources() as ViewResult;
 result.Should().NotBeNull();
 // ReSharper disable once PossibleNullReferenceException
 result.Model.Should.BlahBlahBlah();
 

This is a typical defensive coding pattern whose reasoning goes like this:

  • The return type of _controller.Resources() is Task<ActionResult>.
  • I need to cast the inner Result of this Task to a ViewResult, as I want to inspect its Model attribute.
  • But the Result could be a different subclass of ActionResult, so I had better use a safe cast, just in case.
  • As I’m using a safe cast, I can’t guarantee that I’ll get any instance back, so I had better do a null check.
  • Oh look! ReSharper is complaining when I try to access properties of this object. As I’ve already performed a null check, I’ll turn off the warnings with a comment.

Now, defensive coding styles are valuable when we don’t know what data we’ll be handling, but this is most likely to happen at the boundaries of a system, where it interacts with other systems or, even more importantly, humans.

But in the context of a unit test, things are different:

  • We are in control of both sides of the contract: the test and class under test have an intimate and interdependent existence. A different type of response would be unexpected and invalid.
  • An attempt to directly cast to an invalid type will throw a runtime error, and a runtime error is a meaningful event within a test. If _controller.Resources() returns any other subclass of ActionResult, then the fact that it cannot be cast to ViewResult is the very information I want to receive, as it tells me how my code is defective.

This means I can rewrite the code like this:

 
var result = (ViewResult) await _controller.Resources(); 
result.Model.Should.BlahBlahBlah();

By setting aside the defensive idiom, I’ve made the test clearer and more precise, without losing any of its value.

Identical random values in parallel NUnit test assemblies

We have some NUnit cross-system test assemblies that run in parallel. They upload files and watch to see how they are processed. We were inserting a randomly generated value into the filenames in an attempt to avoid identically named files overwriting each other.

Unfortunately, this didn’t work, and we saw frequent test failures. In particular, we found that filenames generated by one test assembly were often identical to those generated by another.

I wanted to understand why this was happening, so I did some investigation.

We were getting our values fromTestContext.CurrentContext.Random, which is an instance of NUnit’s Randomizer class.

When we look at the implementation of Randomizer, we see this:

static Randomizer()
{
    InitialSeed = new Random().Next();
    Randomizers = new Dictionary<MemberInfo, Randomizer>();
}

The Randomizer is statically seeded with a value generated by the System.Random class. Because this seed is static, the same sequence of random values is shared by all references to Randomizer within each assembly, producing a sequence of values that are highly likely to be different from each other. However, references to Randomizer in a concurrently running assembly use a different instance and have their own sequence of values with its own seed.

Let’s have a quick look at how System.Random is seeded.

The MSDN documentation tells us:

  • The Random() constructor uses the system clock to provide a seed value. This is the most common way of instantiating the random number generator.

And goes on to warn:

…because the clock has finite resolution, using the parameterless constructor to create different Random objects in close succession creates random number generators that produce identical sequences of random numbers.

It would seem that our two parallel test assemblies often start executing within such a short interval of each other, and that NUnit’s Randomizer is seeded with the same value in each assembly, which means the sequences of values are identical.

There is some discussion of introducing a mechanism for controlling the seeding of Randomizer in NUnit, but in the mean time, the solution to our problem was to seed our own System.Random instances, rather than relying on NUnit’s.