Reinvigorating a daily stand-up by walking the board

I’ve been working with a team who had a problem with focus: when I joined them, they seemed to be busy all the time, but they were frustrated that they weren’t making progress towards their sprint goals.

This situation could be seen in microcosm at the daily stand-up meeting. In this post I’m going to describe how a simple adjustment to this meeting helped us start to improve focus, morale and productivity.

The Scrum Guide gives a template for the daily stand-up meeting (which it calls the Daily Scrum):

The Daily Scrum is held at the same time and place each day to reduce complexity. During the meeting, the Development Team members explain:

  • What did I do yesterday that helped the Development Team meet the Sprint Goal?
  • What will I do today to help the Development Team meet the Sprint Goal?
  • Do I see any impediment that prevents me or the Development Team from meeting the Sprint Goal?

My team was indeed practising this technique, but it seemed that they often forgot the discipline of focusing on ‘helping the Development Team meet the Sprint Goal’, and the meeting often descended into yesterday-I-diddery, with each team member recounting all the things they did the day before in trivial detail.

It seems to me that in a stand-up of this format, each team member’s incentive becomes having something to say, rather than showing progress towards the Sprint Goal, and this produces an incentive to be busy, no matter how irrelevant or frankly counterproductive the tasks might be. In this team, I saw a lot of effort spent on support tasks—whether or not the issue was pressing—, a significant amount of aimless ‘refactoring’, which was essentially yak shaving, and a tendency for team members to interrupt each other for help with lower priority work. In effect, everyone starts prioritising busy work, rather than focusing on the team’s goals.

The other consequence of this approach was that the team’s board was a poor representation of our work: people would be working on tasks that weren’t visible on the board, and the stories that were on the board often didn’t move anywhere. The Scrum Master and I tried various approaches to coordinate the board and the stand-up reports, but the focus was still lacking.

I’ve previously worked in a Kanban environment, and the format of a Kanban standup is significantly different:

Standup meetings have evolved differently with Kanban. The need to go around the room and ask the three questions is obviated by the card wall. … The focus is on flow of work. The facilitator … will “walk the board.” The convention has developed work backward—from right to left (in the direction of pull)—through the tickets on the board. The facilitator might solicit a status update on a ticket or simply ask if there is any additional information that is not on the board and may not be known to the team.

(Anderson, David J. Kanban, Successful evolutionary change for your technology business)

I suggested that we try this approach for a week, and see whether it helped give us more focus. As people were concerned that we might lose sight of important work, we agreed that we would walk the board first, and then run quickly round the team to see if anything was missing.

The initial results were encouraging, and several weeks later we are still walking the board, rather than going round the team. In particular, our board now contains a great deal more information on the tasks in play, and the team have got really good at carding up even small tasks so they are visible at the next stand-up. The amount of off-plan and busy work has also dropped, and this also be a result of the focus on the tasks on the board. Perhaps my favourite development is that the tasks on the board have become much smaller: the drive to get things done is now focused on pulling small, focused tasks across the board, rather than doing busy work.

Of course this technique is no cure-all, and it took a while for the team to acquire the discipline of walking the board in order, rather than jumping in to discuss whichever task particularly excites them. However, as an incremental adaptation, I’m very pleased with its results.

A Retrospective in the Park

The other day, I facilitated a sprint retrospective in the park. The sun was shining, and we had all been working hard to complete our backlog, so it felt like a nice reward for everyone’s efforts. Holding a retrospective outdoors can also give it an energy and sense of enthusiasm that is harder to find in a small room.

I’ve run outdoor retrospectives before, and have previously followed fairly classic plans, with much arranging of index cards. This has never been a great success, as the slightest breath of a breeze can make a mess of your planning. For this retrospective, I designed a plan to avoid these problems, drawing some ideas from the Appreciative Retrospective plan.

This retrospective took an hour for a team of nine. You’ll need a pile of index cards or sticky notes, and a pen per person. Here’s how to do it:

1. Choose your location

Some people are sun lovers, whilst others, like me, burn easily and need some shade, so find a location that will work for everyone. Don’t worry too much about the state of the grass, as I suggest you conduct the retrospective standing up, if possible.

Get everyone to stand in a circle, with enough personal space for everyone, but close enough that you can hear everyone speak.

Observations: It’s nice if you can find a location where there won’t be too many distractions. We weren’t entirely successful: there was a hen party in another corner of the park, whose popping of prosecco corks and parading of an anatomically exaggerated blow-up mannequin was hard to ignore; there was also a group of male models sunning themselves noisily behind us (I’m working at a fashion company, so this is less unusual than it may sound), and three young women were smoking some interesting cigarettes upwind of us. Nonetheless, our retrospective was a success despite occasional distractions.

2. Characterise the sprint

Ask everyone to spend a couple of minutes coming up with three words to characterise the sprint; then go clockwise round the circle and ask each person to tell the team their three words.

Observations: It’s surprising how much difficulty people have sticking to three words; the important focus of this task is not the three-word limit, but getting a concise summary of the sprint.

3. Thank your neighbour

Moving anticlockwise around the team, ask each team member to thank their neighbour for something they have done during this sprint.

Observations: Our Scrum Master broke the rules by thanking the whole team for their efforts. The focus of this task is to generate a positive mood across the team, and it’s important that no one misses out on individual thanks, so I asked him to thank his neighbour for something as well.

4. Describe what went well

Hand each person three cards, and give them three minutes to write down three things that went well during the sprint. Going clockwise round the circle from a different starting point, ask each person to read out their three successes.

Observations: It’s useful for the facilitator to observe and comment on common themes, as this can help reinforce good practice.

5. Describe what could improve

Hand each person three more cards, and give them another three minutes to write down three things that could have gone even better during the sprint. Then go anticlockwise round the circle and ask each person to read out their three improvements.

6. Group the improvements

Instead of arranging the cards on a whiteboard (which isn’t practical in the park), appoint a champion for each improvement. Ask the first person to choose one of the improvements they suggested, and then get everyone else to hand this person any cards that describe a similar improvement. Keep running round the team until each team member has just one group of cards.

Observations: There’s a chance that you’ll end up with more themes than team members, in which case you’ll have to make a decision to drop some of these themes; in our case we had fewer common themes than team members, so we didn’t have to do this.

7. Select and discuss the most common themes

Rather than dot-voting, which again is impractical in the park, select the commonest themes. Ask everyone with any cards to take a step into the circle. Then ask everyone with just one card to take a step out again; then everyone with just two cards; then three, and so on until just three people are left in the inner circle.

Then have a three-minute discussion of each of these suggested improvements, with the focus of identifying at least one action per theme for the next sprint. Ensure someone is assigned to each action.

Observations: We ran slightly over the three minutes assigned to each theme, but this wasn’t a problem; if we hadn’t had a time limit, I suspect the conversation would have been much less focused.

8. Round off the retrospective

Finally, going round the circle clockwise, ask everyone to describe how they felt about the retrospective itself.

Observations: The feedback was very positive. The team had clearly enjoyed the opportunity to get out of the office, and they felt that the session had been successful: everyone was engaged and we came up with some good actions.

Feature Triggers

When we’re continuously integrating code changes, we may want to avoid prematurely exposing new behaviour to our users. A popular technique is to use Feature Toggles to turn a feature on based on a configuration setting. These are certainly useful, and are great for gradually rolling features out, but they involve introducing code into our systems that is not relevant to the system itself, and if poorly implemented can lead to us bloating our codebases with vapourware.

There is another technique that I think deserves more attention; I refer to this as Feature Triggers. The idea is to use an intrinsic part of your system to enable the new feature, rather than an extrinsic piece of configuration. The discipline lies in identifying the final change that will trigger the feature.

In his write-up of Feature Toggles, Martin Fowler describes a version of this technique that focuses on UI changes:

Release toggles are a useful technique and lots of teams use them. However they should be your last choice when you’re dealing with putting features into production.

Your first choice should be to break the feature down so you can safely introduce parts of the feature into the product. The advantages of doing this are the same ones as any strategy based on small, frequent releases. You reduce the risk of things going wrong and you get valuable feedback on how users actually use the feature that will improve the enhancements you make later.

If you really must hide a partly built feature, then the best way is to build all of it save the UI entry point and add that UI in a single release cycle. This way the non-ui code is fully integrated with everything else, but nothing is visible or used until you add the last bit at the end.

(Emphasis mine)

However, we needn’t restrict this technique to the UI. In particular, if you are practising Feature Pull, then the trigger should be the last piece of work in the chain, rather than necessarily a UI change.

Let me give an example.

I worked on a system that sold recorded music on line, and we wanted to start selling high-resolution tracks. The system had theretofore only supported a single audio format per track, usually 320 bps MP3, so the UI simply presented the customer with a ‘Add to basket’ button. Now we wanted to present them with a choice of, say, MP3 or FLAC, at different prices.

Our feature flow worked something like this:

  • When a customer has a track as a FLAC in their library, then they can download and listen to it with a FLAC codec.
  • When a customer has a track as a FLAC in their shopping basket, and they check out, it ends up in their library.
  • When a customer on the website clicks to add a track to their basket as a FLAC, then the track ends up in their shopping basket as a FLAC.
  • When a track is in the catalogue with a FLAC format, then the website shows that it can be added to basket as a FLAC.
  • When we have negotiated FLAC rights for a track that is ingested, then that track ends up in the catalogue with a FLAC format.

Now, each of these steps could be implemented in turn, using the principle of fake it till you make it to drive out the interface and schema changes as we went along. Crucially, we maintained existing behaviour for existing data. In the case of the UI, this meant that when a track was available in a single format, we just showed an ‘add to basket’ button with format-agnostic behaviour, whereas a track with multiple formats triggered a slightly more complex UI whose buttons had content-specific behaviour. Notice that in this case the UI was not the last piece of the puzzle: it’s behaviour was triggered by upstream data.

Instead, the entirety of this new feature was triggered by enabling the new ingestion rules—which then enabled multiple formats in the catalogue—which then enabled multiple formats on the website—which then enabled tracks to be added to the basket with specified formats—which then enabled format-aware tracks in the library—which then enabled format-aware downloads. All through this we limited the vapidity of the feature by keeping the code branching to a minimum and by releasing continuously, which meant that new code was almost immediately in use in production.

I’m sure this technique is widespread, and I think it’s worth reminding ourselves that the trigger needn’t be in the UI: by combining this technique with Feature Pull, we can often do without feature toggles and write more focused code.

On Pull, Entropy and Bottlenecks

A common scenario

Teams tend to work on features from the entry point in, pushing the behaviour through the system. In my experience this is almost always a bad idea, and I’m going to explain why, and offer some alternative approaches.

Let’s consider a fictional business: Sangeeta’s Sheet Music.

Sangeeta has focussed hitherto on selling piano music through her website and Android and iOS apps, but she’s now keen to corner the string quartet market. Now, string quartet music is sold in both score format and as parts, so there will be a certain amount of work to store information on available stock and to allow customers to choose which format they want to purchase.

Like in many successful startups, Sangeeta’s tech team has grown to some fifteen nmembers, who have been split into Catalogue, Payment, Frontend and Apps teams. She sits down with representatives from these teams, and they agree that Charlie and his team will start work on storing available formats in the Catalogue.

Two-and-a-half months later, Charlie’s team have updated the database and API schemata, and Penny and her team have enabled the shopping basket and checkout process to process different formats at different price points. At this point they approach Fatimeh from Frontend and Alejandro from Apps: “We’re nearly ready to go live with our new string quartet catalogue: we’ve got the data schema in place, our stock room is filling up with catalogue, so we just need to you to update the front end and apps, and we’re ready to start selling!”

Alejandro and his team are busy fixing a long list of bugs from the new iPhone, and won’t be able to start work for at least six weeks, but fortunately Fatimeh’s team have some capacity, so they start work. It soon becomes clear that some of the initial assumptions don’t work out on in real life: use case research shows that many orders are likely to be for all four string parts, and the shopping basket API doesn’t support this easily. Furthermore, they quickly realise that some of the catalogue doesn’t fit into the expected model: Schönberg’s String Quartet №2 has a part for soprano voice, and George Crumb’s *Black Angels* is only available in score format.

After some tedious toing and froing between the Catalogue, Payment and Frontend teams, they are happy with the functionality, but Alejandro and his team still aren’t ready implement this functionality in the apps, and Sangeeta doesn’t want to launch functionality on just the website. The feature has now been in progress for four months, yet it’s still vapourware, and the stock is sitting unsold in the stockroom.

On the face of it, there are two problems here: cross-team coordination and poorly specified behaviour. I contend that these are both results of a deeper problem: starting work in the wrong place.

Another approach

At 7digital I worked with Neil Kidd on a feature rather like this. We had observed how frequently these two problems arise, and decided we needed a different approach: we would start from the exit point and work backwards.

To do this we scripted some end-to-end User Journeys to describe the expected behaviours, stubbing out interim steps. We then worked with the responsible team to implement the final step in the process against canned data. We were then able to work backwards, working on each component in the chain to enable it to provide real rather than canned data. At each step in the process, we of course ensured that existing behaviour was unchanged, and that the new behaviour only kicked in when the previous component was ready to support it.

By starting with end-to-end User Journeys and considering the data output, we avoided nasty surprises, and our by triggering the new behaviour with data, rather than feature switches, we enabled the functionality to go live as soon as all the pieces were in place, which helps avoid producing vapourware.

I refer to this approach as Feature Pull, acknowledging the Kanban approach of pulling inventory through a manufacturing process, or work-in-progress through a development process, and I’ve been fairly vocal in encouraging colleagues to embrace it.

This concept tallies well with BDD and TDD principles, as it favours thinking about user journeys as an initial step, and testing from the desired outcome backwards.

Entropy

I wanted to find a way to illustrate this principle to colleagues outside the context of a feature, and felt a game might be the answer. I attempted to run such a game at 7digital, but the clearest lesson we drew from it was the benefit of communication, rather than anything specific to the flow of feature work.

And so at Socrates BE I ran a session in which I asked for help in coming up with a such a game.

As it happens, the interesting part of this discussion was not the proposals for the game itself, but the last few minutes, in which we started to examine *why* a pull system is often most appropriate. We alighted upon the notion of Entropy and the idea that features in a system should flow from the point of least to the point of most entropy.

If we consider that interactions with a system can often be modelled as a decision tree that branches over time, it is not surprising that the point of most entropy is at the end of a user journey, and it makes sense to start at that point.

Theory of Constraints

This felt like a nice take-away point, but I felt there was something missing. This morning I was reflecting on these ideas again, and the answer came to me: the point of most entropy is the bottleneck in the system, and by starting at this point we are subordinating our work to the bottleneck.

Think back to Sangeeta’s Sheet Music. The most obvious bottleneck in this system is clearly Alejandro and the Apps team, as they have so much other work to do. It therefore makes sense to start work by subordinating to this bottleneck, not building up inventory by implementing features that depend on this team, but rather finding ways to enable this team to overcome its workload problems.

Once the Apps team bottleneck has been tackled, then another bottleneck appears, this time one of planning: the team needs to understand and implement all the scenarios that are needed before the minimum viable feature can go live. This may involve a fair amount of planning and conceptual work, but again there is no point in doing upstream development work until this has been tackled.

Three metaphors

So here we have three metaphors for understanding how to approach features. If you like, you can focus on Pull, and start at the exit point of your system; if your mind turns to theoretical physics, then consider the entropy of your system, and start at the point where it is greatest; and if the Theory of Constraints gets you excited, then look for the bottleneck and subordinate to it. But whatever you do, please don’t just start coding!

Experience report: running the 7digital Technical Academy

Between September 2014 and April 2015 I ran 7digital’s Technical Academy: a half-year training scheme to give a new cohort of software developers a grounding in software craft. Here are a few thoughts on how it went.

Choosing candidates

Our four participants came from diverse backgrounds: two were graduate recruits, one with a degree in Forensic Biology, the other in Physics. One internal participant was a member of our QA team, while the other joined us from the Client Relations team. Three of them were women.

We explicitly opened the external vacancies to candidates with non-Computer Science backgrounds, and this gave us access to a more diverse field of candidates. We wanted candidates who had shown an interest in software development, but didn’t demand any industry experience: programming a spreadsheet to analyse coursework data, working with Access databases in a summer job, or creating a personal website could all be evidence of this interest.

At 7digital, the technical test is to pair programme on a software kata. The usual aim is to assess the candidate’s aptitude for TDD, and to see how well they work with another person. We used the same test for our Technical Academy candidates, but were fully aware that they were unlikely to have any TDD experience. Instead, we used this as an opportunity to find out how quickly they could pick up these techniques. At the beginning of the 45-minute exercise, I took a fairly directive role as Navigator: explaining what a unit test was, how an assertion worked, and telling the candidate what to type. By the end of the session, our successful candidates had demonstrated that they had picked up all the key concepts, and were capable of writing new tests with much less direction. This aptitude for learning was the most important quality we sought in our candidates.

Joining a team

Each participant joined one of the development teams, where they were expected to pair with a more experienced developer from the beginning. In addition, we ran several learning sessions a week, and expected them to work on a personal project over the six months of the academy. We suggested that they divide their time between academy and team work roughly 50:50.

New recruits at 7digital are assigned a Mentor, who has a pastoral role, catching up with them frequently and helping them settle in. We arranged Mentors for our two new joiners, and all four participants were also assigned a Tutor, whose role was to help with with any immediate technical issues during the academy, particularly in the project work.

Learning sessions

We organised a range of learning sessions, and implemented a pull system, choosing topics to address questions that had recently arisen during the participants’ project and team work. We met every Monday morning to finalise that week’s sessions, and to start planning for the following week. Apart from keeping a list of potential topics, we didn’t plan further than two weeks ahead.

Some of the sessions were hands-on practical exercises. At the beginning of the academy, we got the participants into the habit of TDD by running mob programming kata sessions: I would sit with the four of them around a computer, and they would pass the keyboard around in 5 minute sessions. As questions arose, we would sit back from the keyboard to discuss them. I introduced programming concepts and features of the IDE (Visual Studio with ReSharper) as we went along, so they could become familiar with running tests, elemental refactoring shortcuts and so forth.

Some of the sessions were more theoretical, and we enlisted colleagues to give talks and demonstrations of topics where they had particular expertise. Initially I took responsibility for organising this, but as our participants’ confidence and familiarity with their colleagues grew, I handed responsibility for this over to them.

Near the end of the course we ran some presentation sessions: one on the SOLID Principles and a couple on Design Patterns. Each of the participants chose a principle or a pattern and gave a short presentation to the others explaining it (I pitched in and gave a presentation on the Liskov Substitution Principle so as to make up the numbers). Some of the participants were rather nervous about doing their first presentation, but the results were impressive, and it was great to see their confidence boost once we had run these sessions.

Finally, we arranged a weekly reading group. During the first three weeks I led the group in a discussion of chapters 13–15 of Eliyahu Goldratt’s *The Goal*, which give the participants an accessible introduction to the Theory of Constraints, which underlies much of the Kanban methodology. After this, they started reading material on code design, and soon told me that they were happy to continue these sessions without me. The approach of discussion one or two chapters each week seemed to work very successfully, as they could engage properly with the issues.

The project

The project was a key focus of the participants’ learning. We sought ideas from colleagues across the company, and then let the participants choose a project. The person who had suggested the project acted as client, and was expected to be available to answer any questions, and to come along to demos to see how the work was progressing. Tutors were also expected to help out and give guidance.

We emphasised iterative development in the projects, encouraging our participants to get a walking skeleton running and deployed to a server on the network through our CI server (TeamCity) as soon as possible. Only then did we suggest they start fleshing out functionality. We also stressed the importance of doing TDD from the start. This approach front-loaded many of the difficulties in the projects: participants found it hard work to get going, but once they had a deployed project, they found they could pick up speed in their work.

We arranged frequent demos in which the participants could show off the progress they were making in their projects, and get feedback from their Clients and the rest of the academy.

Unlike in their team work, where pair programming was usual, much of the project work was done solo. There was a fantastic moment of revelation about a month before the final demonstrations, when it became clear that several of the participants were struggling with technical issues, and were worried whether they would be able to complete their projects in time. I suggested that rather than working solo, they work together in rotating pairs to solve these problems. At our next catch-up they reported that they had solved the problems and made more progress in those short pairing sessions than they had in the previous weeks’ solo work. For our participants to learn first-hand the value of pair programming less than six months into their careers was wonderful.

What worked

  • We found fantastic recruits and excellent internal candidates.
  • Pull learning was very successful.
  • The projects gave a good context for learning.
  • The participants developed an excellent team spirit and ability to self-organise.
  • The focus on XP and Software Craft skills equipped the participants with skills that are lacked by many developers with many more years’ experience: we created four exceptional software developers.

What didn’t work as well

  • Having a Client for each project worked better in theory than in practice; some didn’t even attend the demos.
  • Participants sometimes had difficulty getting a good balance between team work and Technical Academy work, and there were weeks in which their projects got no attention.

Would I do it again?

In a heartbeat. It was an amazing experience, and seeing the quality of our graduates is incredibly gratifying. I have now left 7digital, but there is a new Technical Academy kicking off as I write this, led by one of last year’s cohort. I wish them the very best of luck!


 

Edited on 20 Nov 2015 at 15:04 to mention the presentation sessions.

Edited on 26 Sept 2017 at 16:31 to remove a reference to an individual whose political views are not consistent with the spirit of this blog.

Watch out for exceptions in NUnit TestFixture constructors

Don’t do anything that might throw an exception in a TestFixture constructor: your tests will be ignored, rather than failing.

My colleague Chris and I have been playing around with polymorphous testing, taking advantage of the fact that you can pass parameters to an NUnit TestFixture like this:

[TestFixture(parameter1)]
[TestFixture(parameter2)]
public class BazTests{
  public BazTests(Blah parameter){
    ...
  }
}

We wanted to use these parameters to pass in a collaborator, which would then enable us to run the same tests against different implementations of the same interface. However, we immediately encountered difficulties, as the TestFixture attribute will only take parameters that are compile-time constants, which rules out directly passing in a collaborator class. If you are adding parameters to a Test, you can use the TestDataSource attribute to pass in more complex arguments; however, no such attribute is available for TestFixtures.

To get round this, we created a static factory that would then create our dependency according to the enum value passed in:

[TestFixture(TestSetting.Foo)]
[TestFixture(TestSetting.Bar)]
public class BazTests{
  private ITestHelper _testHelper;
  
  public BazTests(TestSetting testSetting){
    _testHelper = TestHelperFactory.Create(testSetting);
  }
}

However, some of our implementations of ITestHelper require a connection string, and when it is not present, a NullReferenceException is thrown. We would have expected some of our tests to fail because of this, but instead they were just ignored, and as TeamCity disregards ignored tests, this meant that our builds were treated as successful.

It suspect the tests are ignored because exceptions in the test constructor are not sent to TeamCity as errors; rather the exception must happen in one of the attribute-decorated methods.

The code gave us the correct failures once we had rewritten it like this:

[TestFixture(TestSetting.Foo)]
[TestFixture(TestSetting.Bar)]
public class BazTests{
  private TestSetting _testSetting;
  private ITestHelper _testHelper;

  public BazTests(TestSetting testSetting){
    _testSetting = testSetting;
  }

  [TestFixtureSetUp]
  public void TestFixtureSetUp(){
    _testHelper = TestHelperFactory.Create(_testSetting);
  }
}

Roles vs Activities

At work, Gonçalo sent round a link to an article by Michael Lopp on management and engineers. Lots of discussion ensued, much of it fairly tangential.

This discussion got me thinking about the difference between roles and activities, and I’m going to sketch out these ideas here.

Roles

It is easy to talk about roles: a Project Manager does X, a Product Manager does Y; a Developer does φ, a Tester does χ, and Architect does ψ. This thinking encourages us to assign roles to people: to turn them into jobs.

Lopp talks about ownership, and certainly, if a role is assigned to a person, you know where to go to to get answers in that domain. If I have a Project Manager, I know who will give me updates on the progress of the project; if I have a Tester, I know who to ask to test a piece of functionality.

But it’s also about blame. If I have a Project Manager, then I know who to shout at if the project falls behind schedule; if I have a Tester, I know who to sack if a vulnerability is released that leaks personal data.

And I wonder whether the focus on individuals filling roles not only encourages this focus on blame, but remains attractive when a culture of blame persists.

Getting away from blame

At my work, when something goes horribly wrong, we carry out a blame-free post mortem. We establish the facts of the incident, acknowledge the aggravating factors, take note of the mitigating factors, and come up with a plan for the future. During this process we recognise that people did things, but don’t get hung up on criticising them for their actions; rather we try to understand why they acted in the way they did, and how we can make this better next time.

This approach differs radically from the traditional approach of declaring ‘heads will roll’, initiating a witch hunt, and ensuring that the persons responsible are at the very least made to feel thoroughly shitty, and quite possibly relieved of their duties.

In conducting a blame-free post mortem, we are not interested in roles and responsibilities, we are interested in actions and activities. It matters less who acted than what action was taken; not who failed to act, but what action would have helped.

Activities

Recover from incidents is smoother if we focus on activities rather than roles, actions rather than people; can this shift in emphasis help elsewhere? I think it can.

Again, where I work we don’t have Architects. This doesn’t mean we don’t do Architecture: we do it all the time. Our team whiteboards are decorated with particoloured diagrams of the systems we’re building. We treat Architecture as an activity, not as a role, and this means that many people are involved, understanding of the decisions is pervasive, assumptions are more likely to be challenged, and single points of failure are less likely to exist.

And what happens if we make a catastrophically bad architectural decision? Well, there is no one to point the finger at, no convenient repository for blame, as the decision was collective and consensual. Instead, we can recognise that the decision was poor, learn from that, and adapt and move on.

Conclusion

I have seen how a limited shift in focus from roles to activities can work well. I wonder whether a more comprehensive shift would have further advantages. This isn’t to suggest that everyone should be engaged in all activities all of the time, but rather that by introducing flexibility, collaboration and sharing, we might be able to move further away from a culture of blame.

Encapsulation isn’t just about code

In programming, we use the concept of encapsulation. Think of a hi-fi: if I add a radio to my setup, I expect to have a box with a few buttons on the front, and a few sockets on the back. This is the interface of the radio, and I expect it to be simple and predictable. What I don’t want is have to get out a soldering iron to connect it to my amp, or, even worse, have to fiddle around on the circuit board whenever I want to change the station.

We routinely apply this principle to software, but I’m beginning to wonder whether we should apply it to other systems too.

Let’s take a purely hypothetical example.

The Systems team at Brenda’s Badgers are making some infrastructural changes. One of these is to retire a server called Sauron, which is well past its prime and keeps failing unexpectedly. They send out an email to the dev team announcing the imminent decommissioning of Sauron, and asking for any potential problems to be flagged up.

Now, a couple of years ago it was decided that, rather than referring to servers by name, it would be a good idea to give them service names, which could resolve to their IP address. Sauron is used for various purposes, so it goes by the service names sett-cache-2.brendas.prod, brock-store-01.sql.brendas.prod and web-server-001alpha.www.brendas.prod. The developers always use these service names, and few of them were even around in the days of Sauron.

So, when the email comes round about Sauron’s imminent demise, no one is too concerned; this sounds like a piece of legacy infrastructure, so it must be someone else’s problem.

Crisis is narrowly averted when a developer in the Sow Care API Team does a search through their code, and finds a single reference to Sauron. They find this is being used in the same context as references to sett-cache-2.brendas.prod, and decide to do a bit more investigation, which uncovers the various services behind which this machine hides. A last-minute panic ensues, and, a couple of days behind schedule, Sauron is finally laid to rest.

[Now it seems to me that this has all come about because of a failure of encapsulation. The dev team shouldn’t need to know that there is a server called Sauron. Their concern is that there are machines that fulfil the services of sett-cache-2.brendas.prod, brock-store-01.sql.brendas.prod and web-server-001alpha.www.brendas.prod. By expecting the dev team to know about Sauron, the Systems team are leaking details of the internal implementation to the outside world. Instead, they should be using the common vocabulary of the service names.

Furthermore, if all the references to these services use the proper service names, rather than referring directly to Sauron, then the responsibilities of these services can be moved across to other boxes by changing the DNS configuration, rather than doing a full release, which removes a significant amount of fragility.]

So, in a post-panic retrospective, the Systems and Dev teams at Brenda’s Badgers got together to discuss why the process was so fraught. They agreed on the following changes:

  • In future, all code should exclusively use service names, to remove the fragility of pointing directly to individual boxes.
  • In future communications, the Systems team would refer to these boxes by the service names that point to them, rather than by their externally irrelevant names.

And everyone lived happily ever after.

[Of course, this isn’t really the end of the story: the service names themselves are smelly, as they seem to include version numbers. Further investigation might reveal multiple servers with load balancing distributing calls amongst them; this in turn might prompt questions about whether the dev team should even know about the individual machines, or should just be able to treat the services as black boxes.]

So, the moral of this little fable is that the notion of encapsulation is not only useful when writing code, but can also have applications in wider contexts: communicate data that need to be shared with a consistent vocabulary, and hide everything that doesn’t need sharing.

What features of natural languages should programmers investigate?

We watched a presentation by Ola Bini on Expressing abstraction—Abstracting expression, and the language geek in me got very excited when he introduced some natural language examples of different modes of expression.

But I was soon disappointed by his choices:

  • simile
  • metaphor
  • redundancy

Now, these are perfectly worthy topics for consideration, but if we really want to explore alien paradigms, how about we venture further afield. Here are a few:

 

The State Pattern, explored through the medium of cake!

At 7digital, we are running a weekly discussion group on design patterns, and I have been creating some small projects to exlore the issues raised in these discussions. The other week my colleague Emily presented the State Pattern, and it soon became clear that this is one of the more controversial patterns, as it seems to ride roughshod over the SOLID principles.

In this post I will give a bit of background, and then discuss the various approaches illustrated in my project.

The State Pattern

The classic case for the State Pattern is when an object’s behaviour changes according to the state it is in. A vending machine can only sell when it contains products, a shopping basket can only be checked out when it has products in it, a person can only get divorced if they are already married.

This pattern is implemented as an extension of the Strategy Pattern: the context object (vending machine, shopping basket, person) has a dependency on a state object, which in turn implements the behaviour (sell, check out, divorce) in an appropriate manner. Unlike cardinal examples of the Strategy Pattern, in which the strategy is set once and left alone, the State Pattern lets the strategy change during execution — often as the result of a method call on the state itself (if the vending machine sells the last product, it then becomes empty; if the only item is removed from the basket, it cannot be checked out; if the person gets divorced, they are no longer married).

Smelly

An immediate criticism of the State Pattern is that it leads to a system that is fairly tightly coupled. Not only is the context coupled to the state in order to make method calls, but a mechanism is then needed for updating the state, and a classical implementation of this is for the state to be coupled back to the context — a two-way coupling that is already a code smell.

Other implementations are possible: the context can update its own state after each state method call, or the state method can give the successor state as a return value; however, putting this question aside, it is the State Pattern’s lack of SOLIDity that makes it most problematic.

VAPID

Consider a context object that exposes several methods and has several states. Now consider that only some of these methods will be appropriate for any given state of the object (just as in the examples given above). Under the classic State Pattern, each concrete state must implement a method for every delegated method of the context object, even if it is not appropriate for it to do so (often implemented by throwing an exception).

First, this is a violation of the Single Responsibility Principle, as each concrete state now has responsibilities that it is designed to shirk. If we consider the ‘reason to change’ criterion, each concrete state can change because a) it changes the implementation of the stuff it does do, and b) it changes the implementation of failing to do the stuff it does not do.

More graphically, this is a violation of the Liskov Substitution Principle. This principle requires interchangeable objects (eg, those with the same interface) to behave in the same way, ie, given the same input conditions, they produce the same output conditions. The problem with the State Pattern is that it requires each concrete state to have different behaviour from its peers: violation of the LSP is seemingly baked into this pattern.

Finally, this pattern can lead to a violation of the Interface Segregation principle, insofar as a multiplicity of methods on the context class (which may be justified as being appropriate to that object) can then delegate to a a multiplicity of methods on the state interface, which, given their state-dependence, will no longer form a coherent collection.

Sharlotka

Let’s take a break from theory here, and look at my implementation.

I considered the stereotypical State Pattern examples, but let’s be honest: there are quite enough vending machines and shopping baskets in the world, and discussion of design patterns is not the context to be locking horns with Cardinal O’Brien.

So I thought I would use a culinary example instead.

Sharlotka (шарлотка) is a Russian apple cake. It is made by filling a cake tin with chopped apples, then pouring an eggy batter over the apples and baking in a warm oven till the batter has set. It is served dusted with sugar and cinnamon, and is really rather lovely.

It also makes a good, simple example of the State Pattern: each stage of cooking must be carried out in sequence, but there’s a nice loop in the middle, where you have to keep checking the cake until it’s cooked through before turning it out.

Classic Implementation

My first implementation follows the classic pattern.

You can see that there is one ISharlotkaState interface, which contains a method for each of calls that the Sharlotka context object delegates to its state. Each of these methods takes a reference to the Sharlotka as a parameter, so it can alter its state if need be. (The Sharlotka is passed with the interface IHasState<ISharlotkaState> to avoid making the State part of the public interface of the Sharlotka class itself.)

If you look at any of the concrete states (eg, ReadyToAddApplesState), you will see that most of the methods are left throwing a WrongStateException, as well as the mechanism for updating the context’s state. The _successor state is injected in the constructor, as this makes it possible to leave the mechanics of wiring everything together to the IoC container, rather than having to new up states within other states. In a small way this is reminiscent of the Chain of Resposibility Pattern, and does something to alleviate the tight coupling smell.

If you want to unit test one of the concrete states (eg, ReadyToAddApplesStateTests) then you are again left with a lot of boilerplate code.

This implementation highlights some of the deficiencies of the classic State Pattern implementation; however, it does still work, and may be appropriate for cases where it is not so much the behaviour of the context that is dependent on state, but rather its internal implementation.

Segregating the Interfaces

A more parsimonious approach has been suggested by Jan van Ryswyck, and I have implemented a version of this as the Small Interface project.

The key difference here is that rather than having a single, monolithic interface for all the states, the individual behaviours are broken into their own interfaces. When the context comes to call each state method, it first checks whether it can cast the state to the appropriate interface, and only then makes the call.

This implementation makes no further efforts to remedy the tight coupling, but does makes some improvements to the SOLID violations:

The Single Responsibility Principle is better supported, as each state is now only responsible for implementing the interfaces that are actually appropriate; we no longer have states also taking responsibility for throwing WrongStateExceptions, as this is done by the context object.

The Liskov Substitution Principle is better supported, as we no longer have sets of defective implementations of interface methods; instead we have used the Interface Segregation Principle to split out a set of atomic interfaces, each of which can more easily support the LSP. There are still opportunities for violation of the LSP, as it is entirely possible to implement the same interface in several states, and for each implementation to be different, but this problem is no longer inherent in the implementation.

This implementation makes good steps towards SOLIDity, but still has a major flaw, which is intrinsic to the state pattern: you can call methods on the stateful object that are simply not appropriate to its state; to continue the sharlotka example, you can call Serve before you have called Bake. Whilst doing this will throw an exception, it should arguably not be possible to do so in the first place.

A Fluent Implementation

A third possibility came up in our discussion — I think it was Greg who raised it —: you can implement your stateful class with a fluent interface.  You can see my version of this in the Fluent project.

Like the Small Interface implementation, this uses several atomic interfaces instead of one monolithic one; unlike that implementation however the interfaces are directly implemented by the Sharlotka class, rather than being delegated to a state object, and at any point in its lifecycle the Sharlotka is only considered as an implementation of the interface that is appropriate to its current state, ie, you never have access to methods that are not currently supported.

The trick in implementing this is in the fluent idiom: rather than calling

sharlotka.AddApples();
sharlotka.AddBatter();
sharlotka.Bake();

we want to chain the methods like this:

sharlotka.AddApples()
	.AddBatter()
	.Bake();

We can do this by making each method return this, but with a return type of the appropriate next interface.

For example, TurnOut is implemented like this:

ICanDustWithSugar ICanTurnOut.TurnOut() {
	return this;
}

which means that the next call must be a method of ICanDustWithSugar.

This implementation does away with the smelly tight coupling of the State-Pattern examples, as state is represented by the return type interface of each method, rather than as a property of the Sharlotka object. The application of the Single Responsibility Principle is rather less clear in this example, as the methods are stubbed out, rather than being implemented in any meaningful way. It is quite likely that in a real system the implementation of each method would be delegated to another object in order to honour this principle; this would look rather similar to the Small Interface implementation, with the crucial distinction that the implementation would not be responsible for updating the Sharlotka’s state.  The Liskov Substitution Principle ceases to be a question here, as we have moved to small interfaces with single implementations, and this fact also supports the Interface Segregation Principle.

Where this implementation is not so suitable is in cases where the state of the object after a method call is not clearly determined. For instance, withdrawing money from a bank account can leave the account in credit or overdrawn; in such a case the trick of returning a particular interface is not sufficient. A small example of this in my implementation is the Bake loop, and I have overcome the problem in this particular case by checking the return type until it is not null. However, this technique is already a significant departure from the fluent idiom, as it relies on runtime checking to make sure that the code actually works.

There is another danger with this implementation, in that it relies on the consumer knowing to use it fluently. There is no protection against storing the return value of any method in a variable, and calling the same method more than once (indeed, this is what is done during the Bake loop), and any person coding with this interface needs to know that methods must be chained. However, the fluent idiom is fairly commonplace now, and neither of these considerations is one of code quality.

& Messe it Forth

These three implementations illustrate different approaches to the same problem. The Classic implementation is unlikely to be the best in most circumstances because of its tendency to produce Liskov-violating defective methods; the Small Interface implementation overcomes these problems, and is probably most suitable for situations where the state of the object does no change in easily predictable ways; the Fluent implementation is handy when a particular sequence of methods should be called, but less idiomatic when the sequence can branch at runtime.

There are also tantalising prospects for implementing this type of system with monads, but I’m going to leave that for another day.