Your tests are lying to you

October 2011

Using mocks within your test suite has gone rather out of fashion. Programmers everywhere have been lamenting the fact that mock-based tests are becoming more and more brittle: they’re having to change the test code in multiple places each time there’s the slightest code change. In fact, they seem to be changing the test code much, much more often than the production code.

Using mocks appears to require a lot of setup code for the object under test. Why not just fire up Factory Girl, create the objects we need to test this code, and just check the outputs?

This works, and appears to work nicely. For a while.

Eventually your tests will get to the point where they’re lying to you: they’re telling you your code works whereas actually it only works by coincidence. This post will examine the different techniques we can use to test code, and why some work better than others in the long term.

The problem

To look at this further, let’s try to write a conference simulator for a new website that tries to predict how many people might attend an upcoming event:

describe Conference do
  it "calculates total rating" do
    conference = Conference.new(:total_rating => 9)
    conference.total_rating.should == 9
  end
end

A simple start, with equally simple production code. Next, we decide to extract our code for calculating the rating into Speaker classes. We decide not to change the test suite much, and make the code work behind the scenes:

describe Conference do
  it "calculates total rating" do
    conference = Conference.new(:speakers => [:chris, :paul])
    conference.total_rating.should == 9
  end
end

A nice, simple, easy change? You’ll pay for this later. Where is the Speaker coming from? Your Conference class is creating it somewhere, or retrieving it from a factory. You’ve increased the number of collaborators for this class by at least one (possibly three), yet your test isn’t showing the additional complexity. It’s deceitfully hiding it, whilst you continue on in blissful ignorance.

Your tests are now sitting around the outside of your system. There are no tests for the Speaker class at all, except that we coincidentally check the rating it emits. Another developer is likely to miss the connection and remove the implied test whilst changing the code for a different reason later.

This gets worse over time:

describe Conference do
  it "calculates total rating" do
    conference = Conference.new(
      :schedule => :nine_to_five,
      :talks => [talk_for(:chris), talk_for(:paul)]
    )
    conference.total_rating.should == 9
  end
end

Can you see what’s going on here? We’ve created some nice helper methods to make it easy to create the talk objects we need. This test is fairly easy to read, but it’s dressing up the problem. The test relies on far too many collaborators all working correctly before it can return the right result.

When you extract a class, your purely state-based tests don’t always require change. If you’re not stubbing out or mocking the collaborators, you can end up in a situation where you’re relying on their code to work without realising it.

How could it be improved?

describe Conference do
  let(:talk1) { double(:talk, :rating => 10) }
  let(:talk2) { double(:talk, :rating => 6) }
  let(:schedule) { double(:schedule, :rating => 10) }
  before(:each) { Schedule.stub(:new => schedule) }
  it "calculates total rating" do
    conference = Conference.new(
      :schedule => :nine_to_five,
      :talks => [talk1, talk2]
    )
    conference.total_rating.should == 9
  end
end

describe Speaker do
end
describe Schedule do
end

Now we’ve isolated the method nicely from its collaborators, and ensured that its behaviour is correct: that it aggregates the ratings of the talks and the schedule. We also give Speaker and Schedule specs of their own, so that they too are tested in isolation.
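To make the isolation concrete, here is a minimal sketch in plain Ruby with no RSpec dependency. The examples in this post deliberately show no production code, so the `Conference` below is a hypothetical stand-in that exists only so the sketch runs. Its formula (the mean of the talk ratings, averaged with the schedule rating) is an assumption, chosen because it agrees with the numbers in the spec above. The schedule is also passed in directly rather than stubbing `Schedule.new`, and the `Struct` fakes play the part of `double`.

```ruby
# Hypothetical stand-in for the production class: the real formula is not
# shown in this post. Assumed here: mean talk rating, averaged with the
# schedule rating.
class Conference
  def initialize(talks:, schedule:)
    @talks = talks
    @schedule = schedule
  end

  def total_rating
    talk_average = @talks.sum(&:rating) / @talks.size.to_f
    (talk_average + @schedule.rating) / 2.0
  end
end

# Hand-rolled test doubles: each one only answers `rating`.
FakeTalk = Struct.new(:rating)
FakeSchedule = Struct.new(:rating)

conference = Conference.new(
  talks: [FakeTalk.new(10), FakeTalk.new(6)],
  schedule: FakeSchedule.new(10)
)

puts conference.total_rating # prints 9.0
```

Nothing in this check runs a real talk or schedule, so if it fails, the fault can only be in the aggregation itself.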

The more you use refactoring methods such as Extract Class without cleaning up your tests, the more likely it is that your tests are lying to you. Little by little, those tests that you trusted are testing more and more code. You add a multitude of edge cases at the boundaries, never thinking about the complexity within. You’ve resorted to using end-to-end tests to test basic correctness.

This is a bad thing on many levels: for example, what happens to interface discovery? How will you know how the interface of your lower-level classes needs to behave if you’re not mocking or stubbing it? You are resorting to guessing, rather than exercising the interface ahead of time in your tests. If you have tests around the edges, but not in the middle, you’re not gaining the design input that tests give you in each layer of your system.
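One way to picture interface discovery is as a small sketch in plain Ruby. The list of messages is taken from what the Conference spec sends to its schedule double. Everything else here is hypothetical: the `SCHEDULE_ROLE` constant, the `plays_role?` helper, and the `Schedule` class with its made-up ratings table are illustrations, not code from a real project.

```ruby
# The messages the Conference spec sends to its schedule double. Writing the
# double first is what "discovers" this interface.
SCHEDULE_ROLE = [:rating].freeze

# Hypothetical real class, written afterwards to satisfy the role. The
# lookup table is made up for illustration.
class Schedule
  RATINGS = { :nine_to_five => 10 }.freeze

  def initialize(name)
    @name = name
  end

  def rating
    RATINGS.fetch(@name, 0)
  end
end

# Contract check: the real object must answer everything the double answered.
def plays_role?(object, role)
  role.all? { |message| object.respond_to?(message) }
end

puts plays_role?(Schedule.new(:nine_to_five), SCHEDULE_ROLE) # prints true
puts plays_role?(Object.new, SCHEDULE_ROLE)                  # prints false
```

The double told us what the lower-level class needed to do before that class existed; a check like this keeps the double and the real thing from drifting apart.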

Your code stinks

If you go the whole hog with testing in isolation, then you might end up with something like this:

describe Conference do
  let(:talk1) { double(:talk, :rating => 10) }
  let(:talk2) { double(:talk, :rating => 6) }
  let(:talk3) { double(:talk, :rating => 2) }
  let(:talk4) { double(:talk, :rating => 8) }
  let(:track1) { double(:track, :talks => [talk1, talk3]) }
  let(:track2) { double(:track, :talks => [talk2, talk4]) }

  let(:venue1) { double(:venue, :nice_coffee_places => 3) }

  let(:joe) { double(:announcer, :experience => 5) }

  let(:schedule) { double(:schedule, :rating => 10, :announcer => joe) }
  before(:each) { Schedule.stub(:new => schedule) }

  it "calculates total rating" do
    conference = Conference.new(
      :schedule => :nine_to_five,
      :tracks => [track1, track2],
      :organiser => joe,
      :venues => [venue1, venue1]
    )
    conference.total_rating.should == 6.3945820
  end
end

I’m not surprised people moan about maintaining this: if any aspect of the Conference class changes, this test will break and need to be fixed. We can see that this test code is hard to write and difficult to read. It would be so much easier just to hide this setup in a few factory methods with some sensible defaults, right?

Maybe it’s not the test code that’s the problem. Perhaps the code stinks. Perhaps the class simply has way too many collaborators, which is why your test code contains a large amount of setup.

For this test code, we can see there are several objects leaking all over the conference code: to refactor this I’d probably get through a Scheduler, Caterer and perhaps a TrackAggregator before I was done. I’d ensure all these objects were tested in isolation, and ensure that there are acceptance tests all the way through to make sure the customer has what they need.

Well-designed code is easy to test. As a rule of thumb, any time I get over about two or three lines of setup code for testing a method, I normally take a step back and ask myself if this method is doing too much.
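As a rough sketch of where such a refactoring might land, assume (hypothetically) that the Scheduler, Caterer and TrackAggregator each end up exposing a single `rating`, and that Conference merely averages them. Neither assumption comes from real code; the point is only how short the setup becomes once Conference talks to three direct collaborators instead of reaching through tracks, talks, venues and announcers.

```ruby
# Hypothetical shape after the refactoring: Conference only averages the
# ratings reported by its direct collaborators.
class Conference
  def initialize(rated_parts)
    @rated_parts = rated_parts
  end

  def total_rating
    @rated_parts.sum(&:rating) / @rated_parts.size.to_f
  end
end

# Stand-in for double(:anything, :rating => n).
Rated = Struct.new(:rating)

# Three lines of setup, one per collaborator.
scheduler        = Rated.new(10)
caterer          = Rated.new(3)
track_aggregator = Rated.new(8)

conference = Conference.new([scheduler, caterer, track_aggregator])
puts conference.total_rating # prints 7.0
```

The tracks, talks and venue doubles have not vanished: they would move into the specs for the extracted classes, each of which gets its own short setup.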

Test speed

The other advantage of running tests purely in isolation is that they’re fast. Very fast. When I’m coding Rails apps these days, thanks to advice from Corey Haines, I’m running a spec_no_rails folder which runs independently from the rest of my Rails app. Rails apps by default epitomise this problem: default model tests exercise the whole system from the database up. By running your tests independently you’re not having to clean the database or start Rails each time you run your tests, which means that much of your interesting code can be tested in under a second. Gary Bernhardt has more information on how to set this up in his excellent Destroy All Software screencast series.

What I’m not saying

This isn’t an argument for or against mocks or stubs. Either technique can be used successfully to generate clean code. It’s an argument for exercising only the code under test and leaving the rest of the system to take care of itself. The important thing is that you don’t exercise your collaborators: whether you check they’ve received messages or simply stub them to return canned input doesn’t matter.
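The two styles can be put side by side in a small plain-Ruby sketch. The `Conference` here is again a hypothetical stand-in, reduced to asking one collaborator for its rating, and `StubSchedule` and `SpySchedule` are hand-rolled substitutes for what a mocking library would give you.

```ruby
# Hypothetical subject: it simply asks its one collaborator for a rating.
class Conference
  def initialize(schedule)
    @schedule = schedule
  end

  def total_rating
    @schedule.rating
  end
end

# Stub style: return a canned value, then check the state that comes out.
StubSchedule = Struct.new(:rating)
stubbed_result = Conference.new(StubSchedule.new(10)).total_rating

# Mock style: record incoming messages, then check the interaction.
class SpySchedule
  attr_reader :received

  def initialize
    @received = []
  end

  def rating
    @received << :rating
    10
  end
end

spy = SpySchedule.new
Conference.new(spy).total_rating

puts stubbed_result       # prints 10
puts spy.received.inspect # prints [:rating]
```

Either way, no real schedule code runs, which is the property that matters.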

Don’t forget end-to-end tests. These are very important for business acceptance and for ensuring basic functionality. The important thing is to ensure that you’re being intentional about your end-to-end tests and ensure your unit tests are not end-to-end tests by accident.

Take a good look at the test code for a project you recently worked on. You don’t need to look at the production code yet: notice that I’ve not included any production code in these examples. You shouldn’t need to see it to know whether it’s of good quality or not: you can tell that by reading the tests.

Which is the most annoying or bulky part of your test code? Are your tests deceiving you about what they’re testing? How could you improve the code to make this test code easier to maintain?
