How to get Spork working NOW on Rails 3, RSpec 2 and Cucumber
I’ve spent the evening trying to get Spork to work with Rails 3 and RSpec 2. I’ve never felt the need for it before, but the Rails 3 start-up time is fairly hefty and I’m crying out for the extra seconds more than ever.
It’s not that tricky, thankfully, and the following steps should see you running faster specs and features in no time.
RSpec 2
Follow these instructions to get RSpec 2 working:
Add Spork to your Gemfile, and update rspec to 2.1. You’ll need my fork of Spork, which carries a quick patch for the latest release candidate:
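A sketch of the relevant Gemfile lines (the git URL is a placeholder for the fork mentioned above):

    group :development, :test do
      gem 'rspec-rails', '~> 2.1'
      # Placeholder URL: point this at the fork mentioned above.
      gem 'spork', :git => 'git://github.com/example/spork.git'
    end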
Add --drb on a new line in your .rspec file. If you don’t have the .rspec file, create it:
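    --drb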
Modify your spec_helper.rb:
You could follow the installation instructions, but not everything is relevant to Rails 3 and RSpec 2. It’s pretty simple anyway: add require 'spork' to the top of your spec_helper.rb file, and put everything else inside spec_helper.rb inside a Spork.prefork do … end block:
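A minimal sketch of the result, assuming a standard generated spec_helper.rb (Spork also provides an each_run block for anything that must execute on every run):

    require 'spork'

    Spork.prefork do
      # This block runs once, when the Spork server boots, so the
      # expensive Rails start-up cost is paid only here.
      ENV["RAILS_ENV"] ||= 'test'
      require File.expand_path("../../config/environment", __FILE__)
      require 'rspec/rails'

      RSpec.configure do |config|
        config.mock_with :rspec
        config.use_transactional_fixtures = true
      end
    end

    Spork.each_run do
      # Per-run setup goes here if you need it.
    end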
That should be it. To start up the server, run:
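    bundle exec spork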
…and then try running a spec or two. The following command takes about a second on my machine now, whereas it used to take about ten seconds!
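(The spec path below is just an example.)

    time bundle exec rspec spec/models/user_spec.rb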
Cucumber
It’s important to note that for more than about 10-20 scenarios, Spork is slower than running cucumber normally. Therefore only turn it on for a few profiles, such as autotest (but not autotest-all), wip, etc.
Modify your cucumber.yml file:
Leave ‘autotest-all’ and ‘default’ alone.
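A sketch of the kind of change to make (your generated profiles will differ): the point is simply to add --drb to the profiles you want Spork to serve.

    default: --format progress features
    autotest-all: --format pretty features
    autotest: --drb --format pretty features
    wip: --drb --tags @wip --wip features
    rerun: --drb --format rerun --out rerun.txt features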
Modify your features/support/env.rb:
This is just the same process as with the spec_helper.rb file for RSpec:
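A sketch along the same lines, assuming a reasonably standard cucumber-rails env.rb (your generated requires may differ):

    require 'spork'

    Spork.prefork do
      # Everything that used to live in env.rb moves in here, so it only
      # runs when the Spork server starts.
      require 'cucumber/rails'
      # ...plus whatever else your generated env.rb requires
    end

    Spork.each_run do
      # Per-run setup goes here if you need it.
    end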
Again, that should be it. Run the following to try it out:
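    bundle exec spork cucumber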
Now try running a single feature in rerun or autotest mode. I’m getting 20% speedups for about 10 scenarios.
Using them together
The RSpec and Cucumber versions of spork use different ports, so there’s no problem running them together. Normally I run both in the same terminal window, one as a background process:
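    bundle exec spork &            # the RSpec Spork server, in the background
    bundle exec spork cucumber     # the Cucumber Spork server, in the foreground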
Then I run autotest in another window.
How do I use this?
I’m really liking this setup. It makes rapid TDD possible again, even when dealing with fairly slow tests.
Of course, we should be doing all we can to get the speed of our tests as high as possible: slow tests are a type of code smell. However, infrastructure load time is unavoidable and cutting it out is full of all kinds of win.
Use this setup with autotest and autotest-growl for maximum win. Autotest has come a long way recently: there’s a lightweight alternative to ZenTest now, and easy growl support. Cutting out even the ‘Oh, I should run my tests now’ step totally nails your debug cycle: not sure it gets much tighter than that.
UPDATE: Even more speed!
Jo Liss got in touch: she’s made some performance gains by skipping the “bundle exec” and requiring a few extra files in the prefork block. Read about what she has to say here.
The toolchain of dreams
Seems like yesterday people were saying that it was difficult to host Ruby apps. It was around the time people were saying “Rails doesn’t scale”, which thankfully has been proved dramatically wrong.
For a while now Ruby apps have been unbelievably easy to run and host, especially when you’re getting started.
But it’s got even better than that in the last few months. I’ve now got a complete Continuous Delivery toolchain set up for my latest app, entirely in the cloud. It’s Continuous Delivery As A Service, and it’s dreamy. This is how to set it up, and how it works.
Source control: Github
I’m using Github for code hosting and source control. You probably are already too. Most of the other services integrate with it very well, so setting this toolchain up is so much easier if you’re using it.
Build server: Semaphore
Cloud-based build services have been running for a while now. I like Semaphore - the user interface is clean and easy to read, and it does automatic deploys of passing code.
Set up Semaphore by creating a trial account, connecting it with your Github account and picking the repository you’d like to build. It automatically analyses your project for a build script, so if you have a standard Ruby or Rails project you probably won’t need to configure it much.
Deployment: Heroku
If you’re deploying to Heroku, set Semaphore up to deploy passing builds for you. It takes a few seconds in the settings menu for your project to do so. You can also make it use a Capistrano deploy script.
Quality Analysis: Code Climate
Lastly, set up Code Climate to monitor the quality of your app’s code. Setting up Code Climate is similar to Semaphore: sign up for a trial, connect to Github, select the repository. It will automatically set up the Github commit hooks for you.
To get code coverage integration, you’ll need to install a gem, but it only takes a few minutes.
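At the time of writing, that gem was codeclimate-test-reporter; a sketch of the setup (check Code Climate’s current documentation, as this has changed over the years):

    # Gemfile
    gem "codeclimate-test-reporter", group: :test, require: nil

    # At the very top of spec_helper.rb
    require "codeclimate-test-reporter"
    CodeClimate::TestReporter.start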
How the toolchain works
Out of the box, Github tells Semaphore to build every commit I push. If I push a branch, Semaphore builds that, too, and updates the build status of the commit so that everyone can see if the pull request is ready.
Merging code into master
When the pull request is merged, the code goes into master:
- Semaphore builds the master branch. If the tests pass, the code is deployed to Heroku.
- Code Climate automatically gets notified by Github and checks to see whether coverage has improved or decreased, whether I’ve introduced any Rails security problems, or whether my code is bad.
Logging
Builds, deploys and Code Climate notifications are all automatically posted to Hipchat, so I get a log of everything that’s happened without being inundated with emails.
Just set up a Hipchat account, get a Room API key from the settings page, and plug that into Github, Code Climate and Semaphore. Done.
The dream toolchain
Here’s how the toolchain plays out in practice:
Every time I push some code, it’s checked carefully, and monitored for quality and security holes. The tests are run and coverage reports are generated and presented nicely. If all the tests pass the code immediately gets deployed to production, and all of this activity is reported and logged in one central place.
This is the future. It really doesn’t get much better.
Time is valuable: and this didn’t take long
This took me about 40 minutes to set up, and 30 minutes of that was fiddling with the settings of the various tools: as it turned out, leaving them all set to the defaults does the right thing for me in all cases. Most of the tools simply connect to your Github account to set up all the access controls and keys for you.
The cost
For one project, this incredible toolchain will cost you the following:
- Github: $7 a month for the micro plan
- Semaphore: $14 a month for the solo plan
- Code Climate: $24 a month for the solo plan
- Hipchat: Free for one room
- Heroku: Free for a one-dyno app.
That’s $45 a month. That’s next to nothing for such an amazingly powerful toolchain. Plus if you run more than one project, the per-project cost decreases dramatically.
I used to run one server to host one Rails app for $140 a month, for years, with no build server, deployment or code metrics built into the toolchain. Today I pay half that for a much more sophisticated setup.
Admittedly, the hosting costs with Heroku will go up once your app becomes popular, but this is a good problem to have, and at that point you should have the cash to invest in a chef-based cloud server deployment solution. I run one of those for an old SaaS service to keep costs down. It’s still very easy to connect a different deployment strategy into this toolchain.
So: what are you waiting for?
Your tests are lying to you
Using mocks within your test suite has gone rather out of fashion. Programmers everywhere have been lamenting the fact that mock-based tests are becoming more and more brittle: they’re having to change the test code in multiple places each time there’s the slightest code change. In fact, they seem to be changing the test code much much more often than the production code.
Using mocks appears to require a lot of setup code for the object under test. Why not just fire up Factory Girl, create a bunch of objects we need to test this code, and just check the outputs?
This works, and appears to work nicely. For a while.
Eventually your tests will get to the point where they’re lying to you: they’re telling you your code works whereas actually it only works by coincidence. This post will examine the different techniques we can use to test code, and why some work better than others in the long term.
The problem
To look at this further, let’s try to write a conference simulator for a new website that tries to predict how many people might attend an upcoming event:
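(The original code listings are missing here, so the examples that follow are reconstructions in spirit: names like Conference and predicted_attendance are illustrative, not the post’s originals.)

    describe Conference do
      it "predicts attendance based on the talk ratings" do
        conference = Conference.new(talk_ratings: [8, 9])
        expect(conference.predicted_attendance).to eq(170)
      end
    end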
A simple start, with equally simple production code. Next, we decide to extract our code for calculating the rating into Speaker classes. We decide not to change the test suite much, and make the code work behind the scenes:
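A sketch of that ‘behind the scenes’ change: the assertion barely moves, even though a Speaker object now computes the rating somewhere inside Conference.

    it "predicts attendance" do
      talk = Talk.new(title: "TDD", speaker_name: "Alice")
      conference = Conference.new(talks: [talk])
      # Conference silently builds a Speaker from speaker_name and asks it
      # for a rating - none of which is visible in this test.
      expect(conference.predicted_attendance).to eq(80)
    end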
A nice, simple, easy change? You’ll pay for this later. Where is the Speaker coming from? Your Conference class is creating it somewhere, or retrieving it from a factory. You’ve increased the number of collaborators for this class by at least one (possibly three), yet your test isn’t showing the additional complexity. It’s deceitfully hiding it, whilst you continue on in blissful ignorance.
Your tests are now sitting around the outside of your system. There are no tests for the Speaker class at all, except that we coincidentally check the rating it emits. Another developer is likely to miss the connection and remove the implied test whilst changing the code for a different reason later.
This gets worse over time:
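A sketch of the style the next paragraph describes, with hypothetical helper methods hiding the setup:

    def talk_with_star_speaker
      create_talk(speaker: create_speaker(rating: 10))
    end

    def talk_with_new_speaker
      create_talk(speaker: create_speaker(rating: 5))
    end

    it "predicts attendance for a mixed line-up" do
      conference = create_conference(talks: [talk_with_star_speaker, talk_with_new_speaker])
      expect(conference.predicted_attendance).to eq(150)
    end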
Can you see what’s going on here? We’ve created some nice helper methods to make it easy to create the required talk objects. This test is fairly easy to read, but it’s dressing up the problem. The test code is relying on far too many collaborators all functioning correctly in order to return the correct result.
When you extract a class, your purely state-based tests don’t always require change. If you’re not stubbing out or mocking systems, you can end up in a situation where you’re relying on the code to work without realising it.
How could it be improved?
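One possible isolated rewrite, stubbing the collaborators so that only Conference’s own aggregation logic is exercised:

    describe Conference do
      it "aggregates the ratings of the talks and the schedule" do
        talk = double("talk", rating: 8)
        schedule = double("schedule", rating: 9)
        conference = Conference.new(talks: [talk], schedule: schedule)
        expect(conference.rating).to eq(17)
      end
    end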
Now we’ve isolated the method nicely from its collaborators, and ensured that its behaviour is correct: that it aggregates the ratings of the talks and the schedule. We also make sure that we’re testing Conference correctly, also in isolation.
The more you use refactoring methods such as Extract Class without cleaning up your tests, the more likely your tests will be lying to you. Little by little, those tests that you trusted are slowly testing more and more code. You add a multitude of edge cases at the edges, never thinking about the complexity within. You’ve resorted to using end-to-end tests to test basic correctness.
This is a bad thing on many levels: for example, what happens to interface discovery? How will you know how the interface of your lower-level classes needs to behave if you’re not mocking or stubbing it? You are resorting to guessing, rather than exercising the interface ahead of time in your tests. If you have tests around the edges, but not in the middle, you’re not gaining the design input that tests give you in each layer of your system.
Your code stinks
If you go the whole hog with testing in isolation, then you might end up with something like this:
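A hypothetical reconstruction of that extreme: every collaborator mocked, every interaction spelled out.

    describe Conference do
      it "calculates the predicted attendance" do
        talks = [double("talk", rating: 8), double("talk", rating: 8)]
        schedule = double("schedule", rating: 9)
        scheduler = double("scheduler")
        caterer = double("caterer", :confirmed? => true)
        expect(scheduler).to receive(:schedule).with(talks).and_return(schedule)

        conference = Conference.new(talks, scheduler, caterer)
        expect(conference.predicted_attendance).to eq(250)
      end
    end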
I’m not surprised people moan about maintaining this: if any aspect of the Conference class changes, this test will break and need to be fixed. We can see that this test code is hard to write and difficult to read. It would be so much easier just to hide this setup in a few factory methods with some sensible defaults, right?
Maybe it’s not the test code that’s the problem. Perhaps the code stinks. Perhaps the class simply has way too many collaborators, which is why your test code contains a large amount of set up.
For this test code, we can see there are several objects leaking all over the conference code: to refactor this I’d probably get through a Scheduler, Caterer and perhaps a TrackAggregator before I was done. I’d ensure all these objects were tested in isolation, and ensure that there are acceptance tests all the way through to make sure the customer has what they need.
Well designed code is easy to test. As a rule of thumb, anytime I get over about two or three lines of setup code for testing a method, I normally take a step back and ask myself if this method is doing too much.
Test speed
The other advantage of running tests purely in isolation is that they’re fast. Very fast. When I’m coding Rails apps these days, thanks to advice from Corey Haines, I’m running a spec_no_rails folder which runs independently from the rest of my Rails app. Rails apps by default epitomise this problem: default model tests exercise the whole system from the database up. By running your tests independently you’re not having to clean the database or start Rails each time you run your tests, which means that much of your interesting code can be tested in under a second. Gary Bernhardt has more information on how to set this up in his excellent Destroy All Software screencast series.
What I’m not saying
This isn’t an argument for or against Mocks or Stubs. Either technique can be used successfully to generate clean code. It’s an argument for only exercising the code under test, and leaving the rest of the system to take care of itself. The important thing is that you don’t exercise your collaborators: whether you check they’ve received messages or simply stub them to return input doesn’t matter.
Don’t forget end-to-end tests. These are very important for business acceptance and for ensuring basic functionality. The important thing is to ensure that you’re being intentional about your end-to-end tests and ensure your unit tests are not end-to-end tests by accident.
Take a good look at the test code for a project you recently worked on. You don’t need to look at the production code yet: notice that I’ve not included any production code in these examples. You shouldn’t need to see it to know whether it’s of good quality or not: you can tell that by reading the tests.
Which is the most annoying or bulky part of your test code? Are your tests deceiving you about what they’re testing? How could you improve the code to make this test code easier to maintain?
Kanogo: vapourware to beta in 24 hours
The backstory
A while back I was agonising over what the next greatest feature for one of my products should be. I thought the best thing to do would be to conduct some Kano analysis on the product in question, and realised there wasn’t an easy way of doing this. I’ve used kanosurvey.com in the past, but it didn’t really feel like the right tool. How was I to get users to answer my survey?
“Wouldn’t it be great,” I thought, “if I could embed a little survey box on the site that asked customers what they thought and provided me with Kano analysis stats?” The concept behind Kanogo was born.
Fast forward several months to last week. I found myself with a few days spare and decided that the best use of them would be to build a beta of this product. Always up for a challenge, I decided to give myself 24 hours to build and launch.
That’s not very long, so I had to hustle.
Timeline
7 Sep: 12:10am: I announced my intentions, mostly to motivate myself through fear of failing in public. I finally decided on a name, and registered the domain and the twitter account. I announced the product to the world (well, a subset).
7 Sep: 01:55am: Got a new Rails 3.1 app running on Heroku cedar. It’s a one page app using a Campaign Monitor signup form. Got my first beta signup. Finished for the night.
7 Sep: 07:40am: Announced Kanogo again, just in case anyone had been sleeping at 2am :) Got another 3 beta signups and a bunch of feedback on spelling errors.
7 Sep: 10:13am: Simple twitter sign in done using Omniauth and this really useful tutorial.
7 Sep: 02:45pm: The USA woke up and I got more beta signups: now up to 5. Got the basic data entry for surveys and features done. Started work on the embed. Was feeling fairly pessimistic about a beta launch for that night, but didn’t want to let myself down.
7 Sep: 05:53pm: Embed done, quicker than expected. Took a break. Now feeling cautiously optimistic.
7 Sep: 09:12pm: Basic response mechanism in: now needed to apply the Kano analysis magic! Adrenalin took over from caffeine as primary stimulant.
7 Sep: 11:20pm: Turned on twitter sign in as basic method of getting registered on the site. Removed redundant Campaign Monitor signup: emailed subscribers manually to ask them to sign in via twitter. Beta went live!
The result
After 24 hours, I had a beta running, which worked. Granted, it wasn’t great, but it was something that had some value.
I spent the rest of the evening and following morning promoting the beta on mailing lists and on twitter. By the end of the following day I had 30 or so beta signups.
It’s already adding value to beta users: two sites are already using the beta on their own products, and one beta user has now decided to implement a feature as he’s realised his customers consider it a “must have”. There’s no substitute for real feedback.
Learnings
Some of the things I’ve learned so far:
- Cloud tools are the business. It was so easy to register the domain with dnsimple.com, start up a twitter account for marketing and customer interaction, deploy to Heroku, and get initial beta signups with Campaign Monitor.
- Modern development tools rock. I used Rails 3.1 for this app, which worked beautifully, and I love the use of sprockets to help manage the asset pipeline. Running the app on Heroku cedar went without a hitch. I used twitter for authentication, and it only took an hour to set up.
- There is no “quick and dirty”. The app is (almost) fully tested: I confess I left a couple of methods only covered by end-to-end tests (which doesn’t really count). I definitely proved that the only way to go fast is to go clean: Jason was right that there is no “quick and dirty”, only “slow and dirty”. This came back to bite me instantly: the code I didn’t write specs for took me the longest to get working.
- Technology is the easy part. It didn’t take me long to build the site, but the trick is to build a business. After the initial interest, the analytics on the site are way down as the next new thing appears on the internet and people move on. To gain traction I need to build the app my beta users actually want. Thankfully, quick feedback is what Kanogo does, so we’re eating our own dogfood and asking our users what they think at every turn. This is already directing which features I work on next, which has to be the most efficient way of moving forward, right?
What’s next?
I plan to continue working on this, listening to beta user feedback, refining the features, and accepting new beta signups for the moment. I hope to turn this into a paid product at some point, as I think there’s a huge amount of value here for websites if I can get the messaging right.
Can I get involved?
Sure! It’s not too late to join the beta: you can do so here. I’d love your feedback on the product. It can give you value anywhere you have users of a website, even on a blog as shown above.
How I learned to stop worrying and love (some) detailed Cucumber features
As the revival of interest in Cucumber continues, I’m finding that a lot more people are using Cucumber for two very different types of testing. When coaching or training, I sometimes come across QAs writing Cucumber tests like this:
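Something along these lines (a hypothetical reconstruction of the style, not the original example):

    Scenario Outline: Scroll option works on every page
      Given I am on the <page> page
      When I click the "scroll to top" link
      Then the page should scroll back to the top

      Examples:
        | page     |
        | home     |
        | products |
        | contact  |
        | checkout |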
I don’t use Cucumber like this… but I’ve changed the way I approach features such as these when I come across them.
How I use Cucumber
When I use Cucumber, I hold discussions with stakeholders, and I write down the results as Cucumber features, carefully avoiding too much incidental detail to help with maintainability later. These features form the initial acceptance tests for my system. I then use TDD to flesh out the functionality I need. The features end up as very useful documentation and regression testing artefacts (which can even form a user manual for the application.)
The example feature above is very different. It is an exhaustive regression test to check that the scroll option is working in all cases on every page. This example is pretty short: in reality I’ve seen extremely long Cucumber features written in this style. Note this isn’t the same as features that are merely long, boring and overly detailed: these run a simple scenario in many slightly different ways.
Because this is not how I use Cucumber, I used to discourage this long form style of feature writing out of hand. I’ve learnt to stop worrying… as long as it’s clear what sort of features these are and how they should be treated.
Developers: who are we to judge?
Firstly, developers: I don’t think we should be saying “you can’t write tests like this.”
Just because people are not using the tool the way we might expect doesn’t make their use of it invalid. It’s very tempting to say “you’re doing it wrong”, because these features look so much like the “bad features” developers are taught to eradicate from their codebases. However, we have to understand that they’re simply using the tool for a different set of advantages it provides: it allows them to quickly run through expected functionality in a multitude of different places.
There isn’t one way to use Cucumber (or any tool) - there are only ways that give value, and ways that increase or decrease friction. We would be wise not to discount the way that others get value out of the tools we use, just because they use them in a way that we didn’t expect.
A different approach
When I see these sorts of features, rather than dismissing them as ‘too detailed’ or ‘unmaintainable’, I ask questions about who is using these features. Who is writing them, who is reading them, and who is keeping them up to date?
Often it’s the QA people on a team who are exclusively writing these types of tests. These are then handed on to the developers, who get very frustrated with them. No one is clear who should be maintaining them. The developers don’t want to, and inevitably try to refactor them, which annoys the QAs as the detailed regressions they were aiming for are lost. The QAs don’t want to maintain them either, as they usually don’t have strong coding skills and therefore find it hard to maintain the step code. The end result is a void of responsibility which gives rise to a mess of unmaintained code.
A good solution? Be clear about the responsibility. Move these features out of your regular BDD workflow. Create a structure a bit like this:
features/
docs/
account_management.feature
buying_products.feature
step_definitions/
...
regression/
menu_interface_checks.feature
step_definitions/
...
Have the QAs maintain all the features and step definitions under regression above, allowing them to manage their own features without conflicting with the needs of the developers. Ensure that they’re only run under controlled conditions (perhaps as a nightly build) rather than as a part of the normal BDD workflow, otherwise they’ll slow development down to a crawl.
Who should be writing features? When you’re using Cucumber from the point of view of development and documenting functionality for customers, write the features in collaboration with developers, testers and your business people, in ‘3 amigos’ style. However, when you’re using Cucumber to construct what are effectively old-fashioned test scripts that perform exhaustive regression testing of the application, then I can see value in the QA-led approach above.
From tail end to up front: QAs to Analysts
A word of caution for QAs: the important thing is to discuss this type of test with your developers and your business people. Are we testing where the risk is? What is the likelihood of this test ever failing in practice, catching a real bug that otherwise would not have been caught? What’s the impact of such a bug? If there’s little to no risk, or little impact, then the test we are writing has very little value, and we are creating work for ourselves for the sake of it. Overtesting is a waste of time: there is a better path for QAs.
I often try to work with QAs to transition them to a role which is much more upfront than at the tail end of the process. Traditionally, QAs are thrown working (but untested) code to see if they can break it, and the more code is sent back the more wasteful the process is.
However, with BDD there’s a lot more automated testing going on. Developers are receiving their requirements direct from the stakeholders through proper communication, distilled down to clear Cucumber features. QAs should be involved in this process, working with the stakeholders, teasing out edge cases. If that sounds like a Business Analyst role to you, then you aren’t far wrong: the roles can be very similar.
The old fashioned methods of in depth regression testing using scripts can still be useful. However, thanks to the advances of BDD and Specification By Example, there’s less need for QAs to take a lead in this area. Instead of writing these tests, or God forbid clicking through them manually, they have the opportunity to take a lead in defining the scope of what’s under test and when a requirement is finished.
In summary
There’s nothing wrong with features like these in our codebase as long as we understand who they’re for, why they’ve been created, and who is maintaining them. Let’s not be quick to dismiss them, just because we’re not used to writing features in this style; but let’s also be sure they’re necessary before littering the build with brittle tests that have little value.
Do you have features like these in your code? Are you using them to drive business value, or are they clearly separate from your other features? How could they be improved?
Extreme isolation part 3: coding a CRUD app (with full example)
CRUD apps start simple, yet often get messy and nasty really fast. They are a great test bed for Extreme Isolation.
A few months ago I started looking at a fresh new way of architecting web applications. I suggest you read parts one and two first.
The app I’ve mainly been working on with this new method is an online version of Sol Trader, which isn’t really typical of the web applications most people write. I’ve since applied the paradigm to a directory application called “Discover” that I’ve been building for the Trust Thamesmead charity, and I thought I’d share the results.
Discover is a much more traditional “CRUD style” application. The administrators define audiences for a local area (people who go to school, or want to find a job) and add places to a site, grouped into topics for that audience. For example, if you’re into music (the “music” audience) you might want to see places in the “music shops”, “gig venues” and “music video shoot locations” for a particular area.
The source code is fully open source. Trust Thamesmead have a great ethos: they would love other local areas to pick up the application and run with it. This also means that I can use the codebase as a demonstration of extreme isolation.
Let me take you through how it works.
The basic models: Audience, Topic, Place
Let’s have a look at the data representation for models first. Check out audience.rb:
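The post’s listing isn’t reproduced here, so this is an approximation of an immutable model in this style; only the slug method and the generator-method idea come from the surrounding text, and the rest is illustrative.

    class Audience
      attr_reader :description, :topics

      def initialize(description, topics = [])
        @description = description
        @topics = topics
        freeze # immutable: no setters, no mutation after creation
      end

      def slug
        description.downcase.gsub(/\W+/, "-")
      end

      # Generator methods: updates produce new Audience objects.
      def with_description(new_description)
        Audience.new(new_description, topics)
      end

      def with_topics(new_topics)
        Audience.new(description, new_topics)
      end
    end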
These objects are immutable. They are created from an AudienceRepository, which handles all the persistence of the objects for you. They know nothing about loading, saving or disk representations, which is exactly as it should be.
Audiences themselves are very simple containers for a description and a list of associated topics. They have a method to generate a slug, and two generator methods to create new audiences based on this one: that’s how we handle updating audiences.
A web request to retrieve an object
The web logic is wrapped up in two files: a Sinatra application in app/audiences.rb which acts like a controller would in Rails, and a shared module in app/crud.rb which contains logic used by all the other Sinatra apps.
A web request comes into the application and runs this code in the shared logic:
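(A sketch of what that shared route might look like; the real version lives in app/crud.rb in the repo.)

    get "/:slug/edit" do
      @object = find(params[:slug])
      haml :edit
    end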
This find method is defined in the Audience-specific class:
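(Again approximated from the description: the Audience app delegates straight to the repository.)

    def find(slug)
      audience_repository.find_by_slug(slug)
    end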
The AudienceRepository takes care of the persistence end of things (you can see how in persisted/audiences.rb), and returns a plain ruby Audience object as shown above. This object is then passed to the edit.haml view file as @object, and we’re done.
A web request to update the object
Updating the object is more interesting. The following action is called first, which then calls a series of other methods:
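An approximation of the flow described below; method names other than update_from_params are illustrative.

    post "/:slug" do
      audience = find(params[:slug])
      updated = audience.update_from_params(params)
      changes = validator.validate(updated)
      downstream(editor.apply(changes))
    end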
The first line retrieves a plain immutable Audience object as before. The update_from_params method is called next: this returns a new Audience object with the updated information, using the factory methods we defined on the model earlier.
Validation
The new Audience object is passed to an AudienceValidator object (defined here), which takes a list of existing slugs in the database and returns one of two things:
- A ValidAudience change if the new Audience object is valid
- An InvalidAudience change if it is not valid
We appear to be reinventing the wheel with the Validator object here, but the great advantage of doing things this way is that the object has no dependency on the database at all. This means it can be tested in isolation, it’s fast, and we can chain validators together and reuse them in more situations.
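A sketch of such a database-free validator: it is handed the existing slugs up front, so persistence never enters the picture. The validation rules shown are placeholders.

    class AudienceValidator
      def initialize(existing_slugs)
        @existing_slugs = existing_slugs
      end

      def validate(audience)
        if audience.description.empty? || @existing_slugs.include?(audience.slug)
          InvalidAudience.new(audience)
        else
          ValidAudience.new(audience)
        end
      end
    end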
Applying the changes
The queue of changes is then pipelined through various other services in true Extreme Isolation fashion. Firstly we apply the queue to an object we receive from the editor method call in the Sinatra application:
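Roughly like this, glossing over the queue plumbing that reactor.rb handles:

    class Editor
      def apply(change)
        case change
        when ValidAudience
          AudienceEdited.new(change.audience)
        end
        # An InvalidAudience change falls through untouched: no edit happens.
      end
    end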
This processes the ValidAudience change and returns an AudienceEdited change, which is tacked on to the end of the queue of changes. (See reactor.rb for exactly how the plumbing works.) An InvalidAudience change is ignored - we don’t want to edit the audience in this case.
The resulting change queue is then passed to downstream, which is the set of services that process all web requests:
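A sketch of downstream, assuming each service responds to process and passes the (possibly extended) change queue along:

    def downstream(changes)
      [audience_repository, audience_handler].inject(changes) do |queue, service|
        service.process(queue)
      end
    end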
The AudienceRepository picks up the AudienceEdited change and does the correct thing to the persisted record. The AudienceHandler works out how to return the right message to the web interface. It handles InvalidAudience and AudienceEdited messages, as well as AudienceCreated and AudienceDeleted messages for the other CRUD operations.
Creation and deletion
The other CRUD operations work very similarly. Creation simply constructs a brand new Audience object, checks validity and passes the resulting set of changes to the AudienceRepository and the web handler. Deletion is even simpler: it just passes an AudienceDeleted message to the downstream method.
Extending the set of services
This way of doing web applications is extremely extendable. Here’s a much more complex downstream method for Sol Trader Online, which is run for every single player-action web request in the game:
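The original listing isn’t reproduced here. As an illustration only (PositionPermissionChecker is mentioned below; the other service names are placeholders), it has the same shape with a longer chain:

    def downstream(changes)
      [
        PositionPermissionChecker.new,
        position_repository,
        notification_sender,
        response_handler
      ].inject(changes) { |queue, service| service.process(queue) }
    end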
Each piece is totally isolated and therefore easily testable. When one service gets too complex, it’s easy to split what it’s doing into two services: PositionPermissionChecker is a recent extraction from the code inside the Position object.
Conclusions
This is still an experiment. It’s more involved than a typical CRUD app, and harder to get going, but the individual pieces (the validators, the Editor class, the Handler classes) are all very testable as they only do one thing in isolation.
There are also many ways that I could improve the web logic, but at the moment those classes are fine for my purposes. Likewise, all the JavaScript is still inline in the views, and has yet to be pulled out and refactored.
What do you think of the approach? Can you see yourself using it on your next project?