Faster Tests in PHP: Avoiding latency with Fakes

Written by Dave Marshall - Aggregated on Tuesday November 8, 2016

Faster tests get run more often. Fast tests are critical for people practicing TDD, keeping that feedback loop nice and tight. One of my favourite ways to keep tests running on time is to minimise the amount of waiting on I/O needed to exercise the system.

There are a handful of ways to do this. It's quite common for people to reach for their favourite test double tool and create mocks and stubs for database connections or SDKs, but I'm not going to talk about that. I use a lot of mocks, but I don't use them to keep my tests fast. We can use another type of test double to try and get our tests running faster: Fakes. A Fake is a simpler or more lightweight version of the real system or component.

All of the methods I'm going to mention make compromises regarding the thoroughness of your tests. We're going to be changing the way the system operates to be different from how it will ultimately operate in production, to make our tests quicker to run. In doing so, we will be sacrificing the surety of testing the system end to end as it should be run.

There are three reasons why you might use a Fake to replace a component or system:

We're going to concentrate on the slowing down part. These days we all have super fast computers and networks at our disposal and most of the I/O we run for basic web apps is pretty quick, but as your test suite grows, even a few milliseconds for a database query or HTTP request that gets run for every test soon add up.

Fake Objects

One of the easiest ways to avoid latency is to replace an object that uses disks or the network with an object that offers the same API, but does things a little differently, hopefully faster. If you are using any modern PHP framework that provides testing support, you are probably using a few fakes like these already. If your framework doesn't provide testing support, maybe you should choose another framework.
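
The examples in the rest of this post all implement a Storage interface that is never shown. A minimal sketch of what it might look like, assuming it needs only the two methods the implementations below provide:

```php
<?php

// Assumed shape of the Storage interface; the original is not shown in the post.
interface Storage
{
    /** Store $contents under the key $targetPath. */
    public function put($targetPath, $contents);

    /** Return the contents previously stored under $targetPath. */
    public function get($targetPath);
}
```

Keeping the interface this small is what makes the fakes cheap to write: anything that can store and return a string by key qualifies.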

use Aws\S3\S3Client;

class S3Storage implements Storage
{
    private $bucket;
    private $client;

    public function __construct(S3Client $client, $bucket)
    {
        $this->client = $client;
        $this->bucket = $bucket;
    }

    public function put($targetPath, $contents)
    {
        $this->client->putObject(array(
            'Bucket' => $this->bucket,
            'Key'    => $targetPath,
            'Body'   => $contents,
        ));
    }

    public function get($targetPath)
    {
        $result = $this->client->getObject(array(
            'Bucket' => $this->bucket,
            'Key'    => $targetPath,
        ));

        return (string) $result['Body'];
    }
}

Assuming we're going to need to use this in our tests, we might go ahead and create a test bucket on S3 to run all our tests against and for the first few tests, this seems like it's working great.

As we write a few more tests, we start to realise things are running a little slowly. Even worse, your internet connection becomes intermittent or drops out entirely. Sure we could refactor some code, ditch those integrated tests and write more isolated unit tests, avoiding the problem, but that's not always ideal and definitely isn't the only way to approach the problem.

In Memory Fake Objects

Writing a simple implementation of Storage that stores the data in memory avoids our network problems (as well as a bunch of CPU cycles in the S3 SDK), getting our tests nice and snappy again.

class ArrayStorage implements Storage
{
    private $data = [];

    public function put($targetPath, $contents)
    {
        $this->data[$targetPath] = $contents;
    }

    public function get($targetPath)
    {
        return $this->data[$targetPath];
    }
}

More persistent Fake Objects

One disadvantage to holding data in memory is that it goes away! If we want the data to stick around after a test run (this can be helpful for debugging), or if we need access to the same data across processes (maybe you're shelling out or hitting a real webserver), we'll need something with more persistence.

In the previous example, the real system is somewhere on the internet, so a local disk based implementation will still incur some latency, but will operate much faster and more reliably than API calls over the internet.

class DiskStorage implements Storage
{
    private $dir;

    public function __construct($dir)
    {
        $this->dir = $dir;
    }

    /*
     * No error checking for brevity
     */

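    public function put($targetPath, $contents)
    {
        $file = $this->path($targetPath);

        // Create intermediate directories so keys like "/some/path" work
        if (!is_dir(dirname($file))) {
            mkdir(dirname($file), 0777, true);
        }

        file_put_contents($file, $contents);
    }

    public function get($targetPath)
    {
        return file_get_contents($this->path($targetPath));
    }

    private function path($targetPath)
    {
        return rtrim($this->dir, "/")."/".ltrim($targetPath, "/");
    }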
}

You'll quickly find Fakes like the one above become useful outside of tests. You might find you use them in your QA environments, or for your local setup for development or for demonstrations. It's also quite possible that Fakes developed for testing end up being good enough to ship as production alternatives to the real systems.

Self Initialising Fake Objects

Self initialising fakes are kind of like a stub/fake hybrid. They act like a stub, in that they return canned results, but they're more like a fake because they actually proxy to a true implementation, caching the calls indefinitely. Ruby's vcr is a popular library that does this at the HTTP level, intercepting calls to Net::HTTP and a bunch of other HTTP clients and replaying the results on subsequent calls. At any given time, you can choose to forgo the recordings and make the actual underlying HTTP calls. There's a PHP port that hooks into curl, but I've yet to try it out. For the purposes of demonstration, we'll write our own naive implementation using the decorator pattern to cache calls to an underlying object.

class VCRStorage implements Storage
{
    private $storage;
    private $libraryDir;

    public function __construct(Storage $storage, $libraryDir)
    {
        $this->storage = $storage;
        $this->libraryDir = $libraryDir;
    }

    public function put($targetPath, $contents)
    {
        return $this->call("put", $targetPath, $contents);
    }

    public function get($targetPath)
    {
        return $this->call("get", $targetPath);
    }

    private function call($method, ...$args)
    {
        $file = rtrim($this->libraryDir, "/")."/".md5($method."|".implode("|", $args));

        if (file_exists($file)) {
            return file_get_contents($file);
        }

        $contents = $this->storage->{$method}(...$args);
        file_put_contents($file, $contents);

        return $contents;
    }
}

This could get quite complicated depending on how easily the arguments and return types serialise to disk, but you get the idea. For something as simple as the example above, I much prefer this solution to intercepting calls via PHP's autoloader à la php-vcr.
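
To give a feel for the serialisation point, here is one possible variant that stores recordings with PHP's serialize() so that non-string results (the null that put() returns, arrays and so on) replay as the same type. SerialisingVCRStorage is a made-up name and the Storage interface is the assumed two-method shape; treat it as a sketch, not a replacement for the class above.

```php
<?php

// Assumed shape of the Storage interface used throughout the post.
interface Storage
{
    public function put($targetPath, $contents);
    public function get($targetPath);
}

// Hypothetical variant of VCRStorage that serialises what it records.
class SerialisingVCRStorage implements Storage
{
    private $storage;
    private $libraryDir;

    public function __construct(Storage $storage, $libraryDir)
    {
        $this->storage = $storage;
        $this->libraryDir = rtrim($libraryDir, "/");
    }

    public function put($targetPath, $contents)
    {
        return $this->call("put", $targetPath, $contents);
    }

    public function get($targetPath)
    {
        return $this->call("get", $targetPath);
    }

    private function call($method, ...$args)
    {
        // serialize() keeps argument boundaries and types distinct in the key
        $file = $this->libraryDir."/".md5(serialize([$method, $args]));

        if (file_exists($file)) {
            // Replay: unserialize() restores the recorded value and its type
            return unserialize(file_get_contents($file));
        }

        $result = $this->storage->{$method}(...$args);
        file_put_contents($file, serialize($result));

        return $result;
    }
}
```

This only works for values PHP can serialise, so resources and closures are out, and unserialize() should only ever be pointed at recordings you created yourself.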

Verified Fake Objects

Once our Fakes get a little more complicated, we might want to start writing tests for them to make sure they're behaving like the real thing, particularly if the system/component is likely to change regularly.

abstract class StorageTest extends \PHPUnit_Framework_TestCase
{
    abstract protected function getStorage(): Storage;

    /**
     * @test
     */
    public function what_goes_in_must_come_out()
    {
        $targetPath = "/some/path";
        $contents = "the contents";
        $storage = $this->getStorage();

        $storage->put($targetPath, $contents);

        $this->assertEquals($contents, $storage->get($targetPath));
    }
}

Subclassing this class and providing S3Storage, ArrayStorage, DiskStorage and VCRStorage instances in the getStorage method enables us to run the same tests against the different implementations. Adam Wathan has a nice screencast on this if you fancy watching how easy he makes it look in just 10 minutes.
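
Stripped of PHPUnit, the idea is a single check that accepts any Storage and is run against each implementation in turn. The sketch below is a framework-free stand-in rather than the StorageTest subclasses themselves: whatGoesInMustComeOut() is a hypothetical helper, the Storage interface is assumed, and the two classes are compact copies of the fakes, repeated only so the example runs on its own.

```php
<?php

// Assumed Storage interface and compact copies of two of the fakes.
interface Storage
{
    public function put($targetPath, $contents);
    public function get($targetPath);
}

class ArrayStorage implements Storage
{
    private $data = [];

    public function put($targetPath, $contents)
    {
        $this->data[$targetPath] = $contents;
    }

    public function get($targetPath)
    {
        return $this->data[$targetPath];
    }
}

class DiskStorage implements Storage
{
    private $dir;

    public function __construct($dir)
    {
        $this->dir = $dir;
    }

    public function put($targetPath, $contents)
    {
        file_put_contents($this->dir."/".$targetPath, $contents);
    }

    public function get($targetPath)
    {
        return file_get_contents($this->dir."/".$targetPath);
    }
}

// Hypothetical helper: the same check, whichever implementation it is handed.
function whatGoesInMustComeOut(Storage $storage)
{
    $storage->put("contract-check.txt", "the contents");

    return $storage->get("contract-check.txt") === "the contents";
}

$results = [];
foreach ([new ArrayStorage(), new DiskStorage(sys_get_temp_dir())] as $storage) {
    $results[get_class($storage)] = whatGoesInMustComeOut($storage);
}

foreach ($results as $class => $passed) {
    echo $class, ": ", $passed ? "ok" : "FAILED", PHP_EOL;
}
```

Running the S3Storage implementation through the same check is what tells you the fakes still agree with the real thing, so it is worth doing on a schedule even if it is too slow for every test run.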

Fake Systems with In Memory Backends

Sometimes, if something is particularly cross cutting and difficult to isolate, it can be hard to replace the client code or object with a fake. In this instance, it is sometimes possible to replace or change the whole system in the backend to speed up our tests.

If you need to shift large amounts of data in your databases, it might be worth keeping the client code the same, but switching out to a memory based backend.

MySQL comes with a memory storage engine, but I prefer to create a ramdisk and configure MySQL to keep its data there.

davem@wes:~$ mount  | grep ramdisk
tmpfs on /tmp/ramdisk type tmpfs (rw,nosuid,nodev,relatime,size=1048576k)

Completely Fake Systems

Another quick win can be to replicate the whole third party system with a local equivalent.

Can you run your tests against SQLite rather than your full blown RDBMS? SQLite can be given a URL to tell it to store everything in memory.

$conn = \Doctrine\DBAL\DriverManager::getConnection([
    'url' => 'sqlite:///:memory:',
], new \Doctrine\DBAL\Configuration());
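
For code that talks to PDO directly rather than through Doctrine, the equivalent is the sqlite::memory: DSN. A minimal sketch, assuming the pdo_sqlite extension is installed; the users table is made up for the example:

```php
<?php

// Every new PDO connection to sqlite::memory: gets its own empty database.
$pdo = new PDO("sqlite::memory:");
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema, rebuilt from scratch for each test
$pdo->exec("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)");

$insert = $pdo->prepare("INSERT INTO users (name) VALUES (?)");
$insert->execute(["dave"]);

// Fetch the single "name" column as a flat array
$names = $pdo->query("SELECT name FROM users")->fetchAll(PDO::FETCH_COLUMN);
```

The database vanishes when the connection closes, so every test starts clean, but the schema has to be recreated each time. SQLite's SQL dialect also differs from MySQL's in places, which is exactly the kind of compromise mentioned at the start.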

Amazon provides a downloadable version of DynamoDB, which you can run locally for your tests, avoiding the latency of making calls across the internet. There are also a bunch of compatible implementations of other AWS services to be found on GitHub, though your mileage may vary.

Wrapping up

If something is slowing your test runs down, make it quicker. If you can't make it quicker, replace it.

This is the first in a series of posts describing how you can go about making test runs faster. I'll be back to update this post as more posts in the series get published.

Faster tests in PHP
  1. Avoiding latency with Fakes
  2. Organising Test Suites
