@Integralist
Last active November 27, 2025 13:04
Sandi Metz advice for writing tests

Rules for good testing

Look at the following image...

...it shows an object being tested.

You can't see inside the object. All you can do is send it messages. This is an important point to make because we should be "testing the interface, and NOT the implementation" - doing so will allow us to change the implementation without causing our tests to break.

Messages can go 'into' an object and can be sent 'out' from an object (as you can see from the image above, there are messages going in as well as messages going out). That's fine, that's how objects communicate.

Now there are two types of messages: 'query' and 'command'...

Queries

Queries are messages that "return something" and "change nothing".

In programming terms they are "getters" and not "setters".

Commands

Commands are messages that "return nothing" and "change something".

In programming terms they are "setters" and not "getters".
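
As a rough illustration of the query/command split, here is a minimal sketch. The `Account` class and its method names are made up for this example and are not from Sandi's talk.

```ruby
# Hypothetical example class; the name and methods are illustrative only.
class Account
  def initialize(balance = 0)
    @balance = balance
  end

  # Query: returns something, changes nothing.
  def balance
    @balance
  end

  # Command: changes something; the return value is not meant to be used.
  def deposit(amount)
    @balance += amount
    nil
  end
end

account = Account.new(10)
account.balance     # query: reads state, no side effect
account.deposit(5)  # command: changes state, returns nil
account.balance     # the change is visible through the public query
```

Calling the query twice in a row gives the same answer, while the command is only observable through a later query.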

What to test?

  • Test incoming query messages by making assertions about what they send back
  • Test incoming command messages by making assertions about direct public side effects
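
The two rules above could look roughly like this in Minitest. `Cart` is a hypothetical object under test, invented for this sketch.

```ruby
require "minitest/autorun"

# Hypothetical object under test; not part of the original gist.
class Cart
  def initialize(prices = [])
    @prices = prices.dup
  end

  # Incoming query: returns something, changes nothing.
  def total
    @prices.sum
  end

  # Incoming command: changes something, returns nothing useful.
  def add(price)
    @prices << price
    nil
  end
end

class CartTest < Minitest::Test
  # Incoming query: assert on what it sends back.
  def test_total_returns_the_sum_of_prices
    assert_equal 7, Cart.new([3, 4]).total
  end

  # Incoming command: assert on the direct public side effect.
  def test_add_changes_the_total
    cart = Cart.new
    cart.add(5)
    assert_equal 5, cart.total
  end
end
```

Neither test looks at `@prices`, so the internal storage can change without breaking them.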

What NOT to test?

  • Messages that are sent from within the object itself (e.g. private methods).
  • Outgoing query messages (as they have no public side effects)
  • Outgoing command messages (use mocks and set expectations on behaviour to ensure the rest of your code passes without error)
  • Incoming messages that have no dependants (just remove those tests)

Note: there is no point in testing outgoing messages because they should be tested as incoming messages on another object

What to Mock/Stub

Command messages should be mocked, while query messages should be stubbed
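
A minimal sketch of that rule, using hand-rolled doubles so it doesn't depend on any mocking library. `Order`, `FixedPricing` and `NotifierMock` are hypothetical names; the "mock" here is strictly speaking a spy that records what it was sent.

```ruby
require "minitest/autorun"

# Hypothetical object under test. It sends one outgoing query
# (pricing.price_for) and one outgoing command (notifier.order_placed).
class Order
  def initialize(pricing:, notifier:)
    @pricing = pricing
    @notifier = notifier
  end

  def place(sku)
    price = @pricing.price_for(sku)     # outgoing query
    @notifier.order_placed(sku, price)  # outgoing command
    nil
  end
end

# Stub for the query: returns a canned answer, asserts nothing.
class FixedPricing
  def initialize(price)
    @price = price
  end

  def price_for(_sku)
    @price
  end
end

# Double for the command: records what it was sent so the test can check it.
class NotifierMock
  attr_reader :received

  def initialize
    @received = []
  end

  def order_placed(sku, price)
    @received << [sku, price]
    nil
  end
end

class OrderTest < Minitest::Test
  def test_place_notifies_with_the_looked_up_price
    notifier = NotifierMock.new
    order = Order.new(pricing: FixedPricing.new(10), notifier: notifier)

    order.place("sku-1")

    # The expectation is on the outgoing command only, never on the query.
    assert_equal [["sku-1", 10]], notifier.received
  end
end
```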

Contract Tests

Contract tests exist to ensure a specific 'role' (or 'interface' by another - stricter - name) actually presents an API that we expect.

These types of tests can be useful to ensure third party APIs do (or don't) cause our code to break when we update the version of the software.

Note: if the libraries we use follow Semantic Versioning then this should only happen when we do a major version upgrade. But it's still good to have contract/role/interface tests in place to catch any problems.

The following is a modified example (written in Ruby) borrowed from the book "Practical Object-Oriented Design in Ruby":

# Following test asserts that SomeObject (@some_object) 
# implements the method `some_x_interface_method`
module SomeObjectInterfaceTest
  def test_object_implements_the_x_interface
    assert_respond_to(@some_object, :some_x_interface_method)
  end
end

# Following test proves that Foobar implements the SomeObject role correctly
# i.e. Foobar implements the SomeObject interface
class FoobarTest < Minitest::Test # Minitest 5+; older versions used MiniTest::Unit::TestCase
  include SomeObjectInterfaceTest

  def setup
    @foobar = @some_object = Foobar.new
  end

  # ...other tests...
end
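
For reference, a self-contained variant of the example above that should run as-is. The bodies of `Foobar` and `FoobarDouble` are assumptions made up for this sketch; the second test class shows one possible use of the shared module, namely checking that a hand-written test double still plays the same role as the real class.

```ruby
require "minitest/autorun"

# Shared role test, same as in the example above.
module SomeObjectInterfaceTest
  def test_object_implements_the_x_interface
    assert_respond_to(@some_object, :some_x_interface_method)
  end
end

# Hypothetical production class that plays the role.
class Foobar
  def some_x_interface_method
    :real_result
  end
end

# Hypothetical hand-written double used by other tests.
class FoobarDouble
  def some_x_interface_method
    :canned_result
  end
end

class FoobarTest < Minitest::Test
  include SomeObjectInterfaceTest

  def setup
    @foobar = @some_object = Foobar.new
  end
end

# Including the same module here fails the suite if the double
# ever stops responding to the role's message.
class FoobarDoubleTest < Minitest::Test
  include SomeObjectInterfaceTest

  def setup
    @some_object = FoobarDouble.new
  end
end
```
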
@dgmstuart

Right - I think I understand - thank you for constructing this stub.

What I'm concluding is that in order to force me to write a valid implementation (in my TDD loop), I need to pass a stub where the output depends on the input.

Perhaps there's some rule of thumb that if the output of a class (eg. WeatherAPI) depends on its input, then a stub for that class also needs to depend on its input?

Consider the following, which implements the interface just fine, but doesn't have the property we want?

class HardcodedWeatherStub implements ProvidesWeather {
  async weather(_city) {
    return "rain";
  }
}

(For clarity, this is more or less what instance_double(WeatherApi, weather: :rain) creates. Ruby doesn't have interfaces, but instance_double checks that the method exists on the real class. It doesn't check types though, since Ruby has no static type checking and by convention relies on duck typing and polymorphism.)


I see the point that if we pass this in our spec:

const stubAPI = new WeatherStub({
  "London": "Rain",
  "Bristol": "Cloudy",
  "Manchester": "Fog"
});

...then we don't need any explicit assertion that the API was called.

...but won't a passing spec have implicitly asserted that the API was called with "London" since there's no other way it could possibly have come up with the correct result? 🙃.

To put it another way, it feels like passing this stub is more or less equivalent to:

  1. use const stubAPI = new HardcodedWeatherStub() (always return "rain")
  2. assert that stubAPI received weather with London (this is easy with testing frameworks in Ruby)

...but maybe there's some key difference that I'm missing?


I 100% agree on not testing the SDK I didn't write, and relying on the interface of the API rather than what it actually does: this was never in doubt.

I still feel like making an assertion on the outgoing query message is only relying on the interface to the WeatherAPI, not on its implementation? But there are of course other reasons to not make such assertions.

@Alpheus

Alpheus commented Nov 13, 2025

Right - I think I understand - thank you for constructing this stub.

What I'm concluding is that in order to force me to write a valid implementation (in my TDD loop), I need to pass a stub where the output depends on the input.

Yes, and further: it is important that the stub is async (if your language supports that). The fact that it is time-sensitive is a more important detail than things like the API key, the URL, or where the data comes from.

Perhaps there's some rule of thumb that if the output of a class (eg. WeatherAPI) depends on its input, then a stub for that class also needs to depend on its input?

Feels like you took a step forward, but now two steps back. The output of a class of type WeatherAPI does not depend on its input. The output depends on whatever the weather server says it does. We don't know that for certain with a guarantee that a machine would consider proof. For all we know it could be returning Rainy all the time and ignore your input (or your input parameter could be wrong).

This detail does not matter for the SUT of what-should-I-wear because no amount of fiddling will prove that it is correct with a production-level certainty.

Consider the following, which implements the interface just fine, but doesn't have the property we want?

class HardcodedWeatherStub implements ProvidesWeather {
  async weather(_city) {
    return "rain";
  }
}

This would not work. The test has to specify the Stub, so the test has to define what it returns when asked, ie. by default "Sunny", and only for London return "rain". Then you implicitly verify it was called correctly if the output is raincoat. So the information (default: Sunny, London: Rain) is part of the assertion and should be right next to it in the same test.
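
A stub along those lines could be sketched in Ruby like this. The `WeatherStub` name comes from the thread, but this constructor and the `default:` keyword are assumptions for illustration, and the sketch is synchronous since plain Ruby has no built-in async/await.

```ruby
# Hypothetical Ruby counterpart of the WeatherStub discussed in this thread.
class WeatherStub
  def initialize(by_city, default: :sunny)
    @by_city = by_city
    @default = default
  end

  # Same message the real collaborator is assumed to answer.
  def weather(city)
    @by_city.fetch(city, @default)
  end
end

stub = WeatherStub.new({ "London" => :rain })
stub.weather("London")   # => :rain
stub.weather("Bristol")  # => :sunny (the default)
stub.weather(1)          # => :sunny, so a wrong argument no longer yields rain
```

Because only London maps to rain, a test that expects a raincoat can only pass if the SUT actually asked about London.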

I see the point that if we pass this in our spec:

const stubAPI = new WeatherStub({
  "London": "Rain",
  "Bristol": "Cloudy",
  "Manchester": "Fog"
});

...then we don't need any explicit assertion that the API was called.

Yes, correct! This is on track now, you got it.

...but won't a passing spec have implicitly asserted that the API was called with "London" since there's no other way it could possibly have come up with the correct result? 🙃.

For that you need a zero-case for the behavior for some other form of clothing that isn't rainy. You are right if Sunny -> Raincoat and Rain -> Raincoat. That's why it is important that you compose these tests from sensible base tests (remember the two I added in my original reply?)

To put it another way, it feels like passing this stub is more or less equivalent to:

  1. use const stubAPI = new HardcodedWeatherStub() (always return "rain")
  2. assert that stubAPI received weather with London (this is easy with testing frameworks in Ruby)

...but maybe there's some key difference that I'm missing?

Yes and you picked up on it already. The difference is that the stub needs to exhibit the intended behavior. Let me zoom in all the way:

When you wrote the test originally you said the behavior is Rain -> Raincoat. There is zero location in that. London is not in any shape or form the correct answer. So the simplest test for that is assert(wearing("Rain"), "raincoat").

However, this is now just a simple map, and the information that the weather is async got lost. So the async bit has to come back. Let's compose it.

The inverse isn't super sensible:
whatShouldIwear = fetchWeather(wearing(location))

So we keep the obvious one:
whatShouldIwear = wearing(fetchWeather(location))

So far so good. You can now see that whatShouldIwear is a function of (fetched location). The SDK (or fetch API) are details. So the object becomes

class ClothingPicker
  def initialize(resolvesLocationToWeather: WeatherApi.new)
    @resolvesLocationToWeather = resolvesLocationToWeather
  end

  def wear_a_jacket?(location)
    weather = @resolvesLocationToWeather.weather(1) # incorrect: our real API class expects us to pass a location
    
    case weather
    when :rain
      true
    else
      false
    end
  end
end

RSpec.describe ClothingPicker do
  context "when it's Sunny everywhere in the UK except raining in London" do
    it "returns 'raincoat' for London" do
      api = ... # a stub that has non-rain default + rain in London
      picker = described_class.new(resolvesLocationToWeather: api)

      expect(picker.wear_a_jacket?("London")).to be(true) # Now it will fail, because giving the resolver 1 will return sunny
    end
  end
end
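
For comparison, here is a self-contained Minitest sketch of the same idea with the location passed through correctly. The stub class, the snake_case keyword (renamed from `resolvesLocationToWeather`) and the test names are assumptions for this example, not the code from the thread.

```ruby
require "minitest/autorun"

# Hypothetical stub: non-rain default, rain only where the test says so.
class WeatherStub
  def initialize(by_city, default: :sunny)
    @by_city = by_city
    @default = default
  end

  def weather(city)
    @by_city.fetch(city, @default)
  end
end

class ClothingPicker
  def initialize(resolves_location_to_weather:)
    @resolves_location_to_weather = resolves_location_to_weather
  end

  def wear_a_jacket?(location)
    # Passing the location through is what the tests below pin down.
    @resolves_location_to_weather.weather(location) == :rain
  end
end

class ClothingPickerTest < Minitest::Test
  def setup
    api = WeatherStub.new({ "London" => :rain }, default: :sunny)
    @picker = ClothingPicker.new(resolves_location_to_weather: api)
  end

  # Zero case: no rain, no jacket.
  def test_no_jacket_where_it_is_sunny
    refute @picker.wear_a_jacket?("Bristol")
  end

  # Only passes if the picker asked the collaborator about London.
  def test_jacket_where_it_is_raining
    assert @picker.wear_a_jacket?("London")
  end
end
```

If `wear_a_jacket?` passed `1` instead of `location`, the stub would answer `:sunny` and the second test would fail, without any assertion on the call itself.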

I still feel like making an assertion on the outgoing query message is only relying on the interface to the WeatherAPI, not on its implementation? But there are of course other reasons to not make such assertions.

Let's unpack this with a counter example. Let's say you put a "cache" in front of your outgoing query which is the WeatherAPI but it has the same interface. That cache returns "Rain" if it cannot access the server (ie. a unit test with an air gap) or if you flip a flag to not talk to external servers (ie. in CI mode).

This is sensible for the weatherAPI, but now none of our tests for ClothingPicker are failing or passing correctly. It also makes the question, "was the API called with London" completely non-deterministic because you don't know whether you got Rain because the location was passed or if you got Rain because the server is down.
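
The counterexample can be sketched as follows. `FallbackWeather` and `LondonRain` are hypothetical classes, assumed only for illustration.

```ruby
# Hypothetical wrapper with the same interface as the weather API.
class FallbackWeather
  def initialize(api, offline:)
    @api = api
    @offline = offline
  end

  def weather(city)
    return :rain if @offline  # canned answer, the city is ignored
    @api.weather(city)
  end
end

# Hypothetical stand-in for the real API: rain only in London.
class LondonRain
  def weather(city)
    city == "London" ? :rain : :sunny
  end
end

online  = FallbackWeather.new(LondonRain.new, offline: false)
offline = FallbackWeather.new(LondonRain.new, offline: true)

online.weather("London")  # => :rain, because the city was passed through
offline.weather(1)        # => :rain, although the argument is wrong
```

Seeing `:rain` come back says nothing about which argument was sent once the fallback is in play.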

This now exposes the core issue: You cannot write the test in such a way, that you could have the API be pick(location) but not know the weather. So the location-to-weather mapping is intrinsically the main information needed, which is exactly what the collaborator provides. The collaborator doesn't "return Rain". It provides "Rain for London", if it is indeed raining in London. It's that second piece of the information that we're mocking, not the mapping itself.

Now you ask, why is it not worth testing? Because if you want any code to execute at all, you need to call the real one. Because if you do not, then the only code executed is the creation of the mock, and the calling of a mock. In this case you fully ignore the fact that something has to be done with raincoat.

So your code becomes (pseudo code):

api = weather_mock_that_asserts_call_with_london_and_returns_rain
picker = new(api) # a class you know calls the api
picker.wear_a_jacket?(london)
#assert the mock was called with London

This breaks the determinism and structure-insensitivity of the test, because the test can now give multiple false results:

  • test breaks if API got called multiple times
  • test passes incorrectly if the weather api changes, but the mock does not
  • test fails incorrectly if the mock changes, but the weather api does not
  • test fails if there are multiple ways to query london, and the one in the mock wasn't the obvious choice (i.e. does not survive refactor)
  • test fails if the value object for the location has multiple canonical forms (i.e. trimming whitespaces or lowercasing location)

Ultimately you cannot test the invocation of the method without screwing up what it does with the return value, in any way that would be more reasonable than giving the SUT a simpler stub through the code.
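
To make the first bullet concrete, here is a small sketch. `RecordingWeather`, `Picker` and `LoggingPicker` are hypothetical; the second picker stands in for a refactor that happens to call the collaborator twice while behaving the same from the outside.

```ruby
# Hypothetical spy: records every call and always answers :rain.
class RecordingWeather
  attr_reader :calls

  def initialize
    @calls = []
  end

  def weather(city)
    @calls << city
    :rain
  end
end

# Two hypothetical pickers with identical observable behaviour.
class Picker
  def initialize(api)
    @api = api
  end

  def wear_a_jacket?(location)
    @api.weather(location) == :rain
  end
end

class LoggingPicker < Picker
  def wear_a_jacket?(location)
    @last_seen = @api.weather(location)  # extra lookup added in a refactor
    super
  end
end

[Picker, LoggingPicker].each do |klass|
  spy = RecordingWeather.new
  answer = klass.new(spy).wear_a_jacket?("London")
  call_assertion_holds = (spy.calls == ["London"])
  puts "#{klass}: answer=#{answer}, call assertion holds=#{call_assertion_holds}"
end
```

Both pickers give the same answer, but an assertion of the form "called exactly once with London" only holds for the first one, so it would fail after the refactor even though nothing observable changed.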

@dgmstuart

Thanks for your thorough response.

The test has to specify the Stub, so the test has to define what it returns when asked
....
So the information ... is part of the assertion and should be right next to it in the same test.

In Ruby we can (and do, with the ubiquitous testing framework RSpec) define such things inline in the test - in this case as

instance_double(WeatherApi, weather: :rain)

I was only writing it as a TypeScript class for the sake of mutual intelligibility, but I see that this was misleading in another way.

Regardless: reading between the lines I interpret your main point as this:

To be a useful stub, it should only return the relevant result in the scenario that we're testing.

If so then that makes sense, and we have a syntax for this in RSpec:

api_stub = instance_double(WeatherApi)
allow(api_stub).to receive(:weather).with("London").and_return(:rain)

This, like WeatherStub:

  • doesn't assert that the message was called
  • will raise an error if "New York" or "london" is passed

This way we avoid the arbitrary values ("Sunny", "Manchester" etc.): I prefer to avoid arbitrary values, because it's not 100% clear to the reader whether it's significant that this specific value is used, or whether any other value would fulfil the same purpose.

@Alpheus

Alpheus commented Nov 22, 2025

Thanks for your thorough response.

The test has to specify the Stub, so the test has to define what it returns when asked
....
So the information ... is part of the assertion and should be right next to it in the same test.

In Ruby we can (and do, with the ubiquitous testing framework RSpec) define such things inline in the test - in this case as

instance_double(WeatherApi, weather: :rain)

I was only writing it as a TypeScript class for the sake of mutual intelligibility, but I see that this was misleading in another way.

Regardless: reading between the lines I interpret your main point as this:

To be a useful stub, it should only return the relevant result in the scenario that we're testing.

If so then that makes sense, and we have a syntax for this in RSpec:

api_stub = instance_double(WeatherApi)
allow(api_stub).to receive(:weather).with("London").and_return(:rain)

This, like WeatherStub:

  • doesn't assert that the message was called
  • will raise an error if "New York" or "london" is passed

This way we avoid the arbitrary values ("Sunny", "Manchester" etc.): I prefer to avoid arbitrary values, because it's not 100% clear to the reader whether it's significant that this specific value is used, or whether any other value would fulfil the same purpose.

Yes, to really shorten the conversation down:
The test should contain all details that matter for the particular scenario being tested, and it should not contain any details about the current implementation.

Let's connect that with Sandi's talk from the original idea way up in the thread:

  • Why not test outgoing queries?
  • Because an outgoing query will definitely be tested as a scenario-fixture for one of the command tests or incoming queries

I can see how this can be confusing if you do not pin down tests to each scenario, and use broad-spectrum mocks instead.

@dgmstuart

OK thanks again for all the discussion - all the best of luck with your projects.

@Alpheus

Alpheus commented Nov 27, 2025

OK thanks again for all the discussion - all the best of luck with your projects.

Thank you too, nice conversation!
