Fluent Interfaces in Python and Ruby

Published 2020-05-25
python, ruby

This post covers one of my favorite fundamental design choices in Ruby: its design favors forward flow of execution in complex statements, and allows programmers to use it in a consistent fashion via blocks. I will contrast this with Python, where the situation is... mixed. Although it is certainly possible to write fluent interfaces in Python, the language and libraries tends to fight you.

Python Tends to Have Mixed Flow

One of my biggest gripes with Python is that APIs are often designed in such a way that forces you to scan forwards and backwards repeatedly when chaining function calls. For example, consider this directory structure:

test/
├── a/
└── b/

To get from A (our cwd) to its sibling directory B using Python's os.path without needing lots of intermediate variables, you need to do something like this:

import os.path

b = os.path.join(os.path.dirname(os.getcwd()), 'b')
# '/home/jtk/test/b'

In English, that's "join the directory that the cwd is in with b".

Notice how the starting point of the calculation, the cwd, comes in the middle of the sentence. If you started there, you would need to read out to the left until you hit the join, then jump back to the end of the sentence to figure out what you're joining to. (The terminology is also somewhat noodly, at least to me - "the directory that the cwd is in" means "the parent of the cwd", but I associate "the directory that something is in" with files more than I do directories.)

If you use Python's pathlib module instead, you get

from pathlib import Path

b = Path.cwd().parent / 'b'
# PosixPath('/home/jtk/test/b')

This is this significantly more compact (no repetition of os.path; most people don't from os.path import the individual functions) and expressive (parent instead of dirname, using / to join path components). Even better, it reads in an obvious manner from left to right: "start in the cwd, go to its parent, then go down into b".

This matches our intuition about how you can traverse the filesystem locally, and reads sensibly from left to right without needing to jump around the structure of the call.

In Python, you can only really achieve this kind of "forward flow" when working with an interface that is explicitly designed to be fluent, like pathlib. Because the Path methods generally return a new path, you can chain them from left to right without having to nest the nouns deep inside a function call. Compare

verb(verb(noun, noun))

to

noun.verb(noun).verb

I vastly prefer the latter.

This is particularly problematic when you want to use Python's map, or even its comprehensions:

# python

doubled = map(lambda x: 2 * x, range(10))

doubled = [2 * x for x in range(10)]

In both cases, the verb comes before the noun. The list comprehension at least reads like an English sentence, but it can get very messy if you want to perform multiple operations on each element, filter or replace elements, or use a more complex expression than is reasonable for a lambda. Well-designed fluent APIs like pathlib certainly exist, but they are the exception, not the rule. There is no driving principle welling up from the design of the language itself to help programmers write fluent interfaces. Instead, Python's own syntax tends to fight you.

Ruby's Design Favors Forward Flow

Ruby flips this around, emphasizing forward flow by making "control flow methods" like map into methods on generic "Enumerable" (in Python, we would say "iterable") objects. In Ruby, the same maps as above look like one of these two equivalent forms:

# ruby

doubled = (0...10).map { |x| 2 * x }

doubled = (0...10).map do |x|
  2 * x
end

In both cases, we are passing a "block" (the bit inside { } or do end) to the map method of a Range object. In Python, one could imagine doing

# python

doubled = range(10).map(lambda x: 2 * x)

... except that range has no such method.. The unifying feature in Ruby is that Enumerable is itself a class with many methods, while Python's iterators are not equipped with any useful methods beyond what is absolutely necessary for them to work. Even the basic for loop can be performed with Enumerable.each in Ruby. Because any code can go in a block, they can act like super-powered Python lambda functions, allowing you to express multi-step operations while maintaining forward flow.

Note that the Ruby version of the map is in noun-verb order, while the original Python examples are in verb-noun order. Noun-verb feels more natural to me - I want to know what we're operating on before I know what the operation is. Of course, that thing might be abstract (an interface, or a duck-typed object, etc.), but I want to understand the context of the action we're about to take.

Mutation is the Enemy

One must be very careful when deciding whether an interface should be fluent. A good way to decide is to look at possible side effects that might happen in the fluent call chain: if there are any, you probably shouldn't provide a fluent interface.

Fluent interfaces are most effective when they describe a chain of transformations, returning a new object from each method in the chain. The examples above are all, well, examples of this. Guido van Rossum sums it up in this Python-Dev post from 2003:

I'd like to explain once more why I'm so adamant that sort() shouldn't return self.

This comes from a coding style (popular in various other languages, I believe especially Lisp revels in it) where a series of side effects on a single object can be chained like this:

x.compress().chop(y).sort(z)

which would be the same as

x.compress()
x.chop(y)
x.sort(z)

I find the chaining form a threat to readability; it requires that the reader must be intimately familiar with each of the methods. The second form makes it clear that each of these calls acts on the same object, and so even if you don't know the class and its methods very well, you can understand that the second and third call are applied to x (and that all calls are made for their side-effects), and not to something else.

I'd like to reserve chaining for operations that return new values, like string processing operations:

y = x.rstrip("\n").split(":").lower()

There are a few standard library modules that encourage chaining of side-effect calls (pstat comes to mind). There shouldn't be any new ones; pstat slipped through my filter when it was weak.

Of course, Python has good support for keyword arguments with optional default values, which largely eliminates the need for the Builder pattern that is popular in some languages. There will always be exceptions where a fluent interface might make sense even with underlying mutation.

Lessons for Everyday Programming

The discussion in this post so far has been rather philosophical. What do I actually try to do while writing code on a day-to-day basis?

  1. Turn complex operations, especially ones that mix "nouns" and "verbs", into single function calls to turn the call site into verb(noun, noun, ...). This keeps the main code path nice and readable (at the expense of shuffling complexity into many small functions, which I consider an acceptable tradeoff).
  2. In Python, use properties liberally to capture simple computations. This inverts the noun-verb order (noun.property instead of compute(noun)) and makes it easier to chain operations. (This is usually good for encapsulation anyway.)
  3. Think carefully before writing a method that mutates and also returns self. Mutating method calls should be isolated to their own line of code, and preferably have a very explicit name. If you simply must mutate in a chained method call, do it at the end of the chain.

These aren't rules, and I don't follow them religiously. But I'd like to think I have them in the back of my mind, especially when refactoring existing code.