Adventures in GitHub Project Automation

Published 2020‑09‑05 · Last Modified 2021‑08‑01

This is part of the story of un-fstring, a toy project I built to experiment with Python syntax parsing and GitHub project automation. In this post I'll touch on why I built it and the automation tools I used on it. A future post might cover the actual transformations it does to Python source code.

Origins

A few weeks ago, I ran into a small annoyance at work: I had written some code that was only compatible with Python >=3.6, but later needed it to be compatible with Python 3.5. There are a few syntax differences between 3.5 and 3.6, and the most annoying one to fix by hand is f-strings, which Python 3.6 introduced. I prefer f-strings over both %-formatting and .format(), so I tend to default to them, and thus had to convert them to .format() calls by hand when making the code 3.5-compatible.

I've been interested in project-level automation recently, sparked by finally getting around to installing pre-commit in a few projects and feeling how it removed some nagging annoyances, particularly around boring tasks like making sure that black had been run before committing. It also unlocks some neat tricks, like running black on code examples in your documentation, something that isn't easy to do from the command line (if you even remember to do it). I've been wanting to write a pre-commit hook for a toy project to understand the process.
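
For reference, a minimal .pre-commit-config.yaml that runs black before each commit might look something like this (a sketch; the rev is illustrative, and you should pin whichever release you actually use):

.pre-commit-config.yaml
repos:
  - repo: https://github.com/psf/black
    rev: 20.8b1  # illustrative; pin the release you actually use
    hooks:
      - id: black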

I've also had my eye on this Python Packaging Authority guide, which describes how to set up a GitHub Action that publishes a Python package to PyPI. I've been wanting to set this up for dask-chtc and htmap, but I wanted to test it out on a low-risk personal project first, since PyPI version numbers can't be overwritten, which would make a mistake on a real project fairly embarrassing.

If you want a robust tool for doing this transformation, I recommend f2format. un-fstring was more about learning than producing a useful tool, and I don't recommend relying on it.

Workflow for pytest

GitHub Actions lets you define a workflow in terms of the jobs (each composed of steps) that run when it triggers. The YAML file below defines a workflow named "tests". When placed in .github/workflows/ in a GitHub repository, it will attach that workflow to that repository. Workflows are most often used for continuous integration (i.e., commit often, and test each commit) and continuous deployment (i.e., automatically deploying/packaging certain commits).

.github/workflows/tests.yml
name: tests

on:
  push:
    branches:
      - master
  pull_request:

jobs:
  test:
    strategy:
      fail-fast: false
      matrix:
        platform: [ubuntu-latest, windows-latest, macos-latest, ubuntu-20.04, ubuntu-18.04]
        python-version: [3.6, 3.7, 3.8]

    runs-on: ${{ matrix.platform }}

    steps:
      - uses: actions/checkout@v2
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install package
        run: pip install .[tests]
      - name: Run tests
        run: pytest tests/

This workflow triggers when there is a push to the master branch or when the very generic pull_request event fires (this includes pull request creation as well as pushes to pull request branches).

The workflow has a single job, called "test". That job parametrically expands to run over a "matrix" of platform and python-version values. Unlike other CI systems I've used, like Travis-CI, this matrix just defines key-value pairs: you re-use the values yourself later to build up the environment, instead of some of them being hard-coded parts of the system's environment setup.

Finally, we get to the steps. Each step either runs a command or uses an "action", a pre-packaged set of steps (why run and uses instead of run and use, or runs and uses? Go figure...).

At this point, you're basically writing a script, with some extra functions provided through actions. This gives you a great deal of flexibility. We use the checkout action to git clone the source code (it finds the right commit based on environment variables set by GitHub), the setup-python action to install the Python version determined by the matrix, and then a few run steps to pip install the package and execute the pytest test suite. If any step exits with a non-zero return code, GitHub Actions will stop executing that branch of the matrix and mark it failed, with all of the notifications and checkmarks in the pull request view that you might hope for.

Workflow for pre-commit

The pre-commit workflow is extraordinarily simple to set up, since the template in the action's README just works out of the box:

.github/workflows/pre-commit.yml
name: pre-commit

on:
  push:
    branches:
      - master
  pull_request:

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
      - uses: pre-commit/action@v2.0.0

I like to keep the pre-commit job in a separate workflow from the tests, since the tests need a matrix to cover multiple platforms but pre-commit does not. Each GitHub account gets some number of included minutes for running Actions per month, after which you need to pay; you can check your current usage in your account's billing settings.

Workflow for Publishing to PyPI

The Python Packaging Authority (PyPA) maintains an Action for publishing packages to PyPI, and publishes a guide with some useful snippets built around that Action; my take below is based on it.

.github/workflows/publish-to-pypi.yml
name: publish-to-pypi

on:
  release:
    types: [published]

jobs:
  build-and-publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.x
        uses: actions/setup-python@v2
        with:
          python-version: "3.x"
      - name: Install build dependencies
        run: pip install wheel
      - name: Build packages
        run: python setup.py sdist bdist_wheel
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@master
        with:
          user: __token__
          password: ${{ secrets.pypi_token }}

To let the Action authenticate as you, you should make a PyPI API token from the PyPI account management page, then store that token as a secret in your repository. You should always make a token: don't use your PyPI username and password directly, since that will break if you change your password. Token-based authentication also lets you restrict access to a single PyPI project, reducing the impact if your token somehow leaks. The ${{ secrets.pypi_token }} syntax accesses the GitHub repository secret; I happened to name it pypi_token.

The steps before the actual publish step are a standard setup.py-based package build; see the PyPA guide for details. The only surprising thing was that wheel, which is required to build wheels, wasn't installed by default; a quick pip install wheel fixes that.

The trickiest part was actually deciding when to run this workflow. I decided to run it when a release is published, which implicitly targets the last commit in the tag the release is based on. There are a lot of other options here, and it should be easy to play some tricks, like doing pre-releases for specially-named branches. One gotcha is that the PyPI push will fail if you forget to update the version number in setup.cfg before making the release... which I did several times while playing around. Luckily, GitHub releases are not immutable: you can delete the release and the tag it points to, update the version number, then redo the release creation process with the same name.
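
As an illustration of one of those alternatives (a sketch, not what un-fstring actually uses): you could skip GitHub releases entirely and publish whenever a version-like tag is pushed, with a trigger along these lines:

on:
  push:
    tags:
      - "v*"  # fires on any pushed tag starting with "v", e.g. v1.2.3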

Making un-fstring into a pre-commit Hook

Configuring the pre-commit hook is actually the easiest "automation", since pre-commit just needs to be able to get a few pieces of metadata from the public GitHub repository. That metadata lives in a .pre-commit-hooks.yaml file in the repository root:

.pre-commit-hooks.yaml
- id: un-fstring
  name: un-fstring
  description: Convert f-strings to .format() calls.
  entry: un-fstring
  language: python
  types: [python]

Most of the metadata really is just metadata, but some fields have real runtime behavior: language is the language the hook is written in (which tells pre-commit how to install it), and types says which files to run the hook on.

The other requirement is that pre-commit needs to point at a specific version (a tag) of the repository. Since we already have a version/tag-based workflow for PyPI releases, there's no extra work to be done here.
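
For completeness, here's roughly what the entry in a consuming project's .pre-commit-config.yaml would look like; the repository URL and tag below are placeholders, not real references:

.pre-commit-config.yaml (in the consuming project)
repos:
  - repo: https://github.com/yourname/un-fstring  # placeholder URL
    rev: v0.1.0  # placeholder; pin a real release tag
    hooks:
      - id: un-fstring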

Why Bother?

Why not bother?

Setting up project automation like pre-commit and continuous integration/deployment workflows is becoming less frustrating as the tooling improves (GitHub Actions is a huge improvement over Travis-CI, for example, though I don't want to get into a religious war here). For basic tasks, these workflows typically take a few hours at most to set up; it took longer to write this post than to set up all of the project automation it describes.

Now, I can commit and release with confidence. I don't have to worry that I'll merge a pull request that's broken on a platform I don't test on locally, or that I'll accidentally package a new release locally without making sure I'm on the latest master, or that someone else will forget to run black on their pull request.

Of course, it's always possible for something to slip through the cracks. Automation won't protect you from something you don't test; all it can do is make sure the tests you do have actually get run. But that's certainly better than the alternative.