Making Clean Code a Part of Your Build Process (And More!)

At Caktus, "clean" (in addition to "working"!) code is an important part of our delivery. For all new projects, we achieve that by using flake8. flake8 is a wrapper around several tools: pep8, pyflakes, and McCabe. pep8 checks to make sure your code matches the PEP 0008 style guidelines, pyflakes looks for a few additional things like unused imports or variables, and McCabe raises warnings about overly complex sections of code.

We usually check code formatting locally before committing, but we also have safety checks in place in case someone forgets (as I more than anyone have been known to do!). This prevents code formatting standards from slowly eroding over time. We accomplish this by making our continuous integration (CI) server "fail" the build if unclean code is committed. For example, using Travis CI, simply adding the following line to the script: section of your .travis.yml will fail the build if flake8 detects any formatting issues (the command returns a non-zero exit code if errors are found):

- flake8 .

You can adjust flake8 defaults by adding a file named setup.cfg to the top level of your repository. For example, we usually relax the 80 character limit a little and exclude certain automatically generated files:

[flake8]
max-line-length=100
exclude=migrations

As a result you not only have code that is more readable for everyone, but avoids actual errors as well. For example, flake8 will detect missing imports or undefined variables in code paths that might not be tested by your unit test suite.

Adding flake8 to an older project

This is all well and good for new projects, but bringing old projects up to speed with this approach can be a challenge. I recently embarked on such a task myself and thought I'd share what I learned.

I started by adding a setup.cfg to the project and running flake8 on my source tree:

$ flake8 --count
...
1798

The result: a whopping 1798 warnings. Many of these turned out to be pep8's "E128 continuation line under-indented for visual indent":

$ flake8 --select=E128 --count
...
1010

In other words, in a huge number of cases, we weren't indenting multi-line continuations the way pep8 wanted us to. Other errors included things like not having a space after commas (E231), or not having two spaces before inline comments (E261). While many editors do support automatically fixing errors like this, doing so manually would still be tedious: In this project we had nearly 250 Python source files.

Enter autopep8 and autoflake. These tools purport to automatically fix pep8- and pyflakes-related issues. There are two ways to use these tools; one wholesale for all errors across the project at once, and one more granular, addressing only a single group of similar errors at a time.

Addressing all errors at once

This approach is best for smaller projects or those with just a few errors (50-100) to address:

$ pip install autoflake
$ find . -name '*.py'|grep -v migrations|xargs autoflake --in-place --remove-all-unused-imports --remove-unused-variables

In my case, this resulted in changes across 39 files and reduced the number of flake8 errors from 1798 to 1726. Not a huge change, but a time saver nonetheless. autopep8 was even more impressive:

$ pip install autopep8
$ autopep8 --in-place --recursive --max-line-length=100 --exclude="*/migrations/*" .

This brought the number of changed files up to 160, and brought the number of warnings from 1726 down to 211. Note that autopep8 also supports an --aggressive option which allows non-whitespace changes. When I tried this, however, it only reduced the number of warnings from 211 to 198. I'll most likely fix those by hand.

Please note: If or when you're ready to try these commands on a project, you must first make sure you have no uncommitted changes locally. After each change, I also recommend (a) committing the results of each command as a separate commit so they're easier to unravel or review later, and (b) running your unit test suite to make sure nothing is inadvertently broken.

Addressing errors as groups (recommended)

While the above approach may work for smaller projects (not this one!), it can make code reviews difficult because all pyflakes or pep8 fixes are grouped together in a single commit. The more labor intensive but recommended approach is to address them in groups of similar errors. My colleague Rebecca Muraya recommended this approach and suggested the groups (thanks, Rebecca!):

  1. First, find and remove any unused imports:

    $ pip install autoflake
    $ find . -name '*.py'|grep -v migrations|xargs autoflake --in-place --remove-all-unused-imports
    

    After it's finished, review the changes, run your test suite, and commit the code.

  2. Now, do the same for unused variables:

    $ find . -name '*.py'|grep -v migrations|xargs autoflake --in-place --remove-unused-variables
    

    Again, once complete, review the changes, run your test suite, and commit the code.

  3. Before moving on to pep8 errors, the following command provides an invaluable summary of errors not yet addressed:

    $ flake8 --statistics --count -qq
    
  4. Finally, autopep8 can be told to fix only certain error codes, like so:

    $ pip install autopep8
    $ autopep8 --in-place --recursive --max-line-length=100 --exclude="*/migrations/*" --select="W291,W293" .
    

    This will remove trailing whitespace and trailing whitespace on blank lines. Once complete, review and commit your changes and move on to the next group of errors.

pep8's error codes are listed in detail on the autopep8 PyPI page and in the pep8 documentation. You can either group them yourself based on your preferences and the particular warnings in your project, or use the following as a guide:

  • Remove trailing whitespace, then configure your editor to keep it away:
    • W291 - Remove trailing whitespace.
    • W293 - Remove trailing whitespace on blank line.
  • Use your editor to find/replace all tabs, if any, with spaces, and then fix indentation with these error codes. This can have a semantic impact so the changes need to be reviewed carefully:
    • E101 - Reindent all lines.
    • E121 - Fix indentation to be a multiple of four.
  • Fix whitespace errors:
    • E20 - Remove extraneous whitespace.
    • E211 - Remove extraneous whitespace.
    • E22 - Fix extraneous whitespace around keywords.
    • E224 - Remove extraneous whitespace around operator.
    • E226 - Fix missing whitespace around arithmetic operator.
    • E227 - Fix missing whitespace around bitwise/shift operator.
    • E228 - Fix missing whitespace around modulo operator.
    • E231 - Add missing whitespace.
    • E241 - Fix extraneous whitespace around keywords.
    • E242 - Remove extraneous whitespace around operator.
    • E251 - Remove whitespace around parameter '=' sign.
    • E27 - Fix extraneous whitespace around keywords.
  • Adjust blank lines:
    • W391 - Remove trailing blank lines.
    • E301 - Add missing blank line.
    • E302 - Add missing 2 blank lines.
    • E303 - Remove extra blank lines.
    • E304 - Remove blank line following function decorator.
    • E309 - Add missing blank line (after class declaration).
  • Fix comment spacing:
    • E26 - Fix spacing after comment hash for inline comments.
    • E265 - Fix spacing after comment hash for block comments.
  • The following are aggressive fixes that can have semantic impact. It's best to do these one commit at a time and with careful review:
    • E711 - Fix comparison with None.
    • E712 - Fix comparison with boolean.
    • E721 - Use "isinstance()" instead of comparing types directly.
    • W601 - Use "in" rather than "has_key()".
    • W602 - Fix deprecated form of raising exception.
    • W603 - Use "!=" instead of "<>"

You can repeat steps 3 and 4 in the above with each group of error codes (or in the case of more aggressive fixes, single error codes) until they're all resolved. Once all the automatic fixes are done, you'll likely have some manual fixes left to do. Before those, you may want to see what remaining automatic fixes, if any, autopep8 suggests:

$ autopep8 --in-place --recursive --max-line-length=100 --exclude="*/migrations/*" .

Once all the errors have been resolved, add flake8 to your build process so you never have to go through this again.

Concluding Remarks

While I haven't finished all the manual edits yet as of the time of this post, I have reduced the number to about 153 warnings across the whole project. Most of the remaining warnings are long lines that pep8 wasn't comfortable splitting, for example, strings that needed to be broken up like so::

foo = ('a very '
       'long string')

Or other similar issues that couldn't be auto-corrected. To its credit, flake8 did detect two bugs, namely, a missing import (in some unused test code that should probably be deleted), and an instance of if not 'foo' in bar (instead of the correct version, if 'foo' not in bar).

My colleague Mark Lavin also remarked that flake8 does not raise warnings about variable naming, but the pep8-naming plugin is available to address this. The downside is that it doesn't like custom assertions which match the existing unittest style (i.e., assertOk, assertNotFound, etc.).

Good luck, and I hope this has been helpful!

New Call-to-action
blog comments powered by Disqus
Times
Check

Success!

Times

You're already subscribed

Times