Peter Lyons Logo

Peter Lyons

Code Conventions

Code conventions

This page will document conventions I use when writing computer code. Hopefully folks will find them helpful. I will try to explain the reasoning behind each convention clearly. I have found the rationale omitted from many coding convention guidelines and found that to be frustrating.

See also Google's Python Style Guide. Overall it is spot on and goes into good detail and examples. There are a few points I disagree with, but they are not super important. My main gripe is there are some points they assert without explaining the underlying reasoning.

Guiding Principles

All these conventions reinforce certain core tenets.

Readability is King
Most of these conventions are about making the code maintainer's job as fast and easy as possible. The faster and more accurately the code maintainer (whether it's the same person as the author or not) can change the code, the better. A program that runs perfectly but is unchangeable because it has low readability is significantly less valuable than a clean and clear program with some minor bugs that can easily be found and fixed. The fact is, if your code is hard enough to read, someone will eventually decide to rewrite it, giving you negative return on the time invested in the first version of the program.
Precision
Regardless of whether you like to think of code as more like art or more like science (I prefer the latter), code needs to be accurate and precise. There should not be careless code or comments strewn around the code. Code should look like it was created by an engineer. And not like the engineer who probably wrote it - an overtired kid scarfing junk food over a messy desk late at night; it should look like it was written by an old school 1950s chemist in a pristine lab coat who carefully labels every vial in his lab before his fills it. Even if you fancy yourself a "hacker", I think most of us agree that beautiful code just sparkles with precision and clarity and elegance.
Make One Choice
Many programming laguages support several different ways or syntaxes for the same thing. This is unfortunate, in my opinion, especially for the silly ones. Avoid these silly syntax variations and favor consistently using the most common format. This is also expressed in [PEP 20][2]'s "There should be one-- and preferably only one --obvious way to do it.".
When In Doubt, Alphabetize
For a sequence of statements where execution or declaration order doesn't matter, if the list is small and very clearly can be organized logically, do that. But if the list is long or has no very clear inherent organization, alphabetize
Think In Small Chunks
People have varying mental capacities to keep things in their short term memory. Certain complex or compound statements urge the reader to fill up their short term memory with lots of intermediate products in order to comprehend a single complex statement. I find this difficult and especially frustrating when I'm reading someone else's code. And the code is broken. And I'm tired and up late because that code is broken. And the author decided to write some fancy 200-character lambda expression with seven intermediate variables. Here's a real example I encountered at work (altered to protect the guilty): elif svd['method'] == 'some.literal.string' and filter(lambda x: type(x) == type((0,)) and x[1], svd.results.get('test_totals',{}).items()): That's a single expression! Completely unreadable to me. Clever, but worse than worthless.
Fewer Expressions Per Line
Error messages often include a line number. However, if you get to that line of source code and it is 200 characters long and contains ten complex sub-expressions, you may need to break it up into ten lines of code and rerun it to understand which expression causes the error. If the error is hard (or impossible) to reproduce, you are in for some guesswork.

Thoughts on naming

Clear naming is absolutely critical, in my opinion. This applies very broadly: names of products, projects, directories, files, classes, methods, functions, variables, modules, packages, etc. Clear names make all the difference. I often will spend ten minutes thinking about the best name for a key class or method. The fact is, naming is something you can't avoid. You can get away without writing comments or documentation, but every file needs a name and so does every variable. Therefore, the absolute minimum you can do is make the names clear. And it goes a long, long way. Conversely, naming that is confusing or unclear from the beginning, or that becomes confusing through a refactoring without the accompanying renaming, is wasteful. The maintainer is going to waste time (and therefore money) acting on confused assumptions based on your bad or broken names. If you have a variable called serverIP, which initially contains just a string IP address in dotted quad notation, and then later you refactor the code so this variable contains ip:port, you need to rename the variable to serverIPPort. It's worth the effort to keep the code straightforward and not full of nasty surprises and tricks.

See also Andy Lester's article on the two worst variable names.

General Guidelines

Python Conventions

For the most part, I follow PEP 8, so review that and follow it for the basic formatting stuff. See also PEP 20. Note that python's convention for module names being all lowercase supercedes my guideline about acronyms always being capitalized.

One brief aside here regarding the "A Foolish Consistency is the Hobgoblin of Little Minds" section of PEP 8. I feel it is worth noting that even though when it comes to formatting and style I do tend toward the extreme of consistency, but hopefully not past that into foolishness. However, when it comes to the actual python standard library itself, there is no such thing as foolish consistency. Even in PEP 8 they admit "The naming conventions of Python's library are a bit of a mess". The python standard library is riddled with blatant inconsistencies that reveal that we are dealing with a product of dozens of authors and pretty bad consistency (much worse that Java in many cases). Examples abound, but just look at os.mkdir() vs. os.makedirs. I have so many times typed os.mkdirs() only later to get a AttributeError. I mean, WTF? It's in the same module for crying out loud. I have my opinion about how this should be (os.makeDir() that behaves like os.makedirs()), but I don't care that much as long as they are consistent. If a library is consistent, I'm flying. At this point I rarely need to read documentation. I can use most common libraries for IO, date, filesystem, networking just by looking at the API and assuming it does what makes sense. If there is no consistency though, it totally gums up the works and slows me to a frustrating crawl.

When building lengthy inline data structures such as dictionaries or lists, prefer multiple statements (separate initialization and population code) to overly long inline data structures. This adheres to the Fewer Statements Per Line principle. For example,

original:

_platformCfg = {
                "FedoraLinux" :
                { "releases"     : {"1":0,
                                    "2":0,
                                    "3":0},
                "longName"     : "Fedora Core Linux",
                  "dfCmd"        : "df -k",
                  "sttyUnset"    : "stty noflsh echo",
                  "netstatCmd"   : "netstat -na | grep ":%s " | grep LISTEN",
                },
                "RHLinux" :
                { "releases"     : {"6.2":1,
                                    "7.1":1,
                                    "7.2":1,
                                    "7.3":1,
                                    "8.0":1,
                                    "9":0,
                                    "2.1WS":1,
                                    "2.1ES":1,
                                    "2.1AS":1,
                                    "3WS":1,
                                    "3ES":1,
                                    "3AS":1,
                                    "4WS":1,
                                    "4ES":1,
                                    "4AS":1},
                "longName"     : "Red Hat Linux",
                  "dfCmd"        : "df -k",
                  "sttyUnset"    : "stty noflsh echo",
                  "netstatCmd"   : "netstat -na | grep ":%s " | grep LISTEN",
                },
                "SuSELinux" :
<REMAINING OMITTED FOR BREVITY>

preferred:

_platformCfg = {}
_fedoraCfg = {}
_fedoraCfg["releases"] = {"1": 0, "2": 0, "3": 0}
_fedoraCfg["longName"] = "Fedora Core Linux"
_fedoraCfg["dfCmd"] = "df -k"
_fedoraCfg["sttyUnset"] = "stty noflsh echo"
_fedoraCfg["netstatCmd"] = "netstat -na | grep ":%s " | grep LISTEN"
_platformCfg["FedoraLinux"] = _fedoraCfg

_rhCfg = _fedoraCfg.copy()
_rhCfg["releases"] = {"6.2": 1, "7.1": 1, "7.2": 1, "7.3": 1, "8.0": 1,
     "9": 0, "2.1WS": 1, "2.1ES": 1, "2.1AS": 1, "3WS": 1, "3ES": 1,
     "3AS": 1, "4WS": 1, "4ES": 1, "4AS": 1, "5SERVER": 1, "5CLIENT": 1}
_rhCfg["longName"] = "Red Hat Linux"
_rhCfg["netstatRe"] = "tcp.+:%s.+LISTEN"
_platformCfg["RHLinux"] = _rhCfg

Why?

  1. Second version does not duplicate constant values, making it easier to change them in one place and be done with it. (Don't Repeat Yourself)
  2. Second version is more expressive. It clearly indicates that you are copying all the data for one key and then just changing some values. The inline literal version requires you to eyeball all the data to attempt to make that determination.
  3. As a general rule, I prefer more simple statements over fewer complex/compound statements since they require less working memory in your brain (Think In Small Chunks)

Java Conventions

Bourne Shell Conventions

Ruby Conventions

CoffeeScript Conventions

Comments

This article pre-dates my blog, but you can post any comments you have on this article on the corresponding entry on my technology blog.