Managing Per-Project Interpreters and the PATH

September 02, 2012

Updated: 2016-04-14

So let's talk about managing runtimes and interpreters for projects and applications. This post comes about after I have seen a vast jungle of non-solutions and dead ends out there for managing installations of interpreters such as Python, Ruby, node.js, etc. First, let's clear the air of a bunch of nonesense you may find out there that makes this problem confusing.

Never use the interpreter provided by your operating system

If you are building a web application or any other project that should by all rights be cross-platform and doesn't ship with OS, then you should have absolutely nothing to do with the interpreter that may be included with the OS. The OS interpreters are there for components of the OS itself written in those languages. They have no business being used by third party projects and applications that run on top of the OS as opposed to being part of the OS. Forget about the Ruby that comes with OS X. Forget about the Python that comes with Ubuntu. Forget about the Debian packages for node.js (slightly different/better, but still, ignore them).

The reasoning behind this guideline is as follows.

Exact version: Applications need exact and strict control of the version of their interpreter. You should be using the exact same version of your interpreter across all of your development, test, staging, and production environments. This will avoid problems which are easily-avoidable, so do it.
Modern version: OSes tend to ship versions of these interpreters that are significantly behind the latest stable version. New applications should be written to work with the latest stable version and should keep up with ongoing releases, never getting more than 3 months behind.
Independence Applications need independence from one another. If you have 3 Django projects on the same machine, each one needs to have the ability to use whatever interpreter version it needs on its own independent schedule. Due to this fact, that means the correct location for these interpreters is within your application's directory alongside your application code, which is why I advise you to ignore the node.js debian packages you may find out there because it installs into a shared location, which does not meet our goals here.

Keep the app-specific interpreter within the application install directory

Again, don't let the OS's notion of shared interpreters in a shared location distract you from the right layout here. The app-specific interpreter installation belongs inside you project's installation directory.

project_root/python
project_root/node
project_root/ruby

Basically, the old school unix principles have gone stale on us. Years ago, sysadmins had rules for filesystem layout with different goals. For example, sysadmins wanted to be able to NFS mount a shared directory where binaries could live, be maintained in a single place, and be mounted and run by many additional servers. They wanted to do this to be efficient with disk space and to be able to make manual changes in one place and have them to affect immediately on an arbitrary number of servers that use the same NFS volume.

Now we use automated tools to manage deployments to clusters of independent servers, and disk space is cheap, so we want each server to have its own copy of what it needs to run with as few external dependencies as possible. We want to be able to do rolling deploys across a cluster or run 1/2 the cluster on the new code and half on the old code. Disk space is cheap and plentiful, so if we have 5 or 10 apps running on the same staging server, we could not care less about a few megabytes of duplication to handle a bunch of python installations.

OR use tools like nvm and rbenv

Some but not all of the interpreter managers are fully compatible with the requirements outlined above, so it's OK to use nvm or rbenv as long as each project gets to specify it's exact version.

Never use npm -g

This is a follow up to my earlier blog post about avoiding npm -g, now improved and revised. For the most part, I believe npm to be the state-of-the-art package management system and to be superior to the messes available for python and ruby. However, the -g switch, which installs commands globally, should be avoided in favor of the system described here. You don't want to have to upgrade all your apps at once to a new version of eslint, so give them each their own copy of the eslint command.

Provide a single script to launch your application commands

Encapsulate each version with a wrapper shell script that understands the project directory layout and manages your PATH appropriately. I tend to call this file project_root/bin/go but project_root/bin/tasks.sh or similar are good locations for this. This script should handle your service operations like start, stop, reload, etc, as well as any one-off commands you make have like clearing a cache, regenering static files, and so forth.

Here's a snippet of my project_root/bin/go script which locates the correct installation of python and fabric and passes control to them.

#!/bin/sh -e
cd $(dirname "${0}")
exec ./python/bin/fab "${@}"

Thus I can run this script from any directory, or from an init/upstart script, with any PATH, and the application correctly handles its own required settings. The above is the bare bones and the crux of the separation of concerns in the design. I normally have some other code in there to bootstrap the project's dependencies, but I'll save that topic for another blog post.

For local development, manage your PATH intelligently and automatically

As you work on many projects which contain their own interpreter installations, you don't want to always have to A) work from the project root directory and B) run commands like ./python/bin/python myapp.py. So here are some utilities that can intelligently manage your PATH similar to what is done by rbenv, but not tied to ruby and based on you changing project directories.

First, here's how I set up my PATH in my ~/.zshrc file (works equally well for bash or bourne shell). I've added extra explanatory comments inline.

#This helper function will add a directory to the PATH if it exists
#This is a simple way to handle different machines, OSes, and configurations
addPath() {
    if [ -d "${1}" ]; then
        if [ -z "${PATH}" ]; then
            export PATH="${1}"
        else
          export PATH=$PATH:"${1}"
        fi
    fi
}

setup_path() {
  PATH=
  # Normal system stuff comes first for security
  # So npm packages can't override basic commands like ls

  # Homebrew
  add_path "/usr/local/bin"

  add_path "/bin"
  add_path "/usr/bin"
  add_path "/sbin"
  add_path "/usr/sbin"
  add_path "/usr/X11/bin"

  # Personal home dir stuff
  add_path "${HOME}/projects/dotfiles/bin"
  add_path "${HOME}/bin"

  # Local pwd stuff
  add_path "${PWD}/script"
  add_path "${PWD}/bin"

  # For node
  add_path "${PWD}/node_modules/.bin"
  add_path "${HOME}/shared_node.js/node_modules/.bin"
  add_path "${HOME}/shared_node.js/node/bin"

  # For per-project python virtualenvs
  add_path "${PWD}/python/bin"
  add_path "${PWD}/env/bin"

  add_path "${HOME}/.rbenv/bin"

  export PATH
  [[ -d "${HOME}/.rbenv/bin" ]] && eval "$(rbenv init -)"
}
# Run this during shell startup.
# Can be re-run as needed manually as well
setup_path

OK, so that's how the PATH gets built up, but we want to change the PATH as we move our current working directory between projects. For that we use a shell hook function. What this does is try to detect if we've changed into a project directory, and if so, rebuild the PATH, which will put our project-specific directories early in the PATH list, so when we type node or python or coffee, etc, we get the project specific one under the project root. Because this adds absolute paths and only changes the PATH when we cd to a project root, we can cd to subdirectories within the project and still be running the correct project-specific interpreter. This does breakdown, however, if you cd directly into a project subdirectory without stopping in the project root. I don't hit that problem because I'm not in the habit of doing that, but YMMV. Here's the zsh version, which uses the chpwd hook function.

if [ -n "${ZSH_VERSION}" ]; then
  chpwd() {
    [ -d .git -o \
      -d  node_modules/.bin -o \
      -d python/bin -o \
      -d node/bin ] && setupPath
  }
fi

Bash users, you're on your own.

Here's an example of this at work.

~-> cd projects/peterlyons.com
~/projects/peterlyons.com-> which node
/Users/plyons/projects/peterlyons.com/node/bin/node
~/projects/peterlyons.com-> cd ../craft
~/projects/craft-> which node
/Users/plyons/projects/craft/node/bin/node
~/projects/craft-> cd ../othenticate.com
~/projects/othenticate.com-> which node
/Users/plyons/projects/othenticate.com/node/bin/node
~/projects/othenticate.com-> cd ../m-cm/indivo_provision
~/projects/m-cm/indivo_provision-> which python
/Users/plyons/projects/m-cm/indivo_provision/python/bin/python
~/projects/m-cm/indivo_provision-> cd ./conf
~/projects/m-cm/indivo_provision/conf-> which python
/Users/plyons/projects/m-cm/indivo_provision/python/bin/python

(Bonus item) Keep variable files for data and logging under your project directory

project_root/var/log
project_root/var/data

This is a mindset shift from traditional unix administration best practices. It's in my opinion a less complex and more application-centric design that makes better sense given our focus on applications that tend to be providing network services and generally are less tightly coupled to the underlying OS these days. Traditional unix administration (as documented in the Filesystem Heirarchy Standard) has a strong and system-wide distinction that runtime variable data like data files and log files go under /var and everything else except for /home and /tmp is static data. Again, this no longer applies to modern applications. These rules had to do with preventing key filesystems from filling up, primarily. They wanted application data to be static and allocate a certain amount of space that had separate filesystem limits from the variable data, which they wanted organized centrally under /var so they could manage log file growth and space disk space centrally. There were reasons for these designs at the time that made sense given the constraints and goals, but times have changed.