In my consulting work, I encounter a pretty high quantity and diversity of codebases as client projects flow in and out. For many reasons, it is very common for a project to get increasingly messy with time, and often by the time it drops in my lap, it's a certifiable mess.
Dealing with a mess can be a huge drag and productivity drain, as even the simplest debugging investigation is rife with confusion, duplication, and misdirection. Here are a few quick tool tips to help get things organized and clean again.
If the overall code style is highly inconsistent or hard to read, it might make sense to hit it with a giant esformatter hammer and force everything into a single style. The Go programming language does this language-wide via gofmt. Keep these tips in mind:
- esformatter and similar tools are still pretty young and could have bugs that change your code in bad ways
  - minimize this risk by autoformatting in small batches and making sure any tests you have continue to pass
  - always carefully inspect the diffs before committing automatic changes
- Do the formatting in small batches and group changes into a large number of small git commits
  - after every change, confirm via unit tests (if you have them) and eslint that the code is still not obviously broken
  - keeping the git commits granular will let you bisect the history more effectively to track down a specific problem introduced by one of these changes
- Do all the work on a branch, and make no manual code changes of any type on this branch
  - For example, don't mix manual variable renames with automated esformatter changes
  - You want to be able to discard the branch entirely without losing any human-authored changes
- Get the project done quickly
  - Dragging it out is a recipe for merge conflicts, and you'll never finish by doing small bits at a time while active development continues concurrently. If you need to, declare a short moratorium on development for a weekend or so and get everything formatted and merged before new feature development resumes
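The batching advice above can be sketched as a small Node.js helper. It is an illustration, not part of esformatter: the `planBatches` name and the policy of grouping files by directory are assumptions. Each batch it returns would then be formatted, checked with your tests and eslint, and committed on its own.

```javascript
// Hypothetical helper: split a list of source files into small batches so
// each batch can be autoformatted, tested, linted, and committed separately.
// Files are grouped by directory first so each commit touches related code.
const path = require("path");

function planBatches(files, maxBatchSize) {
  if (!Number.isInteger(maxBatchSize) || maxBatchSize < 1) {
    throw new RangeError("maxBatchSize must be a positive integer");
  }
  // Group files by containing directory, in sorted path order.
  const byDir = new Map();
  for (const file of [...files].sort()) {
    const dir = path.dirname(file);
    if (!byDir.has(dir)) byDir.set(dir, []);
    byDir.get(dir).push(file);
  }
  // Cut each directory's files into chunks of at most maxBatchSize.
  const batches = [];
  for (const dirFiles of byDir.values()) {
    for (let i = 0; i < dirFiles.length; i += maxBatchSize) {
      batches.push(dirFiles.slice(i, i + maxBatchSize));
    }
  }
  return batches;
}

module.exports = { planBatches };
```

Grouping by directory is just one reasonable default; it keeps each commit focused on related files, which makes the diffs easier to inspect and a later bisect more precise.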
I covered eslint in some detail in my previous post. Once you've got the code reasonably formatted and consistent, throw eslint at it to get a sense of where bugs and issues may lie. Often you will get hundreds or thousands of errors and warnings. I think either of two approaches is reasonable: focus on the most frequent errors (fixing these improves the codebase the most) or on the least frequent ones (fixing these can quickly get you to zero violations for a specific eslint rule). The key thing for me here, psychologically, is not to get overwhelmed, frustrated, and hopeless.
One nice plugin that can help with initial analysis, triage, and scheduling is eslint-stats, which makes it easy to see which problems are most common.
True story. A client's codebase had been uglified and then autoformatted by a previous developer before being delivered to the client as "source code". At a glance it looked like source code, since it had newlines and indentation, but all the variable names had been minified to single letters, which made it extremely difficult to read. One tool that helped us gradually get back to sanity was beautify with words, which finds all those one-letter variable names and generates a longer, unique, pronounceable (but otherwise gibberish) name for each of them. After that, once you understand what an appropriate semantic name for a variable is, you can easily find and replace all of its occurrences.
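To give a feel for the idea, here is a toy sketch of the naming step only. It is not beautify-with-words' actual algorithm: the consonant-vowel scheme and the `pronounceable` and `placeholderNames` functions are assumptions. A real tool also has to parse the code and rename per scope, because minifiers reuse the same letter for unrelated variables in different functions; this sketch ignores scoping entirely.

```javascript
// Toy sketch of the naming step only (not beautify-with-words' algorithm):
// map single-letter identifiers to unique, pronounceable placeholder names.
const CONSONANTS = "bdfgklmnprstvz"; // 14 consonants
const VOWELS = "aeiou"; // 5 vowels
const BASE = CONSONANTS.length * VOWELS.length; // 70 possible syllables

// Write n in base 70, one consonant-vowel syllable per digit, padded to at
// least three syllables. Distinct n always yield distinct names.
function pronounceable(n) {
  let name = "";
  let syllables = 0;
  do {
    const digit = n % BASE;
    n = Math.floor(n / BASE);
    name =
      CONSONANTS[Math.floor(digit / VOWELS.length)] +
      VOWELS[digit % VOWELS.length] +
      name;
    syllables++;
  } while (n > 0 || syllables < 3);
  return name;
}

// Give each distinct single-letter identifier a fresh placeholder, skipping
// placeholders that collide with names already present in the code.
// NOTE: ignores scoping; a real tool must rename per scope.
function placeholderNames(identifiers) {
  const taken = new Set(identifiers);
  const mapping = new Map();
  let next = 0;
  for (const id of identifiers) {
    if (id.length !== 1 || mapping.has(id)) continue;
    let candidate;
    do {
      candidate = pronounceable(next++);
    } while (taken.has(candidate));
    taken.add(candidate);
    mapping.set(id, candidate);
  }
  return mapping;
}

console.log(placeholderNames(["a", "b", "total", "a"]));
```

Long, unique placeholders are the point: unlike `e` or `t`, a name like `bababe` can be searched for and replaced across the codebase without accidental matches.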
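grasp is a command-line tool for searching and replacing JavaScript based on its syntax tree rather than its raw text. For example, to rename the identifier user to account, we could do: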
grasp '#user' --replace account
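To see why a syntax-aware tool is worth the trouble, compare a naive text rename. The hypothetical `naiveRename` helper below uses a word-boundary regex, which avoids partial matches like `username` but still rewrites the word inside a string literal, because text search has no notion of which matches are identifiers. Since grasp queries the parsed syntax tree, a selector like `#user` targets identifiers rather than string contents.

```javascript
// Hypothetical, deliberately naive rename: a plain regex text replacement.
// Assumes `from` contains only letters, digits, and underscores, so it needs
// no regex escaping.
function naiveRename(source, from, to) {
  // \b word boundaries prevent partial matches such as `username`, but the
  // regex cannot tell identifiers apart from words inside strings or comments.
  return source.replace(new RegExp(`\\b${from}\\b`, "g"), to);
}

const source = "const user = getUser(); log('user logged in');";
console.log(naiveRename(source, "user", "account"));
// prints: const account = getUser(); log('account logged in');
```

The rewritten log message is exactly the kind of silent change that is easy to miss when reviewing a large diff.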
My workflow with grasp is as follows.
- select the snippet in my editor. It has to be valid JS to work with grasp, so I usually grab an entire function declaration or conditional block
- copy it into the clipboard
- run grasp in the terminal:
pbpaste | grasp '#user' --replace account | pbcopy
- this pipes the clipboard text into grasp's stdin and copies grasp's stdout back into the clipboard (pbpaste and pbcopy are the macOS clipboard commands)
- back in my editor just paste the results in, replacing the still-selected original snippet
Keep It Clean
Hopefully these tools will help you out in the wild cleaning up messy codebases!