Seibel, James Hague, and others have all tried to explain why code reading is so uncommon, and they make good points. But perhaps the conversation is led astray by the word “read”. I wonder if Abelson and the others would have had more examples if Seibel had asked them what code they had learned about for fun. Perhaps the word “read” put them in a passive frame of mind and caused them to filter out programs they’d hacked on.
We all read code already; it’s just that we usually read when we want to edit. And the comprehension that questions about reading are really after comes from both reading and writing, interleaved in complex ways. (Outside of programming, Cal Newport has called this the “make your own textbook” method.)
That hacking produces better comprehension than passive, linear reading fits with what we know about learning. Barbara Oakley, Herbert Simon, Cal Newport, and Anders Ericsson all describe how solid understanding emerges from active exploration, critical examination, repetition, and synthesis. Hacking beats passive reading on three out of four of these criteria:
- Active exploration: When you hack, you want to eventually produce a change in the codebase. This desire guides your path through the code. When you read passively, you let the code’s linear flow guide you.
- Critical examination: When you hack, you evaluate existing code in light of the change you want to make. Deciding what to use and what to remove keeps you from accepting the existing system as canon. When you read linearly, you lack a goal against which you can critically examine the existing code.
- Synthesis: To change the program as you desire, you synthesize existing code with new code.
- Repetition: Neither hacking nor linear reading involves useful repetition, unless you treat the change you want to make as a kata and mindfully re-implement it multiple times.
Learning through hacking also leverages the natural structure of a codebase. Good books guide their readers through a series of questions and their answers, whereas codebases distribute answers to external questions throughout their structure. In its non-linearity, a codebase resembles a map. You can ask an infinite number of questions of a map. How far is it from A to B? Which is the nearest town to C? But you can’t expect a map to tell you what questions to ask, and it makes no sense to read a map linearly from top to bottom, left to right. Yet we still tell people to read codebases linearly, even though we would never tell them to read a map that way.
Future solutions and recommendations for code comprehension should focus on providing good questions and changes to guide exploration, making it easier for readers to answer their own questions about codebases, and encouraging active engagement over passive reading. While I’m much less sure of how to do this than I am that it’s a good idea, here are a few preliminary thoughts:
- Let newcomers learn your codebase by suggesting they try to re-implement already-made changes as katas. Start them at the Git commit that preceded the change, give them hints where necessary, and link them to the actual change plus others’ attempts at producing it.
- Rather than treating documentation as something that explains individual modules, focus on providing maps (like Fabien Sanglard’s for Git) that help future hackers understand the big picture enough to change things within it. At the very least, don’t fault people who want to contribute to your codebase for not reading your code line-by-line.
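To make the kata suggestion above a little more concrete, here is a minimal sketch of how a maintainer might script the setup. It only builds the Git command lines; the helper names, the commit hash, and the branch name are all hypothetical, and the sketch assumes the change landed as a single, non-merge commit.

```python
import subprocess
from typing import List


def kata_setup_command(change_commit: str, branch: str) -> List[str]:
    """Build the command that starts a kata branch just before the change."""
    # "<commit>^" is Git's notation for the first parent of a commit,
    # i.e. the state of the codebase right before the change was made.
    return ["git", "checkout", "-b", branch, f"{change_commit}^"]


def kata_compare_command(change_commit: str, branch: str) -> List[str]:
    """Build the command that diffs a newcomer's attempt against the real change."""
    return ["git", "diff", branch, change_commit]


def run(command: List[str], repo_dir: str) -> None:
    """Run a Git command inside repo_dir, raising an error if it fails."""
    subprocess.run(command, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Hypothetical commit hash and branch name, for illustration only.
    # Nothing is executed here; the commands are just printed.
    print(" ".join(kata_setup_command("abc1234", "kata/first-attempt")))
    print(" ".join(kata_compare_command("abc1234", "kata/first-attempt")))
```

A newcomer would run the first command, attempt the change on the new branch, and then use the second command to compare their attempt with what was actually committed. Hints and links to others’ attempts would still have to come from a person.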
It turns out others have explored, from different angles, ideas similar to the ones I explore here:
- “How To Be A Hacker”: Eric Raymond discusses what he calls “the incremental-hacking cycle”, a process by which someone gradually expands their understanding of a codebase by making bigger and bigger changes to it.
- “How to read math textbooks”: David MacIver describes a problem- and theorem-driven approach for learning math, which you could adapt to reading programs.