Sunday, April 06, 2008

Fun With Lists: Between a Programmer's Bookends

I read grokcode's top 9 list of programming books, and I partially agree with it.

What nine and a half books would I recommend for a programmer's bookshelf?

I heartily agree with grokcode's choice of (1) Cormen et al.'s Introduction to Algorithms. I worked for about a decade with scientific/simulation software mostly in Fortran 77. This gave me plenty of exposure to the work product of extremely smart self-taught programmers who never got around to studying even trivial data structures like linked lists, and I say with deep feeling, don't let this happen to you! Reading Introduction to Algorithms, and keeping it handy, is an excellent way to prevent it.

I would add (2) at least one more book oriented toward whatever specialized algorithms and data structures suit your work. Such a second book might be Russell and Norvig's Artificial Intelligence: A Modern Approach, or Okasaki's Purely Functional Data Structures, or Motwani and Raghavan's Randomized Algorithms, or Dechter's Constraint Processing, or Press et al.'s Numerical Recipes. (Incidentally, use the last one with caution: I have heard other people complain about poor judgment calls in it, and I myself wasted more than a week of work in grad school by using a flaky pseudoRNG recommended in the then-current edition. But I don't know any comparably clear and comprehensive overview of the subject, and its potentially-flaky summary advice comes backed with a solid bibliography, so I still consider the book very useful.) Or the second book might be something more specialized: something like Schneier's Applied Cryptography, or Clarke's Model Checking, or a book on compiler construction techniques like grokcode's recommendation of Aho et al.'s Compilers: Principles, Techniques, and Tools, or one of various newer compiler books with rather different emphases, e.g., Queinnec's Lisp in Small Pieces.

Grokcode's recommended Structure and Interpretation of Computer Programs is a very good book to study and even to reread, but it isn't the kind of book I actually refer back to with any regularity, and so it got squeezed off my list.

Grokcode's recommended The Mythical Man-Month is also very good, but when limiting myself to nine and a half, I think I prefer some recent books. How about (3) Hunt and Thomas's The Pragmatic Programmer and (4) Fowler's Refactoring? Between them, these two also squeeze out three other basically-worthy books on the grokcode list: Programming Pearls, Code Complete 2, and Design Patterns.

I agree with grokcode's recommendation of (5) The C Programming Language. But I think there should be at least one book on a language with much more support for abstraction than C. (And Structure and Interpretation of Computer Programs isn't my first choice for such a book.) Somewhat more than fifty percent of the code I write is so straightforward that it hardly matters what language it is in. E.g., Fortran 77 is in many ways horribly inexpressive, and I worked in it for years and had many periods of intense frustration, but even in f77 many things remain easy to write. However, that 50+% of code seems to take about 25% of my time. I find the 10-30% of hard code in a project often takes more than 50% of my time, and I find that for such code, choice of programming language can make a considerable difference.
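To make those proportions concrete, here is a back-of-the-envelope sketch. It assumes the rough figures quoted above are taken at face value (exactly 50% of code in 25% of time for the easy part, exactly 50% of time for the hard part), and the function name relative_cost is made up for illustration.

```python
def relative_cost(code_share: float, time_share: float) -> float:
    """Time spent per unit of code, relative to the project-wide average."""
    return time_share / code_share


# Easy code: about 50% of the code, about 25% of the time.
easy = relative_cost(code_share=0.50, time_share=0.25)

# Hard code: 10-30% of the code, about 50% of the time.
hard_if_30_percent = relative_cost(code_share=0.30, time_share=0.50)
hard_if_10_percent = relative_cost(code_share=0.10, time_share=0.50)

# How much more time a unit of hard code costs than a unit of easy code.
ratio_low = hard_if_30_percent / easy
ratio_high = hard_if_10_percent / easy

print(f"hard code costs roughly {ratio_low:.1f}x to {ratio_high:.1f}x "
      f"as much time per unit as easy code")
```

Under those assumptions, a unit of hard code costs somewhere between three and ten times as much time as a unit of easy code, which is why the choice of language matters most there.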

(Language choices should come with strong disclaimers, both "horses for courses" and "de gustibus et de coloribus non disputandum est;" consider it done.:-) In principle, I can easily see the point of choosing C++, Common Lisp, Haskell, OCaml, Scheme, or SML. And in practice, given tradeoffs involving availability of implementations and libraries, I can see the point for various other choices in some cases: e.g., Java, JavaScript, Mathematica or Maple, Perl, Prolog, Python, or Ruby. Still and all...

I find I prefer CL for about 70% (weighted by time spent) of projects. (Too bad about all the cruft in CL, too bad about CL being a niche language with the attendant weaknesses in implementation and library support, and too bad that CL's design makes routine program-analysis jobs fundamentally undecidable. More important than those drawbacks is that CL makes it easier to express things OAOO than any other practical language I know. It's also a significant plus that CL does unusually well at implementing hard and soft layers in the same language, and at letting very hairy toolboxes be naturally interactive.) And I find I prefer C++ for about 15% (again, by time spent) of projects. (C++ goes about as far in abstraction as anyone has managed while remaining a systems programming language naturally suitable for implementing OSes and for compiling onto a $0.40 AVR embedded microcontroller. And too bad about all the cruft, again, and about people choosing C++ much too often for complicated applications from about 1985 to 2000, and about how entire books can be written about C++'s idiosyncrasies and outright gotchas. Still and all, C++ libraries written with taste can be pretty good, and did I mention that C++ is a systems programming language? Today it is no longer a particularly good idea to write things like compilers or mail processing systems or web browsers or visual editors in systems programming languages. But for things which still IMO truly should be written in systems programming languages, so that IMO the serious competition is limited to Ada and C and some languages that I'm only vaguely aware of --- BLISS? Modula? Oberon? --- C++ starts to look pretty good.) So I recommend (6) any of several good books on CL: Seibel's Practical Common Lisp, or Graham's ANSI Common Lisp, or Norvig's Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp, and (7) Stroustrup's The C++ Programming Language.
Note, too, that the Norvig book and the Stroustrup book are not only good books on their chosen languages, but also pretty good books on various aspects of programming in general.

To my taste, grokcode's Unix Power Tools doesn't really belong on the list: certainly the Unix tools are very useful, but learning them from the man pages seems to work adequately well. A book on Unix system administration might be worthy of being on the list, instead ... but you only get nine books. So you decide: if you think you're going to spend more of your time fooling around with system administration than you will spend fooling around with embedded microcontrollers and high-performance graphics and FFTs and BDDs and whatnot, then replace Stroustrup's book with a book like Frisch's Essential System Administration.

You also probably want a book on your preferred default way to provide GUIs and/or web interfaces to your software. Just as I didn't have a definitive choice for languages, I don't have a definitive choice here: it seems to depend strongly on your tastes, on how interactive your software tends to be, on how stably portable your software must be, and on what environment you typically run under. If your tradeoffs look like mine do currently, then (8a) JavaScript: The Definitive Guide might be a reasonable choice. But, e.g., someone with higher-bandwidth interaction requirements might laugh condescendingly and replace it with (8b) a book on something like OpenGL. Or, if your code always resides so deep in the system that you laugh condescendingly at the lightweights whose software provides any GUI at all, then replace this with (8c) a good manual on the most important API/architecture that your code talks to: something like the Stevens books Advanced Programming in the UNIX Environment or Unix Network Programming, or the reference manual for the CPU that your compiler produces code for, or maybe a book on relational databases or some such thing.

Finally, I'd nominate (9) a book on something reasonably fundamental and promisingly important for your work. For me currently this would most likely be a book related to machine learning: e.g., Mitchell's Machine Learning, or Norvig and Russell's aforementioned AIMA, or Gruenwald's The Minimum Description Length Principle. For someone else, it might be a book on modern techniques for expressing algorithms and data structures so that they scale to parallel hardware, or a book on ways to prove correctness of systems (like the aforementioned Model Checking, or like Bertot and Casteran's Interactive Theorem Proving and Program Development), or a book on virtual worlds, or a book on online reputation systems, or whatever.

And even more finally, vaguely in the spirit of the "and one half" in the title of the grokcode article, I'll nominate not-quite-a-book: choose (9.5) some significant open-source software project that you're impressed with, and study it enough that you understand the design choices and tradeoffs. Or if you prefer an actual whimsical book recommendation parallel to grokcode's recommendation of Hitchhiker's Guide to the Galaxy, then while I certainly enjoyed Hitchhiker's Guide, I think its connection to programming is weak enough that I'd prefer to nominate either the first three of Rick Cook's "Wizardry" books, or Vernor Vinge's The Peace War.

(And what books have been physically between the bookends on my desk and referred to in the last month or two? Three of the books I listed above: Introduction to Algorithms, Purely Functional Data Structures, and JavaScript: The Definitive Guide. Besides those, lab-book-style handwritten logbooks for various of my computers, and ancient editions of Lamport's LaTeX: A Document Preparation System, Wall et al.'s Programming Perl, and The American Heritage Dictionary. A CL book would be there too, except that these days I'm sufficiently familiar with the language that when I need to look something up I can find it in the ANSI standard, and I use the HTML version of that.)

edit: added one leftangle-P-rightangle HTML tag before each paragraph after being burned for the Nth time by Blogger's inspired combination of not-quite-HTML formatting rules and not-quite-preview-function