February 27, 2004
I hear this question a lot, typically from kids who have just discovered that there's more to a computer than a web browser, and who are curious about where to go from here. The people who ask this question typically don't have any specific project in mind; if they did, it would be a lot easier to answer them. Instead, they're really saying "I don't know much about programming, but I think it might be kind of fun. Where's the best place to start?"
An experienced programmer on a dark day might answer the question "What programming language should I learn?" with "None. Learn to play a musical instrument instead." Today is not a dark day, however, and I'll do my best to answer it.
The person I'm addressing this article to isn't someone whose objective is "I want to write a Windows application" or "I want to write a GUI for Bittorrent on Linux" or "I want to write a little tool that will run on my Mac to talk to iTunes." The target of this article is someone who has a general desire to become a software developer but doesn't yet understand how to get there.
So in that context, the high level, vaguely accurate answer to the question "What programming language should I learn?" is "It really doesn't matter." Partially this is because the question is ill-formed. The specific language one uses is mostly orthogonal to developing the skills one needs to be a good software developer. Let's look at what those skills are, first, and then later we can come back to the question and make some actual recommendations, rather than just rejecting the question.
A developer needs to be able to describe a problem to be solved. She needs to be able to break the problem down into smaller, easier problems. She needs to be able to describe a set of conditions that constitute solving the problem. She needs to be able to think of tests that determine whether a given program or part of a program is correct.
Those are the sorts of skills every good developer has regardless of whether they're writing code for themselves or for the marketplace. If a developer wants to work on a project with other people, be it open source or commercial, she needs additional skills. She needs to know how to find and read documentation. She will need to know how to use an Application Programming Interface (API) that someone else has provided. She will need to know when to use an already-existing API ("almost always") versus developing a new API ("almost never"). She will need the discipline to not be constantly reinventing the wheel. She will need to know how to write documentation. She will need to understand what makes code maintainable (correct and adequate documentation, proper use of namespaces, consistent formatting, useful comments), and to actually use that knowledge when she writes code.
All of those skills apply no matter what your weapon of choice is in terms of language.
There are three rough categories of languages of interest to the modern programmer. Imperative languages (sometimes you'll hear these described as procedural), such as C, C++, Java, Pascal, and Modula-3, are about providing a sequence of commands for the computer to execute. In functional languages such as ML, Lisp, and Scheme, programs aren't so much executed as they are mathematically evaluated, as with the lambda calculus. And scripting languages such as Perl, Python, and Tcl allow for rapid prototyping of simple tasks. Note: yes, I understand that there's no academic reason to separate scripting languages from their imperative compiled brethren, but there are practical reasons which I'll discuss later.
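To make the imperative/functional distinction a little more concrete, here is a minimal sketch of the same computation written both ways. It uses Python only because Python can express both styles in a few lines; it is not one of the functional languages named above, and the function names are made up for illustration.

```python
from functools import reduce


def sum_of_squares_imperative(numbers):
    """Imperative style: a sequence of commands that update a variable."""
    total = 0
    for n in numbers:
        total = total + n * n
    return total


def sum_of_squares_functional(numbers):
    """Functional style: a single expression, with no reassignment."""
    return reduce(lambda acc, n: acc + n * n, numbers, 0)


print(sum_of_squares_imperative([1, 2, 3]))
print(sum_of_squares_functional([1, 2, 3]))
```

Both functions return the same result. The difference is in how you reason about them: the first by tracing how total changes step by step, the second by evaluating an expression.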
When programming almost any (modern, useful) language, you're going to find yourself using a variety of directives. Some will be part of the core language specification: in C, integer arithmetic and assignment will be the same on every platform. Others will be part of a library that is likely to exist on any platform your program runs on (e.g., stdio in C). Lastly, there are APIs which are platform-specific; a C library routine to open a dialog box on a Win32 OS would be an example of this.
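The same three layers show up in most languages, not just C. As a rough sketch, here they are in Python: core arithmetic, a standard library call, and a call that only exists on some platforms (os.getuid is documented as Unix-only). The function name current_user_id is hypothetical.

```python
import math  # standard library: ships with every Python installation
import os    # standard library, but some of its functions are platform-specific

# Core language: integer arithmetic and assignment behave the same everywhere.
total = 2 + 3 * 4

# Standard library: math.sqrt is available wherever Python runs.
root = math.sqrt(16)


def current_user_id():
    """Platform-specific API: os.getuid exists on Unix but not on Windows."""
    if hasattr(os, "getuid"):
        return os.getuid()
    return None  # no equivalent call on this platform


print(total, root, current_user_id())
```

Code that sticks to the first two layers tends to move between platforms without changes. Code that reaches into the third layer needs a guard like the hasattr check here.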
Specific Cross-Language Skills
Learning to program skillfully is something that comes only with great time and effort. There are plenty of programs that will compile just fine and run correctly that can still be called "bad programs," in the sense that when you look at the source, it is clear that the author doesn't have a grasp on some fundamental concept. The three little pigs built houses of straw, sticks, and brick. All of them served just fine as shelter, but only the brick house was strong enough to withstand the wolf. Your goal should be to not just learn how to make your programs run, but how to be confident in their correctness, robustness, and performance, before you've written a single line of code. To achieve that state, which may seem like a paradox, you need to understand the concepts underlying the craft of programming.
A shorter way of saying this is: don't worry about learning the syntax of a language. Don't concentrate on it. Don't spend time worrying about it. Learning where the semicolons or parentheses go will come by itself, as you write code and go through compile-run-test cycles. Look that stuff up when you need to, but understand that "learning what a constructor in Java looks like" is, in the long term, not a valuable thing to concentrate on. Learning what a constructor is and what it does is a valuable thing to concentrate on.
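As an illustration of concept versus syntax, here is a small sketch in Python, where the __init__ method plays roughly the role a constructor plays in Java. The Temperature class is a made-up example. What carries over to any language is the job the constructor does, not the spelling.

```python
class Temperature:
    """A temperature in kelvin (hypothetical example class)."""

    def __init__(self, kelvin):
        # The constructor's job: reject invalid input and leave the new
        # object in a usable state before any other code can touch it.
        if kelvin < 0:
            raise ValueError("temperature cannot be below absolute zero")
        self.kelvin = kelvin

    def celsius(self):
        """Convert to degrees Celsius."""
        return self.kelvin - 273.15


room = Temperature(293.15)
print(room.celsius())
```

Because the check lives in the constructor, no Temperature object can exist with a negative kelvin value, so every other method is free to assume a valid one.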
So that's what you shouldn't focus on: syntax. What should you focus on? Here, in a vague sort of didactic order, are concepts that I expect a skilled programmer to understand regardless of the language they are using -- even if the language they are using at the moment doesn't actually support the item in question.
- What a variable is.
- How variables are typed. Why type is important.
- Scope (Lexical, Dynamic).
- Basic data structures.
- Basic control structures. Conditionals.
- Dynamic storage allocation. Garbage collection. How to manage memory if your language doesn't have GC.
- Linked lists. Hash tables. B-trees.
- Iteration. The off-by-one problem. How to avoid it.
- Recursion. When to use it. ("almost never").
- Basic algorithms -- sorting, searching, etc.
- Categorizing the run time of a piece of code (O(n), O(n^2), etc).
- Debugging techniques, from diagnostic prints to using a debugger.
- Assertions and how to use them properly.
- Basic object oriented programming concepts (inheritance, encapsulation).
- Threads. Typical concurrent programming mistakes, and how to avoid them. The producer/consumer problem.
- The difference between locking a mutex and waiting on a condition variable.
- Synchronous vs. asynchronous operations. Callbacks.
- Event-driven models.
- Exceptions. Strategies for handling errors generally.
- Advanced testing strategies. Fault injection.
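Several items on this list tend to show up together in even a small piece of code. As one possible sketch (in Python, with names chosen for illustration), a binary search touches iteration, the off-by-one problem, a basic searching algorithm, run-time classification, and assertions all at once:

```python
def binary_search(items, target):
    """Return an index of target in the sorted list items, or -1 if absent.

    Each pass halves the search range, so the loop runs O(log n) times.
    """
    low, high = 0, len(items)  # half-open range [low, high)
    while low < high:
        # Loop invariant: if target is present, its index is in [low, high).
        # The assertion documents that assumption and fails loudly if a
        # later edit breaks it.
        assert 0 <= low < high <= len(items)
        mid = (low + high) // 2
        if items[mid] < target:
            low = mid + 1      # mid is ruled out, so skip past it
        elif items[mid] > target:
            high = mid         # high is exclusive, so no "- 1" here
        else:
            return mid
    return -1


print(binary_search([1, 3, 5, 7, 9], 7))
print(binary_search([1, 3, 5, 7, 9], 4))
```

Sticking to one convention for the range (here, inclusive low and exclusive high) is one common way to avoid off-by-one mistakes. Mixing conventions is where the "+ 1" and "- 1" bugs usually creep in.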
Any one programming language you choose, unless you're really going out of your way to be obscure (e.g. Prolog), should get you through at least half of that list. Then you can decide whether to stick with the language you started with for the rest of the list, or start on a new one to fill in the gaps.
Programming is fundamentally a "learn by doing" activity, so your language choices are somewhat constrained by the need to use a language that can actually run on your operating system of choice. This still leaves you with a fairly wide set of options.
Language availability isn't the only practical consideration. If applicable, is there a debugger for the language you want to use? What set of libraries will you be using? How much documentation is available for the language you want to learn? Is there an active community you can turn to for help?
Enough Already. Just Tell Me What to Learn!
OK. I've tried to make the point here that whatever languages you decide to learn, you should be able to develop your skills over a period of years such that when you decide to learn a new language, it will just be a trivial matter of absorbing the new language's syntax. However, in the interests perhaps of sparking debate, I'll give my own personal opinion on teaching languages. Nothing in this section should be construed to mean that I'm saying that languages other than the ones I'm recommending aren't any good (except Modula-3. Modula-3 isn't any good. I'm saying that.)
First, learn a compiled imperative language. I very much like Java as a teaching language, for a few reasons. In addition to being somewhat cross-platform (cue mocking laughter), it is actually a fairly elegant language with a robust, extensive, powerful, and, most importantly for the novice, well-documented set of APIs. One of the things I like about Java as a teaching language is that it's always very clear, because of the namespace design, when you're using a "built in" command versus when you are calling some library API; I've seen novices using C and C++ get confused when the distinction wasn't as clear to them. There are many resources to help the novice Java programmer get off the ground. A student learning Java can learn about object-oriented programming concepts, threads, events, dynamic allocation and garbage collection, advanced data structures, and most of the items on my list above. The fact that Java runs in a virtual machine is both a benefit and a drawback -- it will probably slow your understanding of how these high level data structures map to the architecture of the machine you're on, but you can always go back and learn C later. Also, the syntax of Java is simple enough that it won't pollute you or ruin you for other languages (the way, say, Objective C would).
And in the interests of disclosing any bias on my part: I don't program in Java on a day to day basis. My day job is all C, all the time.
Next, learn a functional language. Lately I've been toying around with OCaml and ML, and I like them, but really it's hard to go wrong here. Lisp, Scheme, ML -- these are all fine choices. I haven't examined it yet myself, but I've been told that Microsoft Research has a neat language called F# which is basically a version of OCaml that can call .NET library functions. That's pretty tempting, because it has such an aura of the forbidden about it -- take a pristine, educational, not-very-useful-in-practice language and turn it into something that can be used to develop Windows applications. Mmmmmmmmmmmmm, forbidden transgressive language.
Uh, what? What was I saying? Um, yeah, F#. Very bad. Don't use that! It's morally wrong. Learn Standard ML. Yeah.
Lastly, learn an interpreted scripting language of some sort. I like Perl, but Python has a big following, too. People will tell you that scripting languages are just as powerful as compiled languages. They're right. But it's been my personal experience that because of the environment scripting languages grow up in ("Oh my gosh, I have to change all occurrences of this string in every file on the server within 10 minutes, or I'll be fired") the idioms in common usage aren't as carefully thought out. Shortcuts are taken. Error cases are punted. Sloppiness is rampant. I agree fully that this is more of a cultural matter than anything intrinsic to the design of a language, but I can't ignore the reality, and that's how I see it.
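For what it's worth, here is a rough sketch of what the less sloppy version of that "change this string in every file" script might look like in Python. The function name replace_in_tree is made up, and the sketch assumes the files are UTF-8 text and that an in-place rewrite is acceptable (it makes no attempt to be atomic or to keep backups).

```python
from pathlib import Path


def replace_in_tree(root, old, new):
    """Replace old with new in every text file under root.

    Returns (changed, failed): the paths that were rewritten and the
    paths that could not be processed.
    """
    changed, failed = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            # Unreadable or binary file: record it rather than crashing
            # halfway through or skipping it without a trace.
            failed.append(path)
            continue
        if old not in text:
            continue
        try:
            path.write_text(text.replace(old, new), encoding="utf-8")
        except OSError:
            failed.append(path)
            continue
        changed.append(path)
    return changed, failed
```

The quick one-liner and this version do the same thing when everything goes well. The difference is that this one tells you which files it changed and which ones it could not handle.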
If you're an experienced developer and you'd like to chime in on this topic, please feel free to comment below. The only thing I ask is that you keep in mind that the topic is languages for learning the craft of programming, not languages for best accomplishing a specific task. Thanks.
- The article that inspired me to write this when I came across it: Peter Norvig's Teach Yourself Programming in Ten Years.
- I recently wrote an article on why we use pointers. You might like it.
- My favorite Java book is Peter van der Linden's Just Java.
- Microsoft Research's F# project.
- Wikipedia entry for the lambda calculus, the formal system behind functional languages.
- In the interests of balance, the author of CVSup explains why Modula-3 is great. (But he's wrong).
- The Skeptic's Guide to Objective C.