Expressive power is hard to pin down in programming languages. In this post, I want to explore how indirection mechanisms lend expressiveness to a language. By indirection, I mean something that happens at compile time or run time which was not explicitly specified in the code. By mechanism, I mean something with well-defined, rule-governed semantics.
Indirection mechanisms are typically what make a program shorter, easier to write, and easier to reason about. Instead of writing machine code directly, you rely on the compiler or runtime to do the heavy lifting.
I posit that the development of indirection mechanisms closely tracks the development of multiplicative expressive power. One line of Python is worth far more than 100 lines of C, and more than 1000 lines of Assembly. Rather than debate expressive power in different languages, I'd like to simply catalog some of the more obvious and important indirection mechanisms out there.
While many languages might share a given indirection mechanism, I don't know all languages, so I am just going to go with my personal experience and group them roughly by programming languages I am familiar with. I would love to hear of more mechanisms, if you know of them.
Assembly
Depending on the features available, Assembly can be seen as simply a more convenient language to write in than machine language. But this convenience comes from indirection.
Named instructions
The assembler looks up the name of an instruction in a table and converts it to the machine code. It is much more convenient to remember a mnemonic sequence of characters than an arbitrary number.
Labeled statements
Instead of calculating a jump offset from the current instruction, you simply name a line of code and the assembler calculates the offset for you. This is much less error prone, and the label stays correct as the program changes.
C
C has many indirection mechanisms. Most of them were not invented in C. However, from a historical and pedagogical perspective, C is a good contrast to Assembly. Both are relatively low-level and ubiquitous, yet C is a huge leap beyond Assembly in terms of indirection.
Automatic number coercion
Numbers in C are automatically coerced "up" when operated on, whereas in machine language, the values would have to be explicitly converted. You reduce mathematical errors because the machine can do a better job than you.
Function call semantics
Function calls compile to a lot of code: saving registers, copying arguments to the stack, jumping, and cleaning up on return. The compiler does all of that bookkeeping, which makes calling functions in C far more convenient.
Recursive expressions
In machine code and Assembly, instructions are written linearly in sequence. In C, however, expressions can be nested. The compiler will convert them to the appropriate sequence of instructions. Nested expressions are easier to understand.
Structured programming
Conditionals and loops provide a simpler, less error-prone way to code than machine code jumps.
Lisp
Lisp's main evaluation semantic is the source of much veneration and is arguably the Ur indirection mechanism. I am talking about the eval function as defined in McCarthy's original Lisp paper. There are more mechanisms, but too many to list here.
Eval
Most values (so-called atoms) evaluate to themselves, symbols are looked up in the current environment, and to evaluate a list (a function call), evaluate its elements, then apply the first element (hopefully it's a function!) to the rest of the elements.
The indirection comes from the late-binding of the symbol to its value. The environment can change, so the function call can have different meanings at different points in the program. This greatly facilitates incremental development.
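To make those rules concrete, here is a toy eval in Haskell. All of the type and function names are my own invention for illustration, not McCarthy's:

data Expr = Atom Int | Sym String | App [Expr]

data Value = Num Int | Fn ([Value] -> Value)

type Env = [(String, Value)]

-- Atoms evaluate to themselves, symbols are looked up in the
-- environment, and a list is evaluated as a function application.
eval :: Env -> Expr -> Value
eval _   (Atom n)     = Num n
eval env (Sym s)      = maybe (error ("unbound: " ++ s)) id (lookup s env)
eval env (App (f:xs)) = apply (eval env f) (map (eval env) xs)
eval _   (App [])     = error "empty application"

apply :: Value -> [Value] -> Value
apply (Fn g) args = g args
apply _      _    = error "not a function"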
CLOS
The Common Lisp Object System defines many indirection mechanisms for doing object-oriented programming. They include method combinations (before and after methods), class-based polymorphism, and method dispatch. The most interesting indirection mechanism is that a class can define its own indirection mechanism.
Garbage collection
Instead of writing routines to manage the memory of your program, the runtime takes care of reclaiming memory when it is no longer needed. A great productivity boost.
Macros
Lisp macro calls look like function calls, but they are expanded into code at compile time. This indirection mechanism essentially allows you to extend the language within the language.
Smalltalk
Smalltalk defined object-oriented programming as we know it today, based on a simple indirection mechanism: message passing. It also was the first language (that I know of) to make significant use of development tools. Like Lisp, I cannot list all of the indirection mechanisms in Smalltalk. They are too numerous.
Message passing
The main semantic of Smalltalk is message passing. A message consists of a name and arguments. The name is looked up in the vtable of the receiving object. If it is found, the associated method is invoked on the receiver. Otherwise, the name is looked up in the parent vtable, and its parent, and so on to the end of the chain. This vtable lookup is both dynamic (it allows for changes to the vtable at runtime) and polymorphic (different kinds of objects can respond to the same message).
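Here is a toy model of that dispatch, sketched in Haskell; all names are hypothetical, and real Smalltalk is much richer:

import qualified Data.Map as M

-- An object is a vtable of methods plus an optional parent.
data Obj = Obj
  { vtable :: M.Map String (Obj -> String)
  , parent :: Maybe Obj
  }

-- Look the message name up in the receiver's vtable, walking the
-- parent chain until it is found.
send :: Obj -> String -> String
send receiver name = go (Just receiver)
  where
    go Nothing  = error ("doesNotUnderstand: " ++ name)
    go (Just o) = case M.lookup name (vtable o) of
                    Just method -> method receiver  -- dispatch on the original receiver
                    Nothing     -> go (parent o)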
Semantic refactoring
Development tools before Smalltalk were usually confined to enhanced text editors. Smalltalk let you refactor code, meaning changing code not at the textual level but at the structural-semantic level. For instance, performing a "rename" operation on a method would not only change the name of the method in that class, but also change the code for all of the callers. This is another indirection mechanism which is now taken for granted in the Java and .NET worlds.
Self
Self is a prototype-based object-oriented language.
Slot access
In Self, to access a slot from an object, the name of the slot is looked up in the object. If it is found, the value is returned. Otherwise, the parent slot is accessed, and the slot name is looked up in the parent recursively. This allows one to build objects that are mostly identical to an existing object except in a small way.
Haskell
Typeclass polymorphism
Haskell, unlike Smalltalk, keeps the values (objects) separate from the vtables. It is the job of the compiler to reunite these two elements when necessary. This allows data types to be defined separately (even in different files) from the code defining the type's inclusion in typeclasses. The Haskell compiler, following well-defined semantics, will convert calls to a typeclass method into a call to a function with the vtable as an implicit argument. This enables a limited form of polymorphism, where the same name refers to different functions depending on the types.
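A sketch of that dictionary translation, with illustrative names (the real GHC output differs in detail):

class Shape a where
  area :: a -> Double

newtype Circle = Circle Double

instance Shape Circle where
  area (Circle r) = pi * r * r

-- Roughly what the compiler generates: a record of methods (the "vtable")...
data ShapeDict a = ShapeDict { areaD :: a -> Double }

-- ...so a constrained function like this:
describe :: Shape a => a -> String
describe x = "area = " ++ show (area x)

-- ...compiles to one that takes the dictionary as an explicit argument:
describe' :: ShapeDict a -> a -> String
describe' d x = "area = " ++ show (areaD d x)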
Pattern matching
Pattern matching in Haskell means destructuring a value into component values while optionally binding names to values. It lets you write in one concise statement the equivalent of several conditionals and let statements.
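For example, one definition by cases can replace a chain of conditionals and accessor calls:

-- Destructure the list and bind names to its parts in one step.
describeList :: [Int] -> String
describeList []      = "empty"
describeList [x]     = "just " ++ show x
describeList (x:y:_) = "starts with " ++ show x ++ " and " ++ show y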
Clojure
Clojure also deserves a mention for its championing of various mechanisms for controlling concurrent state.
Concurrent state (refs, agents, atoms, etc.)
The Clojure runtime provides several indirection mechanisms for dealing with state. Refs, for instance, provide transactional semantics across multiple values. This allows one to write parallel programs without explicitly using locks.
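For a flavor of the transactional style (sketched here in Haskell, whose STM is a close cousin of refs, rather than in Clojure itself), both updates commit together or not at all, with no locks in sight:

import Control.Concurrent.STM

-- Move an amount between two transactional variables atomically.
transfer :: TVar Int -> TVar Int -> Int -> IO ()
transfer from to amount = atomically $ do
  modifyTVar' from (subtract amount)
  modifyTVar' to   (+ amount)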
Indirection mechanisms are semantic abstractions afforded by the language. Like all abstractions, they are prone to leakage. How a language deals with the leaks might be just as important a choice as which mechanisms the language supports. For instance, what happens when you pass an object a message it doesn't understand? Some common options include: compiler error, thrown exception, silent failure, magic, or undefined behavior. These are damage control at best.
A lot of the leakiness of abstractions has to do with how mechanical the indirection is, and how simple the rules of the mechanism are. A complex mechanism is hard to understand, while a non-deterministic mechanism will sometimes give unwanted behavior. And when things go wrong, it is hard to distinguish between complexity and non-determinism. We should therefore strive for simple, deterministic abstractions. And we should also strive to bake the failure into the model, and build systems to deal with abstraction failures upfront.
I want to complete this list and keep it updated. Please write in if you have other ideas for indirection mechanisms, and I'll add them here and credit them to you!
To conclude: different languages make different choices about which mechanisms to provide, hence the endless variety of languages we see today. It is my hope that we will find more such abstractions to make programmers even more productive in the future. Further, perhaps there are ways to make existing mechanisms more robust, so that they fail in more appropriate ways.
I was brainstorming the other day about features that I expect to be easy in modern programming languages (plus libraries).
Some of them are very common, some less common, and some might not exist yet. Some are basic services of the runtime and some are more conveniences for interacting with the modern networked world.
None of them are impossible. I can point to examples of each in popular programming languages. But no language has all of them. A lot of progress could be made with thoughtful curation of existing libraries.
String manipulation
String manipulation should be incredibly convenient. It's so common to parse something out of a string, or build up a new string. It should be so easy it disappears.
Regular expressions
Splitting
Joining
Replacing
Polymorphism
Polymorphism, as I'm describing it here, means the specific type of a value is an implementation detail that I can ignore. I can program to an "interface" and two values that implement the same interface can be treated interchangeably.
Basic "container" types
I don't think it's a stretch to say that we know enough, as a science and industry, to know exactly what we need from our containers. +1 for convenient syntax.
Linked list
Dynamic array
Hash map
Set
Input/output
It's still a little hard in some languages to read and write files. Especially binary.
Web requests
Our computers spend nearly 100% of their time connected to the Internet, and the web is ubiquitous. You should be able to make a web request in one line.
URL Manipulation
URLs are highly structured strings, but that doesn't mean we should be concatenating them ourselves. There needs to be a way to build URLs and break them into their individual components.
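For example, Haskell's network-uri package parses a URL into its components rather than leaving you to slice strings:

import Network.URI (parseURI, uriPath)

-- parseURI returns Nothing on malformed input; the URI record also
-- carries the scheme, authority, query, and fragment.
pathOf :: String -> Maybe String
pathOf s = uriPath <$> parseURI s

-- pathOf "http://example.com/a/b?q=1" == Just "/a/b"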
Garbage Collection
I just don't want to worry about memory management anymore.
Namespaces
It is convenient to avoid naming conflicts as your library ecosystem grows.
Homoiconic
Code can be manipulated as data.
Extensible syntax
I find I don't use macros much anymore. I do more with data-driven programming. But it is nice to have when you need it.
Math-oriented numeric types
There are times when I want numbers that are defined by their machine representations, such as C's int and double. I'll call these "machine-oriented numbers". There are times when I want a more abstract representation of numbers, ones that act like what they taught me in math class. I'll call these "math-oriented numbers".
With math-oriented numbers, the fear of overflowing should not exist and division is division.
While we're at it, we could add in matrices, vectors, and even equations.
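For the first two wishes, Haskell's Integer and Rational already behave this way:

big :: Integer
big = 2 ^ 200        -- arbitrary precision, so no overflow

third :: Rational
third = 1 / 3        -- stays exactly 1 % 3, not 0.3333...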
Units
Numeric values should be able to be qualified with a unit, and you should be able to convert between units of the same kind. (A sketch follows the list below.)
Mass
Time
Distance
Temperature
and combinations of the above (speed, acceleration, etc.)
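Here is the promised sketch: a hypothetical phantom-type encoding in Haskell, where the unit lives only in the type, so mixing units is a compile-time error:

-- The unit parameter appears only in the type, never at runtime.
newtype Qty unit = Qty Double deriving Show

data Metre
data Second

addQty :: Qty u -> Qty u -> Qty u
addQty (Qty a) (Qty b) = Qty (a + b)

-- addQty (Qty 3 :: Qty Metre) (Qty 4 :: Qty Second)  -- rejected by the compiler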
Time
Time, dates, and calendars are hard. The language should solve the hard part for us.
Error
A number is rarely 100% accurate. There should be ways to represent the error and have it accumulate when combining different values. +1 for tracing the sources of error.
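A minimal sketch of what such a type might look like (names are hypothetical, and only addition is shown):

-- A value paired with an absolute error bound; bounds accumulate.
data Approx = Approx { value :: Double, errBound :: Double } deriving Show

addA :: Approx -> Approx -> Approx
addA (Approx x dx) (Approx y dy) = Approx (x + y) (dx + dy)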
Unification / pattern matching
It's just for convenience. The important thing is how unification errors are dealt with. Hopefully, there is something better than catching an exception.
Before / after functions
This is sometimes known as aspects. There is a lot of value in being able to add functionality to existing functions without modifying the original code.
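In a functional language this can be as small as composition; a minimal sketch:

-- Run something before and after an existing function f, leaving f untouched.
around :: (a -> a) -> (b -> b) -> (a -> b) -> (a -> b)
around before after f = after . f . before

-- e.g. around (max 0) (min 100) sqrt clamps both the input and the output.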
Parser
A convenient way to define parser grammars that generate ASTs, available within the language.
CSV library
CSV is a very common way to store tabular data, but it's not really a standard. There is enough variability in the format that a library is called for.
Good error messages
Good error messages need to help you locate and solve the problem. That's actually really hard to get right, as you can see in a lot of newer languages. Languages with helpful errors are typically mature.
Immutable values
Values are values. They shouldn't change.
Explicit model of time
An explicit model of time is very rare in programming languages, yet time needs to be modeled by the language so that we can sanely deal with concurrency. This is one of the bigger mini-revolutions happening in programming at the moment.
Graphics
Most languages make it really hard to draw on the screen. This just needs to be easier. Displaying images, shapes, colors, text, video, and associated manipulations: scaling, rotation, translation.
Sound
Same goes for sound: playing sounds should be easy! Listening to the microphone should be easy!
Common file formats
Reading common formats should be easy. At the very least, there should be a framework/convention for libraries for common formats to follow.
Game input
By this, I mean getting raw events about key down, key up, mouse movement, etc.
GUI
First-class functions
Higher-order programming is an obvious win. +1 for lexical closures. +1 if you can store the functions to disk and read them back in!
Easy way to store data in flat text files
I don't want to have to write my own serialization routines.
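Haskell's derived Show and Read come close for free, at least for simple data types (error handling omitted):

data Point = Point Int Int deriving (Show, Read)

save :: FilePath -> Point -> IO ()
save path = writeFile path . show

load :: FilePath -> IO Point
load path = read <$> readFile path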
This brief history of Haskell is a nice introduction to how Haskell has developed. Peyton Jones tells a nice story punctuated by the seminal publications that contributed the ideas. He is a living example of how well the Cambridge accent was designed for giving lectures.
Some programs work but are not well-typed. That is, they are the programs that you want to write but the compiler rejects them. So this is the "Region of Abysmal Pain". Because that's where you think: "Ugh! The type system is getting in my way! I know! I'll use an untyped language."
It is very refreshing to hear someone straight from the top of the static typing pantheon acknowledge the pain of working with current, statically-typed languages.
Now, of course there's always going to be a small Region of Abysmal Pain. There's rather a sort of sterile debate that happens between the static and the dynamic crowd. But I just want to remark that there's something not just self-indulgent but I think is actually quite important about the process of trying to find ways of making types more expressive so that they maintain the good property of being somewhat comprehensible and while moving closer to writing programs that you really want to write.
Alright, so here's our plan for world domination: make the type system sufficiently expressive to the point where everybody will forget dynamic languages and write statically typed ones. I don't expect that will happen to me in my lifetime. But I regard Haskell as a kind of thought experiment that's trying to go in that direction.
In the end, I think Simon Peyton Jones is the perfect mix of humility and hubris to lead the Haskell movement. He seems to have the patience to work with (and be excited about) an incomplete language (as Haskell was in the early days and still is in a lot of ways) without being dogmatic about the virtue of the current limitations.
This is the best video I've ever seen defining monadic parser combinators from the ground up. Nate Young explains his translation of Parsec into Clojure. I'm surprised he could show the code and explain it well in less than forty-five minutes.
As you may know, I have my own PEG parser combinator library called squarepeg. I began developing it as monadic, but decided that I didn't like having to write my own version of Haskell's do notation (let->> in his version), which felt awkward in Clojure. Instead, squarepeg lets you bind variables and refer to them later within each parser itself (instead of in the code that builds the parser). There were a few other differences, but in general, I think the two libraries are more similar than different.
One thing I think I will borrow from his library is tracking the character position of the current parse to help report parsing errors. It's going on my TODO list.
I have more plans for squarepeg when I get a chance. They include limiting the laziness of the parser and being able to report better error messages.
I tried to build a small project using ClojureScript and gave up, reverting to JavaScript. The workflow was awkward, and information about the language was lacking. ClojureScript One aims to exemplify the power latent in ClojureScript by giving us a productive workflow from the very start. You clone the repo and use it as a starting point for your own project.
I haven't explored this new project, but it is already impressive. It sets a high bar for new projects. It includes a well-designed home page, getting started videos, and nice documentation. But, more importantly, it hints at a new age in web development in much the same way as the Blog in 15 Minutes Ruby on Rails video ushered in the rise of Ruby on the web. Watch the ClojureScript One Getting Started video to have your mind blown.
I also recommend a podcast about the goals of ClojureScript One.
If you are reading this, you are probably a programmer. Let me ask you, for just a moment, to step out of your programmer shoes and answer these questions:
Is this apple equal to that orange?
Is that cloud equal to this house?
Is my backpack equal to my television?
The obvious answer is "no" to all of these. And so (please step back into your programmer shoes) we see that in many languages I can ask a similar question:
In Java:
apple.equals(orange)
Or in Clojure:
(= apple orange)
The answer to both questions is unsurprisingly false.
And in Haskell?
apple == orange
Type error! The compiler is saying I cannot even ask the question.
What model of reality is Haskell asserting?
Is it telling me that I need to convert the apple and orange into values of the same type?
Fruit {name = "apple"} == Fruit {name = "orange"}
Is this question really different from apple == orange?
Which model (Haskell's or Clojure's) more closely models reality?
Is this behavior intentionally designed into Haskell or is it incidental behavior due to the choice of type system?
Does this behavior make Haskell better or worse than a hypothetical Haskell that allowed the question to be asked?
In the functional programming world, there is insane amounts of code reuse happening, but along with this comes a fair bit of plumbing.
The plumbing Chiusano is referring to is incidental complexity imposed by current static type systems that lack a good set of "universal" types. Dynamically typed languages often do not have the same plumbing problem because of the extra information available at runtime for dynamic dispatch and overloading. This incidental complexity is a huge hurdle for languages like Haskell, where half of the code is about converting from one type to another.
This could be the fault of the codebase/libraries I am working with. In my practical Haskell experience, I have failed to find the zen of type/domain fit that static-typing enthusiasts talk about. Chiusano, however, hints that this problem is universal.
It makes it clear that types are effective not just at preventing many software errors, but also in guiding development.
I would love to see the adoption of standard type classes recognized by the Haskell compiler that could automatically handle the most common coercions safely. For instance, the compiler could "upgrade" function application to Functor application when the types match. It is beyond me whether this can be done "safely". I suspect that it cannot.
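To illustrate with hypothetical names, the upgrade would amount to the compiler inserting an fmap for you:

halve :: Int -> Int
halve n = n `div` 2

-- Applying halve to a [Int] would be rewritten as:
halveAll :: [Int] -> [Int]
halveAll = fmap halve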
The second is to have more normalized types and type systems so that there is no incidental structure for code to broker between in the first place. To support this we'll see things like row types, unordered tuples, type sets, and so on, and any related type system machinery needed to make all this feasible and convenient.
If modules were programmed to a small, standard set of types (and type classes), they would be more universally applicable. The compiler could "overload" functions automatically. The trick is to keep it safe. At the very least, the "universal" types would help programmers do it manually. We may already be seeing such a convergence.
Not coincidentally, the built in Clojure data structures are usually all I need. Others appear to think the same. The Clojure ecosystem thrives from the consensus on a small, universal set. There is a lot of reuse and little boilerplate.
See Ring as an example of a universal datatype facilitating a flourishing community. It allows the problem domain (web programming) to be broken up into subproblems with little-to-no coordination between the groups. Clojure has a number of web frameworks and libraries that are all inter-compatible. I think this is unique among languages.
The editor will search for a program to make the types align, show me the program for confirmation if I request it, and then not show this subexpression in the main editor view . . .
Please give me this editor. It is technically feasible. Hoogle can already do a pretty decent search based on the types of functions. All you need is a planner to find a path from one type to another.
You could even help the search by defining a type class Convertible like this:
{-# LANGUAGE MultiParamTypeClasses #-}

class Convertible a b where
  convert :: a -> b
All of the instances of Convertible would form a graph that could be searched for paths between any two types. The editor could show a list of those paths and let you choose one.
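For instance, two hypothetical instances form the edges Int -> Integer and Integer -> Double, giving the search a path from Int to Double:

instance Convertible Int Integer where
  convert = fromIntegral

instance Convertible Integer Double where
  convert = fromIntegral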
A great Alan Kay-esque speech (by none other than Alan Kay himself).
Alan Kay is a great thinker and I always enjoy his talks, but he needs someone to refine his presentations--particularly his verbal explanations. There are lots of great ideas and perspectives, but mostly he is just saying that "we're doing it wrong". Much better would be to show us his analysis of a problem.
In general, I finish his presentations feeling like everyone is doing everything wrong but with no concrete steps to take to correct it. We are thinking wrong (we need better perspectives), we are programming wrong, we are engineering wrong, we are learning the wrong things, we are doing computer science wrong, etc.
I also finish thinking that a new and better way is possible, but again, with no idea of that better way except a vague notion of less and better code in less time.
And then here I am, trapped in a world of crappy computers and crappy thinking, with a feeling of unease. I wish he would come down to Earth a little and apply that big brain with its great perspectives to a problem on-camera. It would be concrete and I could learn to emulate his thinking.