The Coming War on General Purpose Computation

December 28, 2011

An insightful talk from Internet activist Cory Doctorow about the future of the freedom of computation.

The Mapping Dilemma

December 17, 2011

David Nolen speaking at Strange Loop:

How much of Computer Science is sitting around when I'm sitting and working on a problem?

David draws on some of the giants of Computer Science in recent years: Alan Kay, Gregor Kiczales, Peter Norvig, and more. I love to see the products of great minds folded into Clojure and ready at my fingertips.

core.match and core.logic are now on my reading list.

multimethod.js

December 16, 2011

Kris Jordan:

Inspired by Clojure's multimethods, multimethod.js provides a functional alternative to classical, prototype based polymorphism.

multimethod.js builds functions which dispatch on arbitrary predicates in Javascript using a fluent style.

Multimethod iterates through each 'method' registered with when and performs an equality test on the dispatchValue and each method's match value.

I wonder whether a linear search for the dispatch value will scale. I would prefer to see some kind of hash-map-based solution. Is it possible to serialize the object to JSON and use the resulting String as a key?
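
For comparison, here is a minimal sketch of the Clojure feature the library draws on (the shape/area names are mine, purely for illustration). Clojure's defmulti keeps its methods keyed by dispatch value, which is the kind of lookup I have in mind:

    ;; Dispatch is a lookup on the dispatch value, not a scan through each method.
    (defmulti area :shape)

    (defmethod area :circle [{:keys [radius]}]
      (* Math/PI radius radius))

    (defmethod area :rectangle [{:keys [width height]}]
      (* width height))

    (area {:shape :rectangle :width 3 :height 4})  ;=> 12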

Still, I enjoy seeing inter-language cross-pollination. It is also a nice translation from a functional language to a prototype-based language. The fluent interface does the job extremely well.

Kris Jordan deserves a follow on GitHub.

Ring 1.0.0 Released

December 13, 2011

Congratulations to the Ring developers on finishing the 1.0.0 Release.

James Reeves:

Ring 1.0.0 has now been released, almost three years since Mark pushed the first commit. In that time, Ring has become the de facto standard for building web applications in Clojure.

Ring deserves its place as the "de facto standard". It is the perfect example of what makes Clojure great. It encourages composition and simplicity and creates an abstraction that allows an entire ecosystem to flourish.
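
To see why it composes so well, here is the whole abstraction in miniature (a sketch of the shape of the spec, with names of my own): a handler is a function from a request map to a response map, and middleware is a function from handler to handler.

    ;; A Ring handler: request map in, response map out.
    (defn handler [request]
      {:status  200
       :headers {"Content-Type" "text/plain"}
       :body    "Hello from Ring"})

    ;; Middleware: takes a handler, returns a new handler.
    (defn wrap-server-header [handler]
      (fn [request]
        (assoc-in (handler request) [:headers "Server"] "my-app")))

    (def app (wrap-server-header handler))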

Read the Ring Spec and the Ring source. They're both short and well-written.

Clojure for dummies: a kata

December 10, 2011

It's good to see someone trying Clojure and sharing his experiences. Congratulations are due for getting it up and running and finishing a kata.

Giorgio Sironi:

Clojure is a LISP dialect; if you have ever been forced to use LISP in a computer science class, I understand your resistance towards this class of totally functional languages.

I appreciate his honesty here. Universities have traditionally used Lisp to teach functional programming. They often neglect to mention that most Lisps are not purely functional. In fact, they lie to the students in the name of pedagogy. People leave Programming Languages class with many false notions about Lisp.

In fact, LISP for many of us means lots of Reverse Polish Notation where (+ 1 2) evaluates to three; absence of for cycles [loops] and other commodities to privilege tail recursion; and absolute lack of state in the form of variables (the state isn't really absent: I've seen it on the stack.)

In Reverse Polish notation, the same expression would be (2 1 +). Lisp uses Polish notation, otherwise known as Prefix notation.
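
For the record, here is how it looks at a Clojure REPL:

    ;; Prefix (Polish) notation: the operator comes first and takes
    ;; any number of arguments.
    (+ 1 2)      ;=> 3
    (+ 1 2 3 4)  ;=> 10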

Common Lisp, in fact, has more loops than Java. Java has for, while, and the sadder do..while loop. Common Lisp's loop macro alone trumps any other language I know for imperative and stateful iteration. In addition to loop, Common Lisp defines dotimes, dolist, do, do* (a variation on do), and the package-symbol loops do-symbols, do-external-symbols, and do-all-symbols.

The fact that a non-lisper believes something about Lisp that is so false yet so common hints that something is very wrong. The misconception of an "absolute lack of state in the form of variables" will be familiar to many Lisp programmers who talk to non-lispers.

Download and unzip the last release of Clojure. Start up a class by specifying clojure-1.3.0.jar in the classpath:

    java -cp clojure-1.3.0.jar clojure.main

It's nice that Clojure is so easy to run.

Again, thumbs up to Giorgio for finishing a kata in a new language. Keep up with the Clojure!

A Short Ballad Dedicated to the Growth of Programs

December 10, 2011

This is a cautionary tale set in a dystopian Lisp where nil is not false and false is not nil.

So I went back to the master and appealed once again
I said, pardon me, but now I’m really insane
He said, no you’re not really going out of your head
Instead of just VAL, you must use NOT NULL instead

nil means (traditionally) false, the empty list, and also "no value". In my opinion, Lisp has hit a sweet spot. Somehow, nil overloading makes for succinct programs.

It's possible to go overboard and do this wrong. Case in point: the problems with boolean values and comparison in Javascript.

The Heart of Unix

December 10, 2011

Despite all of its warts, I like working in Linux. I've used it for 15 years and I've never been as productive in another environment. Most people claim that it's the configurability of Linux that keeps users coming. That may have attracted me at first, but what attracts me now is its programmability.

Let me be very clear. I'm not saying that Linux is great because I can patch the source code to grep and recompile it. In all my years of Unix, I've never done anything like that. And I'm not saying that Linux is a great workstation for programmers because it helps you program better. Those are topics for another essay.

Unix is a programmable environment

I am saying that Unix is a programmable environment. When you interact with the shell, you are writing programs to be interpreted. You can easily extend the Unix system by writing a shell script, copying it to a directory in your PATH, and making it executable. Boom. You've got a new program.

What's more, if it follows certain simple conventions, that program can now work with other programs. Those conventions are summed up well by Doug McIlroy, the inventor of Unix pipes:

This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

If your program reads text lines from standard in and writes text lines on standard out, it is likely to do well on Unix.
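
Here is what that looks like in Clojure (a toy sketch; any language that reads and writes lines will do): a complete Unix-style filter that upcases its input.

    ;; upcase.clj -- a Unix-style filter: lines in on stdin, lines out on stdout.
    ;; Run it with: java -cp clojure-1.3.0.jar clojure.main upcase.clj
    (require '[clojure.string :as str])

    (doseq [line (line-seq (java.io.BufferedReader. *in*))]
      (println (str/upper-case line)))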

Programs on your path are like pure functions in the higher-level language called the shell

Not all programs are so pure. But the vast majority of the typically Unixy ones are: grep, awk, sed, wc, pr, etc.

Unix is a multi-lingual environment

I must have compilers or interpreters for 30 languages on my machine. Maybe more. All of these languages are invited to the party. They can all call each other (through the shell). And of course their stdin/stdouts can be piped together.

You really can use the best tool for the job. I've got Bash scripts, awk scripts, Python scripts, some Perl scripts. What I program in at the moment depends on my mood and practical considerations. It is a little crazy that I don't have to think about what language something is written in when I'm at the terminal.

Unix provides a universal interface with a universal data structure

It needs to be stated that there is a reason all of these languages can work together. There is a standard data structure that programs are invited to use: text streams. That means sequences of characters. Text streams are cool because they're simple and flexible. You can impose a structure on top of the flat sequence. For instance, you can break it into a sequence of sequences of characters by splitting it on a certain character (like new-line). Then you can split those sequences into columns. In short, text is flexible.
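
In Clojure terms (a small sketch; /etc/passwd is just a handy example of a flat text file): take a stream of characters, impose lines on it, then impose columns on the lines.

    ;; /etc/passwd is one flat text stream; lines and columns are imposed on it.
    (require '[clojure.string :as str])

    (->> (slurp "/etc/passwd")
         (str/split-lines)              ; sequence of lines
         (map #(str/split % #":"))      ; each line becomes a vector of columns
         (map first)                    ; keep the first column: the user name
         (take 5))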

Unix is homoiconic

There's another property that I think is rarely talked about in the context of Unix. In Lisp, we are often proud that code is data. You can manipulate code with the same functions you use to manipulate any other data structure. This meta-circularity gives you a lot of power.

But the same is true in Unix. Your programs are text files, so they can be grep'd and wc'd and anything else you care to do to them. You can open up a pipe to Perl and feed it commands, if you like. And this feeds right back into Unix being programmable.

Functional + universal data structure + homoiconic = power

All of this adds up to synergy. When you write a program that follows the Unix conventions of stdin/stdout with text streams, it can work with thousands of programs that are already on your computer. What's more, your program has to do less work itself, because so much of the hard work can be done better by other programs.

On the file system, hierarchical names point to data objects

And this synergy extends well beyond just using text streams. I have this tendency to look to databases as storage solutions for my personal projects. They have some nice properties, like ACID and SQL. But by using a database, I'm missing out on joining the Unix ecosystem. If I use the file system to store my data--meaning text files in directories--I can use all of Unix to help me out. I can use find, grep, head, tail, etc., just because I chose to use the measly file system instead of some fancy database.

Blog example

A good example of the synergy I'm talking about is the blog you are reading now. Here's how my blog works:

I store everything on the file system. I have an src/ directory with drafts/, posts/, pages/, and links/. I wrote a Python script (currently at 183 well-commented lines) that reads src/ and spits out the final product to build/. The Python script uses a few libraries, but the meat of the work is done by calling other programs. The rendering of Markdown to HTML is done by pandoc, which happens to be written in Haskell. I also call out to the shell to copy a directory (cp -rp) because I was too lazy to figure out how to do it in pure Python.

I sync build/ to Amazon S3 with a Ruby program called s3sync. I edit my entries in Emacs. If I need to delete a post, I run rm. If I need to list my posts, I run ls. If I'd like to change the name of a post, I use mv.

It may not be the best interface for writing a blog. But notice all of the stuff I didn't have to write to get started. I'm already writing posts and publishing them. Compare that to the reams of PHP and Javascript it takes to get the same functionality in Wordpress. That's the power of small tools working together.

Unix is old

Now that I've expressed how great Unix is, allow me to speak about its numerous shortcomings. I can't say for sure, but I would guess that most of the shortcomings are due to the long history of Unix starting on underpowered machines.

For instance, the fact that your programs have to be manually stored to disk using file system operations so that your dynamic shell language can have access to them seems awfully quaint. But when Unix was developed, disk space, RAM, and computation were expensive. Everything was expensive. So the strategy was to cache your compiler output to disk so you wouldn't have to do a costly compile step each time you ran a program.

If I want to write a new program, even a short one, I have to open up a text file in Emacs (make sure it's in the path!), write the program, save it, switch to the terminal, and chmod +x it. Compare that to Clojure, where you constantly define and redefine functions at the REPL. Or, if you like, a Smalltalk system where you can open up the editing menu of anything you can see and change the code which will then be paged out to disk at a convenient time. Unix clearly has room to grow in that respect.

The file system

The file system is archaic, too. It's reliable, but a little feature-poor. It's one of the reasons I think first about a database before remembering the synergy available with the file system. It doesn't provide any kind of ACID properties. The metadata available is laughable (permissions, owner/group, date, and filesize?). A more modern file system would give a little more oomph to compete with other forms of storage.

The terminal

The terminal is just old. It's all text. The editing is sub-primitive. The help it gives you is the bare minimum. One of its biggest shortcomings is how opaque it is. It doesn't do much to help you learn commands. It's not very good with huge dumps on stdout. Multiline commands? Supported with \. I think we can do better.

Text streams

The world of computers has grown up a lot since the early days of Unix. There has been a Cambrian explosion in the number of file formats. Lots of them are binary formats. Lots are structured text, like XML or JSON. Unix can handle those kinds of files, but it has failed to find a lever to help the Unix user master them with the same synergy you see with flat text files.

Wrong turns

Unix has a long history. Some of that history was kind, some was unkind. Most of the development of Unix was just practical people doing their best with the tools they had.

What's unfortunate is that we now have better tools and we see what could be done, but to do it would break backwards compatibility. And so we continue with sub-optimal tools.

Layering instead of evolving

One thing I think is unfortunate in the world of Unix today is layering. Modern Linux distributions are midden piles of configuration daemons to manage permissions daemons to give your configuration GUI access to the configuration daemons. Or we find ourselves installing a database to manage a few kilobytes of metadata.

The problem is Unix has not evolved in those areas. The permission system has changed very little. Modern distributions want to provide a modern and unintrusive interface to protected resources, so they add a layer of indirection onto the primitive permissions model instead of evolving the permission system itself. The Unix permissions system is solid and has worked for years. Maybe it should stay. But instead of giving us small programs that do one thing well to let us become masters of the permissions system, we get obtuse, opaque daemons that also need to be learned.

The file system, though much improved in terms of capacity, stability, and reliability, still has the same basic features: hierarchical directories containing files, accessed by name. If you want something more, you have to add a layer like BerkeleyDB or SQLite. These tools are great, but I'd like to see a more Unixy solution that allows for the synergy you get from existing programs made to run with files on the disk.

Megacommands

Command bloat is terrible. Rob Pike and Brian Kernighan have written about this. I'll merely refer you to their excellent paper. The gist is that having n small commands gives you O(n²) ways of combining them. Having fewer, bigger, "more powerful" programs does not give you this combinatorial, synergistic advantage.

If you look at it the right way, all of these little programs that do one thing are like functions in the higher-level language that is Unix. We see that languages like Perl and Python have huge numbers of libraries for doing all sorts of tasks. Those libraries are only accessible through the programming language they were developed for. This is a missed opportunity for the languages to interoperate synergistically with the rest of the Unix ecosystem.

The road ahead

I've given a bit of a taste of some of the non-Unixy directions we're going in. Now I'd like to end with some right directions.

I mentioned before that saving a compiled binary to disk is done to cache what used to be an expensive operation. With modern hardware, a short utility C program could be read in, parsed, compiled, and run very quickly. Probably with no noticeable delay. It's something to consider when thinking about the division between static programs and dynamic scripting languages and the role of the compiler.

Talking to Unix

Foreign Function Interfaces between programming languages are considered very difficult to work with because of the semantic mismatch between any two languages. But Unix provides a universal interface for programs to interoperate without the need for FFI. I hope to see more "sugar" in languages for calling out to other programs for help. Perl's backticks come to mind.
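
Clojure already has a little of this sugar: clojure.java.shell/sh runs another program and hands back its exit code and output as a plain map.

    ;; Shelling out from Clojure; the result is ordinary data.
    (require '[clojure.java.shell :as shell])

    (shell/sh "ls" "-l")
    ;=> {:exit 0, :out "...directory listing...", :err ""}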

You might say that this is expensive. Well, yes and no. Yes, there is much more overhead in reading in who-knows-how-many files to execute some script on disk than in just calling some library function. I argue, though, that the time difference is becoming small enough not to matter; and the operating system should evolve to make it more practical.

Evolving stdin/stdout

Stdin/stdout with text streams is the closest thing we have to a universal, language-agnostic interface. It defines a minimal "constitution" with which programs can interact. Can this interface be improved on without destroying it? I wouldn't doubt it. There are lots of "data flow" patterns besides input and output. Pub/sub, broadcast, dispatch, etc., should be explored.

Text streams, evolved

Unix was designed for flat text and the existing Unix tools operate on text. We need new tools to bring structured text and binary into the Unix world to join the party. I don't think this would be hard. I've written programs that read in JSON and write it out with one JSON object per line. That lets you grep it to find the one you want, or wc -l it to count the objects.

Another thing I've been working on is defining a dispatch mechanism for common operations on files of different types. Take, for instance, metadata that is stored in a file. An HTML file has a title and sometimes an author (in a meta tag); a JPEG file has metadata in its EXIF data. Is there some way we can unify access to that metadata? I think there is, and I'm working on it. The same command would dispatch differently based on mime-type.
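
As a hypothetical sketch of the idea (none of these names are a real tool), Clojure's multimethods show the shape I'm after: one generic operation whose implementation is chosen by mime-type.

    ;; Hypothetical sketch: a generic title operation dispatched on mime-type.
    (defn mime-type [path]
      (cond (.endsWith path ".html") "text/html"
            (.endsWith path ".jpg")  "image/jpeg"
            :else                    "application/octet-stream"))

    (defmulti title mime-type)

    (defmethod title "text/html" [path]
      ;; pull the <title> element out of the markup
      (second (re-find #"(?i)<title>(.*?)</title>" (slurp path))))

    (defmethod title :default [path]
      nil)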

21st Century Terminal

How can we improve the terminal? I think it's a hard problem but not impossible. Part of the issue with the terminal is that as X Windows developed, people started using menus to run monolithic programs instead of piping things with the shell. So the usefulness of the terminal will be improved, without changing the terminal itself, by breaking those monolithic programs up into composable programs. For instance, a program which displays all of the thumbnails of the files listed on stdin would be much more useful to me than a mouse-oriented file browser.

The terminal is about text. I don't think that could or should change. But does it have to be only about text? Explorations are underway.

The Shell

The last improvement I want to touch on is the shell language itself. Bash is ugly. There. I said it. A lot of good work has been done in programming language design and I'd like to see some of it make its way to the shell. What if we take the idea of Unix programs as pure functions over streams of data a little further? What about higher-order functions? Or function transformations? Combinators? What if we borrow from object-oriented languages? Can we have message passing? What about type-based dispatching?

Conclusions

Unix has always been practical and it has proven itself over the years. It's 40 years old and it's still being used. Furthermore, Unix is the closest thing to a personal computing experience [1] that is practical today.

People tend to contrast Unix with systems like the Lisp Machine and Smalltalk. But I see more similarities than differences: Code as data. Everything is programmable. Dynamic language prompt. Universal data structure. A propensity for "dialects" or "distributions". Garbage collection. [2] Unix just made a lot of compromises to make it practical on the limited hardware that was available.

Unfortunately, those compromises have stuck. A lot of work went into workarounds and a lot of software has been built on top of those design decisions. The question is: where to go from here?

My own personal choice is to go back to the roots. Often, when we want to make a change, we must look to what has worked in the past. What has brought us this far? What were the things that made Unix special? Unix was built by individuals all adding their own practical knowledge and hard work into one cohesive system. Their individual work was multiplied by the synergy of a common interface. If we want to evolve Unix (and I do), that common interface--the heart of Unix--is the place to start.


  1. When I say "personal computer", I'm referring to Alan Kay's vision:

    What then is a personal computer? One would hope that it would be both a medium for containing and expressing arbitrary symbolic notions, and also a collection of useful tools for manipulating these structures, with ways to add new tools to the repertoire.

  2. Unix has a limited form of garbage collection. Short-running programs (like those executed at the terminal) need not concern themselves with freeing allocated memory since the OS will free everything when they exit.

Tips for using marginalia

December 08, 2011

I've used Marginalia a tiny bit for a few of the libraries I've developed that actually have comments and doc strings.

Marginalia did help me refine my doc strings and comments. From the Marginalia docs:

Following the guidelines will work to make your code not only easier to follow – it will make it better. The very process of using Marginalia will help to crystalize your understanding of problem and its solution(s).

In general, this is true. Marginalia follows some strict rules, like a compiler. Even if your comments seem to make sense when you read the source, Marginalia might produce something that makes no sense. However, it generates attractive and useful documentation, so I decided to use it.

What this meant was figuring out some of the quirks of Marginalia. Here is what I've learned:

Don't use ;-comments when a doc string is more appropriate

This is good advice in general, and Marginalia makes such comments look wrong. In Marginalia, comments seem to follow the docstring for the function they apply to when both are defined.
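
For example (a toy function of my own), the documentation belongs in the docstring, where Marginalia will pick it up; ;-comments are better reserved for remarks about the surrounding code.

    (require '[clojure.string :as str])

    ;; Good: the description of what the function does lives in the docstring.
    (defn parse-record
      "Splits one line of input into fields on tab characters."
      [line]
      (str/split line #"\t"))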

Blank lines are important

From what I can tell, a blank line will cause the top-level comment to be placed above the following form. No blank line will put them at the same level. That sounds very sensible.

However, if you have a comment with no blank line immediately before a function, the docstring for that function will come first, followed by the comment. This is weird. You should follow the advice above and choose a docstring or comments, but not both.

Namespace ordering

There is a convention in Clojure to use project.core as the main namespace of a library. Marginalia sorts the namespaces alphabetically. This means that in the documentation, project.anunimportantmodule comes before project.core.

I once renamed my namespaces to avoid this problem. But in general, my projects are very small and don't have many namespaces, so it's not a practical problem.

Marginalia is great. It produces beautiful documentation. And it has helped me clarify my code.

Nice job.

Janki Method

December 08, 2011

Jack Kinsella describes an excellent method for learning any skill, disguised as a method for learning programming.

He draws on his own personal experiences using Anki (a nice learning tool which I recommend, too; I've used it to learn foreign languages) to help him learn the huge number of commands, functions, and other arcana that we all either grep from memory or Google constantly as we work. He sums everything up in eight rules which define the Janki Method.

Here's an example:

The eighth rule of Janki encourages you to use your readings of other people’s code as a source of learning:

"Read code regularly. If you come across something interesting – be that an algorithm, a hack, or an architectural decision – create a card detailing the technique and showing the code."

In the article, you'll find solid examples of flashcards chunked down to the right bite-size.

The hardest part of the Method is to stick to it. He suggests rule 2:

The second rule of Janki encourages a commitment to daily learning:

"You must use Anki every single day- including weekends and holidays – and commit to doing so indefinitely."

I can't say this seems realistic, even with the five- to eight-minute daily commitment he claims. I have used Anki for long stretches. Two months was my longest. It is true that it doesn't take long each time. The trouble is the same trouble you have starting any new habit. It's just hard.

I have had luck using Anki in the opposite way: one hour per day for a week. For some reason, that's easier for me to commit to than an indefinite daily habit. You don't get the same benefits. The benefits are different.

And, finally, the great thing about Anki is that it is very forgiving if you do stop using it regularly. If you take a two-week break, you can start up again with very little fuss.


Jack manages to bring together the ideas of spaced repetition, learning by doing, and continuous self-improvement to give practical advice to make us better at what we do.

Nice job.

Waving Our Tentacles

December 08, 2011

Anthony Grimes shows off his Clojure interface to the GitHub API.

I'm definitely going to check this out. There are some people I'd like to stalk . . .