July 24, 2012
I originally dismissed Knockbox as yet another concurrency paradigm for Clojure. But after watching this lecture by its creator, I have changed my mind. The few use cases he gives are enough to convince me that I may someday use it, if only to give a bit of structure to edit conflict resolution.
It appears to follow the Clojure library trend of implementing work from a published academic paper.
I will keep it in mind when I want lock-free eventual consistency.
July 24, 2012
Seesaw represents the best of Clojure: a monstrous Java library tamed to be pleasant and docile.
July 24, 2012
Chas Emerick is at it again. Please fill in the survey if you have ever been exposed to Clojure. This is the third year that cemerick has conducted the survey. He does a great job of summarizing the data with nice graphs. Have a look at the 2011 and 2010 survey results. The survey is useful to the entire community. Please participate.
July 19, 2012
Good to see someone innovating on user authentication. Username + password is old as dirt.
June 15, 2012
James Hague:
Now imagine there are two finished apps that solve roughly identical problems. One is enjoyable to use and popular and making a lot of money. The other just doesn't feel right in a difficult to define way. One of these apps follows all of your development ideals, but which app is it? What if the successful product is riddled with singletons, doesn't check result codes after allocating memory (but the sizes of these allocations are such that failures only occur in pathological cases), and the authors don't know about test-driven development?
Imagine two pieces of medical equipment. One is easy to use, easy to clean, and always reliable. The other is buggy, has lots of crevices and corners which seem to collect infected goo, and needs constant maintenance. One of these machines was created by a rigorous design process, extensive testing, and built in an immaculate, air-conditioned factory manned by attractive MD/PhDs. The other was created in a basement from spare parts by a high school dropout with bad hygiene. What if the beautiful machine was created by the dropout?
Is it even possible? Is it possible that software can be successful over time if it is not well-built? In my experience, messy software eventually drags new development (including bug fixes) to a halt. Time needs to be invested in cleanup.
What I hear Hague saying is that some software architecture choices do not matter. This is true. But some do matter. They are not just important to you as a programmer. They are directly related to the number of bugs, the time it takes to fix a bug, and the time it takes to make a new feature. Don't throw the baby out with the bathwater.
June 15, 2012
My Master's thesis was very related to this topic, so I thought I would share a little anecdote.
Michael Nielsen:
As I describe in detail below, their approach was to take the question asked, to rewrite it in the form of a search engine query, or perhaps several queries, and then extract the answer by analysing the Google results for those queries.
While I was researching my thesis, I came across a paper by Sergey Brin (cofounder of Google) that described a system called DIPRE, which tried to do a similar thing using the Google index, back before Google was Google. My favorite quote from the paper was "the Google search engine and other research projects". Brin has been aware of the power of huge sets of redundant documents for a long time.
The system Nielsen describes is interesting and worth a look. I will just nitpick a little.
- Why do so much query rewriting? Google does a lot of query augmentation itself now. Also, with such a high number of documents, the system is very likely to find the question phrased exactly as it was posed. Try asking Google a question and see if you can find the answer on the results page.
- The system does not actually get rid of domain knowledge: it replaces part of the algorithm with a Google search, but there is a lot of domain knowledge of the English language used to extract the data from the text. An implementation of the system would use a simple statistical model of answers to find the text to extract.
- The system of weighting queries is very hard to justify mathematically. Much better would be a probabilistic system, such as Naive Bayes. Naive Bayes is very simple and, modulo a naive assumption, is mathematically correct.
- Be careful with questions like "Who shot JFK?" It is difficult for humans to answer, let alone computers.
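The Naive Bayes scheme suggested above can be sketched in a few lines. This is a minimal illustration, not part of the system being reviewed; the function name and probabilities are invented for the example.

```clojure
;; Naive Bayes in log space: score an answer by its prior plus the
;; (naively assumed independent) likelihood of each piece of evidence,
;; e.g. each query's hit rate. All numbers here are made up.
(defn naive-bayes-score
  "log P(answer) + sum of log P(evidence_i | answer)."
  [prior likelihoods]
  (reduce + (Math/log prior) (map #(Math/log %) likelihoods)))

;; An answer with stronger evidence outranks one with weaker evidence:
(> (naive-bayes-score 0.5 [0.9 0.8])
   (naive-bayes-score 0.5 [0.4 0.3]))
```

Working in log space keeps the products from underflowing, and the "naive" independence assumption is exactly what makes the weighting mathematically justifiable, unlike an ad hoc scoring scheme.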
June 13, 2012
How do you represent sum types, also known as tagged unions and variant records? Something like `Either a b` in Haskell or `Either[+A, +B]` in Scala.
`Either` has two uses: to return a value of one of two types, or to return two values of the same type that should have different semantics based on the tag.
The first use is only important when using a static type system. `Either` is basically the minimum solution possible given the constraints of the Haskell type system. With a dynamic type system, you can return values of any type you want; `Either` is not needed.
The second use is significant but can be accomplished quite simply in two (or more) ways:
{:tag :left :value 123} {:tag :right :value "hello"}
{:left 123} {:right "hello"}
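Either representation can be consumed with a plain dispatch function. A minimal sketch of the first, `:tag`-based representation; `make-left`, `make-right`, and `either-case` are hypothetical names, not from any library:

```clojure
;; Constructors for the tagged-map representation.
(defn make-left  [v] {:tag :left  :value v})
(defn make-right [v] {:tag :right :value v})

(defn either-case
  "Dispatch on the tag. `case` with no default throws on an unknown
  tag, which is the closest a dynamic language gets to an
  exhaustiveness check, at runtime rather than compile time."
  [{:keys [tag value]} on-left on-right]
  (case tag
    :left  (on-left value)
    :right (on-right value)))

(either-case (make-left 123) inc str)      ;=> 124
(either-case (make-right "hello") inc str) ;=> "hello"
```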
What I'd like to ensure is that :tag is always there, that it can take only one of the specified values, that the corresponding value is consistently of the same type/behaviour and cannot be nil, and that there is an easy way to see that I took care of all cases in the code.
If you would like to ensure this statically, Clojure is probably not your language. The reason is simple: expressions do not have types until runtime--until they return a value.
The reason a macro will not work is that at macro-expansion time, you do not have runtime values--and hence runtime types. You have compile-time constructs like symbols, atoms, s-expressions, etc. You can `eval` them, but using `eval` is considered bad practice for a number of reasons.
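What you can do is push the guarantees to runtime. One option is a validating constructor; `tagged` here is a hypothetical helper, not a standard function:

```clojure
;; A constructor that enforces the invariants at runtime: the tag must
;; be one of the specified values, and the value cannot be nil.
;; Violations throw AssertionError via the :pre conditions.
(defn tagged [tag value]
  {:pre [(#{:left :right} tag)
         (some? value)]}
  {:tag tag :value value})

(tagged :left 123)       ;=> {:tag :left, :value 123}
;; (tagged :middle 123)  ;; throws AssertionError -- but only at runtime
```

This gives you the "always there, only specified values, never nil" part of the wish list; the exhaustiveness check still cannot happen before the code runs.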
June 13, 2012
I rewatched Alan Kay's Doing with Images makes Symbols recently and there was one bit that I had not absorbed before.
In the talk, he explains the cryptic title. He took a simplified model of Piaget's Stages of Cognitive Development: corporal -> visual -> symbolic. He wanted to bring these three stages into relation with each other, so he made a sentence: "Doing with Images makes Symbols". Doing, of course, is the body stage. Images is the visual stage. And symbols is the abstract symbolic stage.
Doing
Imagine the Logo programming language. Logo is perfect for small children because it helps tie the visual and symbolic to their main mode of experience. Young children are primarily focused on their own bodily experience. They have trouble thinking from other points of view. But by translating their own actions (moving around) into symbolic instructions (Logo code), they can see the turtle performing the actions. Logo's turtle takes on the child's point of view and makes it visual. The child learns to see his/her own perspective from the outside. The coordinate system is egocentric. For example, the `FORWARD` command moves the turtle in the direction it is facing.
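The egocentric coordinate system fits in a few lines. A minimal sketch in Clojure, assuming a made-up turtle representation (`forward` and `turn` are hypothetical, not from any turtle library):

```clojure
;; FORWARD is relative to the turtle's own heading, not the page's
;; axes: the same command moves in different absolute directions
;; depending on which way the turtle faces.
(defn forward [{:keys [x y heading] :as turtle} dist]
  (let [rad (Math/toRadians heading)]
    (assoc turtle
           :x (+ x (* dist (Math/cos rad)))
           :y (+ y (* dist (Math/sin rad))))))

(defn turn [turtle degrees]
  (update turtle :heading + degrees))

;; Facing 90 degrees, FORWARD 10 moves along y (x stays at ~0):
(forward {:x 0.0 :y 0.0 :heading 90.0} 10)
```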
Smalltalk includes the same egocentric perspective. Each object has its own perspective of the other objects in the system. Smalltalk objects could ask "who are my neighbors in the list I am in". Squeak simulates a space that the objects inhabit which includes collision detection. This perspective, I would argue, is one of the unsung benefits of Object-Oriented Programming: the ability to program from the perspective of a single object at a time, freeing your mind from thinking about the inner workings of the rest of the objects. Objects are not just meant to be strung together but can be "embodied" by their programmer who sees from their unique perspective.
Symbols
The other end of the developmental spectrum is the symbolic. The obvious choice for symbolic programming is Lisp. Programs take on a very abstract quality and require advanced programming techniques. In this perspective, the programmer has a god-like, symbolic understanding of the workings of the system and builds abstractions in a calculus with simple yet powerful rules. Further, the programmer manipulates code in code. The coordinate system is abstract and relational. Shapes are manipulated in an abstract representation, not as pixels occupying space. A value may encode abstract knowledge (a line between a and b) instead of specific coordinates.
Images
In the middle is the visual stage. I am going out on a limb, but I would like to posit that Java exemplifies this middle-ground perspective. In this perspective, you think and understand visually. You draw diagrams. You think in terms of archetypal interactions (client-server, MVC, etc). You can simultaneously conceptualize several roles in a single group of interacting objects. The coordinate system is cartesian, taking an external, objective perspective.
This is not to say that Java does not include any corporal perspective, nor that Lisp is all symbolic. I am talking in broad generalities about predominance.
Not one-at-a-time but all-at-once
The interesting thing about Piaget's stages is that they are stages of dominance, not a rocket-stage-jettison progression. That is, we can still, as adults, think in terms of any of the prior levels. In fact, the most successful scientists and mathematicians work on the earliest levels (corporal/visual). Einstein claimed he had bodily sensations about relativity, and his difficulties with abstract symbols are well documented. These three types of thinking are done by different parts of the brain. The parts are always there but take dominant roles at different times in your life.
How this relates to programming language design is that when we are designing a programming system, we should include all of these ways of thinking so that the programmer can choose the best way to think about the problem. The easiest way to draw a circle is from a self-centered perspective in terms of motion and gross actions. The easiest way to plan a protocol is visually with boxes and lines or swim lane diagrams. Once translated into symbolic code, syntactic and semantic abstractions become evident, and can be manipulated as such.
The trick of good language design is to facilitate programming at any level and translation into the other levels. How can we perform this trick?
June 13, 2012
The best explanation of Monads for beginners I have seen yet.
June 13, 2012
Kevin Lynagh shows off some cool code for easily generating (and automatically updating!) the DOM from ClojureScript. I can't wait to experiment with this.