Object-Oriented Dispatch is the Dual of Functional Dispatch

March 11, 2015

Summary: Object-oriented dispatch is contrasted with functional dispatch, but they are shown to be two one-dimensional views of the same two-dimensional data. Clojure does not provide the two-dimensional representation, but does interesting things to transcend the one-dimensional views.

About a month ago, I wrote a post about how OO-style is the dual of functional-style. OO focuses on the data first, while functional focuses on the code. It is cool that the correspondence between the two is fairly clear and mechanical. It means that they're equivalent in a way. The distinction is mostly important when choosing how to represent a problem. A language should provide ways to express both (as Clojure does), and a way to translate between the two (which Clojure does not).

But it goes further than data vs code. In OO, the object is the unit of data (which manifests as the principle of encapsulation). But the object is also the unit of dispatch. Objects know their class, and classes know the implementations of their methods. Class first, behavior second. In a functional style of programming, the function is the unit of dispatch. A function knows how to react to all possible classes of arguments. Behavior first, class second.

Example:

(defn foo [x]
  (cond
    (string? x)
    ...
    (integer? x)
    ...
    (vector? x)
    ...

So again, we have a kind of duality. The classic way to explain this is to show it in a table.

method/class String Integer Vector ...
toString . . .
length . x .
+ x . x
* x . x
...

A dot represents an implementation, whereas an x represents an undefined operation.

In OO style, the class represents a single column. In functional style, the function represents a single row. Logically, however, the information is a two-dimensional table. That brings up a question: why don't languages store the information in this form internally? It is easy to project a column view or a row view from a table. And the x's in the table seem to be really useful for static checks.

Clojure does not represent its functions this way. But it does allow you to express your code in either a column view or a row view. The row view is the standard functional approach shown above. The column view is using deftype.

Example:

(deftype Person
  Object ;; indicating I'm overriding methods from Object
  (toString [p]
    ...)
  java.util.Comparable
  (compareTo [p]
    ...)
  ...)

Clojure goes a step further: instead of having to write out the entire column (all methods given a class) in one place, you can essentially index directly into the table given a method/class pair, and define the implementation directly. This only works with protocol methods (not methods on classes/interfaces due to limitations in the JVM).

Example:

(extend-type Object
  MyProtocol
  (mymethod [x o]
    ...))

OO-style and functional-style are duals in terms of dispatch. It's very related to the data-first or code-first duality I wrote about before. Clojure again straddles both sides of the duality and lets you write code in both styles, as well as transcend the column/row distinction entirely when using protocols.

How does your language deal with dispatch? Can you express the problem in the best way? If you're interested in this kind of topic, you would probably enjoy the Clojure Gazette. It's a weekly newsletter filled with content to inspire Clojure programmers. It's completely free and it's easy to unsubscribe.

Thanks to Marcus Blankenship for the inspiration for this article.

For more inspiration, history, interviews, and trends of interest to Clojure programmers, get the free Clojure Gazette.

Learn More

Clojure pulls in ideas from many different languages and paradigms, and also from the broader world, including music and philosophy. The Clojure Gazette shares that vision and weaves a rich tapestry of ideas from the daily flow of library releases to the deep historical roots of computer science.

You might also like

Object-Oriented Programming is the Dual of Functional Programming

February 08, 2015

Summary: Object-Oriented Programming is often shown in contrast to Functional Programming. But they are so exactly opposite that they are duals, and so equivalent in important ways. Which one to use should be left up to the programmer, as is done in Clojure and Javascript.

Let's take a quick look at programming from an interesting perspective. Let's say that programs are divided into code and data. 1 These are the two main axes: structuring code and structuring data.

Now, let's design a language. First decisions: which axis is primary, which is secondary?

Style Data Code
OOP Primary Secondary
FP Secondary Primary

In an OOP-style approach, we have a pointer to an object which is the data. And the data also tells you where the code is to access it (methods in the vtable). An FP-style approach will make functions (code) primary, with data referred to by the closures (the captured lexical environment).

I'm positive I'm not the first person to remark on this, but I think it's pretty cool that they are duals of each other. A simple transformation will switch one in the table with the other.

Perhaps the next obvious question is "which one is better?". Many arguments about OOP vs FP focus on this choice, about which axis should be primary. Because a compiler could easily transform between them, the decision is not quite so important as we might think at first. I think that a language should make both available and let the programmer choose. The programmer is then free to choose whichever best models the problem.

For instance, in Clojure, fns are closures, so they are really FP-style -- code meant to be executed which happens to refer to some data. But collections are data which refer to code which knows how to access the data. Both coexist quite cooperatively.

And Clojure gives you, the programmer, ways of expressing both styles. When you want to create some code to be executed, fn lets you define lexical closures. But sometimes you want to make a new data structure which acts like others. For instance, a new type of sequence. deftype lets you define data with interface and protocol implementations. I use both fn and deftype very often.

Interestingly, Javascript gives you these two options as well. function defines lexical closures, making code primary. But very often you define an object which contains (or refers to) data and secondary accessor functions. It's one of the nicer things about Javascript, included in Javascript, the Good Parts.

How does your language prioritize these two axes? Is there a way to choose which to use to best implement your design? If you're interested in this kind of topic, you would probably enjoy the Clojure Gazette. It's a weekly newsletter filled with content to inspire Clojure programmers. It's completely free and it's easy to unsubscribe.

For more inspiration, history, interviews, and trends of interest to Clojure programmers, get the free Clojure Gazette.

Learn More

Clojure pulls in ideas from many different languages and paradigms, and also from the broader world, including music and philosophy. The Clojure Gazette shares that vision and weaves a rich tapestry of ideas from the daily flow of library releases to the deep historical roots of computer science.

You might also like


  1. I don't want to go down the code-is-data rabbit hole. Let's stick to one floor in the tower of languages model.

The Paper Metaphor

February 27, 2015

Summary: Functional programs follow a simple rule: never write on the same paper twice. Imperative programs are free to define their own rules. Both have interesting consequences.

I just read Functional Programming for Everyone Else by Emily St. It's a good read about the fundamental differences between Turing Machines and Lambda Calculus. Near the end, she mentioned this:

I think of functional programs as ones which do their jobs without stopping to take notes along the way.

That got me thinking of the metaphor of taking notes. It's an interesting metaphor, because as she mentions in the article, modern computers are based more on Turing Machines, with their tapes (notes), rather than Lambda Calculus. Lambda Calculus being abstract instead of mechanical (what instead of how), it doesn't show its work.

In the spirit of more perspectives being better than fewer, I'd like to give my metaphor, inspired by idea above. Functional programs are those that have a very simple rule: never write on the same paper twice. In Turing Machine terms, it means never write on the same piece of tape twice. This rule is simple enough that you can build it into the compiler and runtime to enforce it. In programming language theory, we call this immutability. Once a paper is written on, it is immutable.

Imperative programs, on the other hand, don't have such a rule. They can write on any piece of paper, even write over or erase what's written. They rely on their own code to enforce rules for how to organize these pieces of paper so that it still makes sense. They might use locks, which are notoriously hard to get right.

The nice thing about functional programs is you get a nice record of what's happened. You can step through all of the intermediate work because it was never overwritten. It's like an artist keeping their old sketches that led up to their masterpiece. Imperative programs are like finding out that it was all done on the same canvas, art painted over art, irrevocably lost. There is no xray to uncover it.

The other nice thing about functional programs is multiple programs can share the same stack of paper. Since they're following a rule, they will never overwrite someone else's work. But imperative programs need to have complex coordination rules to make sure that they aren't writing on the same page at the same time, or overwriting something that someone else will need.

If you'd like to learn Functional Programming, I can recommend LispCast Introduction to Clojure. It will teach you Functional Programming using Clojure, animations, and screencasts, along with many exercises.

Learn Functional Programming using Clojure with screencasts, visual aids, and interactive exercises
Learn more

You might also like

The Content of Your Code

October 06, 2014

Summary: Code style is important, but way less important than content. Yet everyone talks about style because it's easier. Let's talk about content. I'll start with some bullet points.

In fiction writing, there is a fine, but visible, line between style and content. Style is your choice of words and grammar while you're telling a story. Content is the parts of the story itself. It's the characters, the plot devices, the motivations, etc.

Style is important. Classic works of literature usually have a good style. But content is more important. Good style can enhance a story. But a story can be retold many times, each with a different style, because a good character and story can stand on its own. No amount of style is going to save crappy storytelling with uninteresting characters. It didn't work for The Matrix 2 & 3, and man, did not help at all in the Star Wars prequels.

You can roughly divide programming decisions along a similar line. Coding style versus coding content. So much advice falls on the style side. It talks about variable naming, function length, what parts of a language not to use.

Where is all the stuff about code content? How do we choose an algorithm? How do we pick a data structure? Basically, how do we translate a real-world problem into a computational model? How do we determine if a program correctly models the problem? How do we judge if one model is better than another? The answer is so simplistic. A model fits a problem if the structure of the model matches the structure of the problem.

Structure, structure, structure.

Here are some bullet points:

1. Choose the right collection

Here's a good example that we should all be familiar with. How do you choose between an array and a map? Well, if your problem is to do things in order, an array is the better choice because it is naturally ordered. If your problem is "I have an x and I need a y", a map is probably better, because maps associate one value with another. The data structure's properties mirror the properties of the problem.

2. Factor your code

Refactoring is improving the style of your code without changing the content. But factoring is changing the structure of your code to reveal the underlying structure of the problem. This is the only reliable way to get a one-to-one mapping between code and reality.

3. Determine the essential structure of the problem

I have written before about finding the essential idea in a problem. Object Oriented Programming advice tends to recommend picking each of the objects in the real world and creating a class for each. So, if you're modeling students and courses, regardless of the problem you're solving, you should have a student class and a course class.

This practice comes from the early days of OOP, when it was still used a lot in simulations. I can see the benefit of representing each thing in your simulation as an object. But we're not building a simulation. A university registration system is not a university simulator. We are not simulating students. We are not simulating courses. We need to be looking at the problem we're trying to solve.

How is it already done?

I really think the best way is to look at the process that is already being used. If you're hired to replace a manual, pen-and-paper system, you have a head start over a new system. Computerizing an existing process is easier because the problem is already well-understood. Go ask the registrar's office how they are doing it.

Let's say that each department keeps a large list of all the courses they give each semester. For each semester, they start a new notebook and make a page for each course. As students register, they write down their name. If they unregister, they cross them out. They leave room for enough students between each course, sometimes skipping a page. They put post-it notes sticking out the top so they can quickly turn to the page for a course when a student comes in the office.

Wow! Your job is now way easier. You just have to replicate that notebook in code. That is so much easier than modeling students and courses. And once it's done, you have a place for improvements that are only possible in a computer.

Finding the essence

But let's say it's a new university, trying to get a head start on old universities by organizing everything on a computer. So there's no existing process. You've got to make it up.

What do you do if the structure is not obvious? How do you determine the structure of a poem? Reading. Re-reading. Underlining. Arrows. Notes. Clarifying definitions. Basically, look for structure. Dig it all up. Then use your judgment about what is important. There often is not one single kind of structure, but a constellation of structure.

4. Factor out incidence from essence

The incidental structure just happens because of the choice of solution instead of the structure of the problem. The structure inherent in the problem is essential structure.

In the notebook used for registering students, there are some incidental implementation details that you don't want to replicate. In the notebook, they left some blank space for more students after each page so that they wouldn't run out of room. But running out of room is not an issue in a computer (no university is that big). So you can leave that part out.

But less obviously, the fact that there is a separate page for each course is also incidental. It's not important that the students in one class all be stored in a single place. What's important is that at any time, a student can register for the course (random access!) and that at any time, a teacher can list all students in a particular course (random access!). We need a way to project a list of students quickly enough in a course from however it is stored.

But it turns out that, if you factor correctly and find the essence, your solution should be generic. Why? Because structure is generic. It is pure content. The correct solution to this problem is to make a system to manage many-to-many relationships. Relational databases can do this easily with one table. You could make a ManyToMany<A, B> class. Those are implementation details that are incidental. What's essential is the many-to-many part.

Conclusion

We need more discussion about the content of programs. Style is important, but we need people to create acronyms and rules of thumb for choosing program constructs. The Design Patterns book (and movement) were important in this respect. It documented patterns of common structure. But it failed to do a good job teaching the how, and added an air of mystery.

Although this process is language-agnostic, Clojure is great for finding essential structure. One thing I like about Clojure is that the data structures are described in terms of their usage structure. And Rich Hickey has expressed many times that to understand a problem, to design a solution, we must pull things apart into its essential parts.

If you'd like to learn Clojure and see how it might help you think in terms of structure, I can recommend my LispCast Introduction to Clojure video course. It builds up skills from complete beginner to decomposing a problem into a generic solution.

Learn Functional Programming using Clojure with screencasts, visual aids, and interactive exercises
Learn more

You might also like

What is Functional Programming?

July 01, 2014

Summary: I prefer to define Functional Programming as making a distinction between pure and impure code. With this definition, you can program functionally in any language. What differentiates the functional languages is how much help they give you to make the distinction.

There are a lot of conflicting definitions of Functional Programming out there. I'd like to share mine, which serves me well. It explains why Haskell is more functional than Scheme, and also how you can program functionally in a non-functional language like Java.

Functional programming means programming with a distinction between pure code and impure code. Pure code has no side effects. It's referentially transparent. It means the same thing every time you run it. Impure code contains side effects, so running it twice is different from running it once.

The distinction between pure code and impure code uniquely identifies functional programming and distinguishes it from other paradigms such as procedural and Object Oriented. Procedural is about modeling your solution as sequential steps. Object Oriented is about modeling your solution as communicating objects. Functional programming is about modeling your solution as pure functions.

Now, this definition is very practical. Notice that it's not about choice of language. You can write functional code in any language, just as you can code up an object system in C and say you're doing OO. The question is how much the language helps you write functional code or OO code.

On one extreme, you've got Haskell. There is no doubt that Haskell is a functional language. How does it help you write functional code? It has no mutable values and side-effects are confined to a single type: IO. The language forces you to make the distinction between pure and impure.

On the other extreme, you've got machine code or assembly. At the lowest level, the language pushes you to avoid the distinction. All operations are about changing at least one location in memory. It could be a register or the top of the stack or something. But you are forced to change something. However, with a lot of super-human discipline, you could keep the distinction in your head. You might create a little heap and keep the discipline "a procedure can only write to memory it allocates directly". And this way, you make a bit of room for some functional programming. But that language is not giving you any help.

So why functional programming? Well, it turns out that knowing that running code twice will produce the same result makes it very easy to reason about it. And reasoning about code is basically our job as software engineers. What's more, the kind of reasoning you can do with functional programs can reach all the way up to the highest forms of reasoning, like math. That's where Haskell really shines. All of the category theory stuff (monads, functors, applicatives, etc) is an expression of that--mathematical concepts that are applicable in Haskell code.

That's it. That's my definition. The definition is inclusive yet gets at the essence. Functional Programming is a perspective that makes code easier to understand and maintain as it's being used in a system complex beyond your possible comprehension. And at its most sublime and abstract levels, Functional Programming approaches mathematical reasoning.

If you'd like to get started with Functional Programming in Clojure, you can do worse than using the LispCast Introduction to Clojure video course.

Learn Functional Programming using Clojure with screencasts, visual aids, and interactive exercises
Learn more

You might also like

Deconstructing Functional Programming

December 22, 2013

Because if you think about it, the stack itself is just an optimization. Right? There are these frames which contain information about each invocation. Each stack frame. Each activation record. And that's what they are--they're activation records. They're sort of objects. If you really have objects on the brain, like I do, then you realize that they're all just objects.

And they should be treated uniformly. You can even build a language that works this way--as I have. If that's the case, this is really a garbage collection problem. Right? Your stack traces might go away if you don't need them. You do need them when it's not a tail call because you need to go back there and use that information. But if it's a tail call, you don't need them, you don't need a tail call back to that frame. You need a pointer back to the last frame that wasn't a tail call. And they might get collected.

Which doesn't mean you actually have to implement it that way. Erlang sort of does. And there are many implementations that do that. They are not noted for their speed. If you can have real stacks, what happens when you run out of space? If you don't run out of space, it didn't really matter if you optimized the tail calls or not.

When you run out of space, you should GC the damned stack. You shouldn't just throw up your hands and say you're dead. And for all the normal programs that people write where it didn't matter, they won't care. And for your tail recursive programs, well, they might be a bit slower, but they will work. And then it's a matter of flags to the garbage collector if in production you don't want to debug it if you're sure it's all going to be fine--then go ahead and tell it to don't bother and just slam it and overwrite those frames directly.

Why is this so hard for implementers to do? Optimization is an optimization and should be optional.

It's a cool algorithm, but there's nothing nice that I'm prepared to say about Hindley-Milner.

He nails a lot of things I didn't like about Haskell.

All in all, this talk gets a lot of things right.

You might also like