Keyword: data-driven

Data > Functions > Macros. But why?

December 23, 2014

Summary: "Prefer data over functions" is a common adage in Clojure circles. It is poorly debated because it is a terse statement in generalities. A valuable perspective is that data is transparent at runtime, while functions are not. This perspective gives a firm ground for discussion and design.

There's a design rule of thumb in Clojure that says that we should prefer functions to macros and to prefer data to functions. People talk about why, people react saying that functions are data, etc. It's all true, but it's all missing the point. It doesn't get at the fundamental, structural difference. And I think the discussion breaks down because people speak in generalities and not much is made measurable and concrete. But the pregression from macros to functions to data is, in my opinion, increasing in one important aspect, and that's the availability at runtime. Discussions should hinge on whether availability at runtime is desirable, which of course needs to be determined on a case-by-case basis.

Macros

Macros in Clojure are simply not available at runtime. By definition, they are expanded at compile time. It would be hard to make sense to be passing around macros as runtime values. Try to pass a macro as an argument to a function. You can't.

Further, once a macro has been expanded, it is no longer clear that it's expanded code even came from that macro. Macros are opaque at runtime.

Functions

Functions are first-class values. They can be passed around to functions, stored in maps, etc. They are totally available at runtime to be called.

But, calling them is about all you can do with them (besides building new functions with them, like with composition). You can't even easily inspect the code that's inside them, nor get at the environment they have closed over. And that's kind of the point. The function is a useful unit of computation. It's not a unit of semantics.

A function, too, is opaque in its way. A function, at runtime, is a black box. What does it do? You can't tell. You can't even tell how the function got there. Was it a fn defined in code? Or was it the result of function composition using comp? That information is not available at runtime.

Data

In Clojure, by data, people are really talking about the immutable data structures available in Clojure. Just to be concrete, let's narrow the definition of Clojure data down to edn.

Edn data is available at runtime. It's first-class in the way that functions are and macros are not. But it is also transparent. The structure of the data is completely available at runtime, unlike the structure of the function. This is why data is preferable.

Because data is available at runtime, you can do many things with it. Does the data describe a computation? Well, write an interpreter. Interpreters are much easier to write than compilers (macros) because the interpreter runs at runtime in the dynamic environment of the program. Compilers (and macros) separate out the two phases of compile-time and runtime. You have to keep track of the difference in your code. And Lisp is totally well-suited for writing interpreters. The first Lisp was defined in itself, for crying out loud.

And since it's just edn, it can also be manipulated using all of the tools available in the language. Maps can be assoced to. Sequences can be iterated. Transfering it over the wire is easy. Storing it to disk in a way that can be read in is easy. Try pretty-printing a function. But pretty-printing a data structure? Easy.

And, finally, the other thing about data is that it can be interpreted in different ways by different interpreters. I'm not trying to say that you might implement two algorithms for doing the same thing. What I'm getting at is that you can compute from it, or analyze it, or algebraically transform it, etc. It has become a semantic system in its own right.

Discussion

Why not work with code, which is data in Clojure?

You can argue that since code is data, why should I use data structures likes maps and vectors instead of actual code, represented as lists? This is actually a very valid point, and I think this is one of the better arguments. Lisp was defined in terms of an interpreter for data structures which together provide a powerful programming model. It would be foolish to discard this power and define our own for no reason.

The best reason I can think of is that our data is usually a very restricted form of code, many times not Turing complete. Turing complete code is proven to be impossible to analyze. But our restricted data model is powerful in exactly the way we need it (for our specific problem), but not generally powerful (as in Turing complete). So we can design it to be analyzable.

If we can restrict the power to be less-than-Turing-complete, we can analyze it at runtime. If analyzing it at runtime is desirable, then it is desirable to represent it as data.

The semantics of data is vague, while code is well-defined. Why should we use data instead of code?

Ok, this is also a good point. Clojure code, in theory, is well defined. At the very least, it is defined as whatever the compiler does. Most of the time, it is well-documented and well-understood. But your ad-hoc data structure, which represents some computation, has all sorts of assumptions baked in, like what keys are valid when, that are undocumented, have poor error messages, and maybe corner cases.

Wow, such a good point. When you're designing a DSL, this is always a challenge. But, just like restricting your power to below Turing complete can make your analysis way easier, keeping your semantic model simple and well-defined is the key to making it worthwhile. If the semantic model is simple, it could be beneficial to have it available at runtime. For instance, you could create a custom editor for it. If it's just functions, that's out.

Hasn't this discussion happened many times before? I mean, Ant started off ok, but then it was its own programming language and it sucked.

This is very true. People have talked before about the difference between internal and external DSLs, and how external DSLs eventually lose because they need all sorts of conditionals and loops, which the internal DSLs had by default. In my experience, this is true.

My personal guideline is that I only prefer data after I have bound the problem to a well-understood domain. That means that I have to write the program first in code, using functions, before I realize that, yes, this could be described very succinctly and declaratively as data. This took a long time and lots of mistakes to understand. I essentially will only refactor to a data-driven approach after I already have it written and working.

Conclusions

I've been coding in Lisp for a long time, so I've internalized this idea of data-driven programming. It's the main idea of Paradigms of Artificial Intelligence Programming, one of the best Lisp books out there, and a big influence on me. What the guideline of "Prefer data over functions" means to me is that when it's beneficial, one should choose data, even if functions are easy to write. Data is more flexible and more available at runtime. It's one of those all-things-being-equal situations. But all things are rarely equal. Data is often more verbose and error-prone than straight code.

But there is a sweet spot where data is vastly superior. In those cases, it makes your code more readable. It is more amenable to analysis. It can be passed over the wire. When I find one of those cases, you bet I'll prefer data over functions.

I think that doing data-driven programming is one of the things Clojure excels at, even more than other Lisps, because of its literal data structure syntax. Data-driven programming is one of the deep experiences that I wish everyone could have. And it's the primary goal of LispCast Introduction to Clojure, my 1.5 hour video course filled with visuals, animations, exercises, and screencasts. Check out the preview.

Learn Functional Programming using Clojure with screencasts, visual aids, and interactive exercises

Learn more

You might also like

Pre-Conj Interview: Paul deGrandis

October 25, 2014

Introduction

Paul deGrandis gave a nice interview about his talk at Clojure/conj about data-driven systems. Read the background to his talk.

Interview

LispCast: How did you get into Clojure?

Paul deGrandis: I was a very early Clojure adopter. I came from Python (and before that a mix of C/C++/Java/Python), but I had always used Common Lisp to prototype ideas. Once I saw Clojure publicly announced, I gave it a shot and instantly identified with the Clojure Trinity: Simplicity, Power, and Focus. I began using it in all of my proof-of-concept work, and as I became confident in the language, integration with the platform, and my own skill set, I also started putting it into production.

LC: Can you define Data-Driven systems briefly?

PdG: Data-driven systems capture their interactions and operations as values - they are one possible conclusion of programming with values. We see this done in Datomic - queries are expressed as data. This concept is easily mapped to other systems - services, client-side applications, and more. Note that is more than just, "data all the things," this is, "data all the thing's things." Another example: Pedestal routes are represented as data ("data all the things"), but imagine if we captured the expression of an entire service as data. Imagine the properties that naturally fall out from that design decision. My talk walks through a real project that pushed this design decision as far as possible, and the steps that other developers can follow to do the same for their own systems.

LC: At Cognitect, they talk about Data > Functions > Macros. Is that what your talk is about?

PdG: My talk, at a very high level is about "Data > Functions > Macros," but it's mostly about modeling entire systems (not just interfaces) as data, why you'd want to, and why it's hard. It definitely challenges the previous notions (established by existing tooling) of what the conclusions of "programming with values" actually are (or could/should be). Tangentially, I'll talk about DSLs, but only in regards to constraining your domain as a means to increase expressiveness.

LC: I'm a big fan of working with data. But I've also seen its dark side. I've seen it pushed too far, or in the wrong area, to where it would have been better to use code. Have you noticed this? How do you avoid that?

PdG: I have indeed noticed it and it's a topic within my talk. Two crucial pieces of any great design (in software or otherwise) are the external constraints imposed by the problem, and the internal constraints imposed by the designer. Both are important, but it seems common that people overlook self-imposing constraints - and one needs them to achieve a really great design. I don't necessarily agree that capturing a system in data can be pushed too far, but I think it can be done without constraint or design intent - which produces a useless system. The crux is knowing that limitation and working within it. Outside of those limitations and constraints, one should fall back to foundation - regular Clojure code, etc.

LC: What is one thing you want someone to be able to do after they watch your talk?

PdG: I want someone to come away with a set of techniques to envision, constrain, and design data-driven systems. I want to create a spark that encourages Clojure developers to look beyond the initial merits of programming with values, and see further-reaching impact within their own systems - how to turn our ecosystem into a set of super powers.

I hope to provide a compelling example of how this approach plays out in production systems, and the general metrics and project impact it has.

LC: What resources would you recommend to a beginner so they could maximize their understanding of your talk?

PdG: My talk touches upon a lot of themes covered in some inspiring and informative talks: Tim Ewald's keynote from last year's Conj (Programming with Hand Tools), Rich Hickey's GOTO conference Keynote (The value of values), and previous Conj talks. I'm still searching for literature and books that I like on the topic, but I haven't hit any that I'd strongly recommend.

LC: Where can people follow your adventures online?

PdG:

My Twitter: @OhPauleez
My Github: https://github.com/ohpauleez
I sometimes blog: http://www.pauldee.org/blog/

LC: If Clojure rode an animal, what animal would it ride?

PdG: A very serious pony.

LC: Thanks for a great interview. It was informative.

PdG: Thanks a ton! I really appreciate all of this!

This post is one of a series called Pre-conj Prep, which originally was published by email. It's all about getting ready for the upcoming Clojure/conj, organized by Cognitect. Conferences are ongoing conversations and explorations. Speakers discuss trends, best practices, and the future by drawing on the rich context built up in past conferences and other media.

That rich context is what Pre-conj Prep is about. I want to enhance everyone's experience at the conj by surfacing that context. With just a little homework, we can be better prepared to understand and enjoy the talks and the hallway conversations, as well as the beautiful venue and city of Washington, DC.

Clojure/conj is a conference organized and hosted by Cognitect. This information is in no way official. It is not sponsored by nor affiliated with Clojure/conj or Cognitect. It is simply me curating and organizing public information about the conference.

You might also like

Pre-conj Prep: Paul deGrandis

October 12, 2014

Talk: Unlocking Data-Driven Systems

Paul deGrandis' talk at the conj is about the design tradeoffs when building your system in a Data-Driven way.

Background

The recommended priority for choosing an implementation strategy is Data > Function > Macro. Macros are made for humans, not machines, so they are the least composable. Functions are composable, but opaque. Their only meaning is their side-effects and their return value. Data, however, can be interpreted in different ways, depending on the context. If one can design it right, a data-driven system can be both easy for a person to read and write and reusable for many problems.

This talk is part of an ongoing discussion about data-driven systems in Clojure. Data-driven design has been around far longer than Clojure has. Paradigms of Artificial Intelligence Programming is the book on the subject, showing its use in Common Lisp. Christophe Grand gave a talk at the Clojure/conj in 2010 that explained in depth why macros should not be the first choice when building DSLs.

Many systems use data as their primary interface. Datalog queries in Datomic are just Clojure data. Prismatic Schemas are data. Leiningen project files are data (with a small macro for human convenience). Data-driven system is related to code-as-data, that all Lisps share. If code is data, why can't data be code? Clojure makes data-driven solutions easy with its variety of literal data structures.

Why it matters

The conversation about how best to design data-driven systems is not over. Active Clojure developers are still experimenting and testing the approach. This talk will show data-driven systems taken to an extreme, where an entire ClojureScript application is represented as data. Many people already agree with the approach and are eagerly awaiting the deeper analysis this talk promises.

About Paul deGrandis

Github - Twitter

This post is one of a series called Pre-conj Prep, which originally was published by email. It's all about getting ready for the upcoming Clojure/conj, organized by Cognitect. Conferences are ongoing conversations and explorations. Speakers discuss trends, best practices, and the future by drawing on the rich context built up in past conferences and other media.

That rich context is what Pre-conj Prep is about. I want to enhance everyone's experience at the conj by surfacing that context. With just a little homework, we can be better prepared to understand and enjoy the talks and the hallway conversations, as well as the beautiful venue and city of Washington, DC.

Clojure/conj is a conference organized and hosted by Cognitect. This information is in no way official. It is not sponsored by nor affiliated with Clojure/conj or Cognitect. It is simply me curating and organizing public information about the conference.

Data > Functions > Macros. But why?

Macros

Functions

Data

Discussion

Conclusions

You might also like

Pre-Conj Interview: Paul deGrandis

Introduction

Interview

Pre-conj Prep Email Signup

You might also like

Pre-conj Prep: Paul deGrandis

Talk: Unlocking Data-Driven Systems

Background

Why it matters

About Paul deGrandis

Pre-conj Prep Email Signup

You might also like