3 Things Java Programmers Can Steal from Clojure

March 09, 2013

The other day I wrote about some principles that programming in Clojure makes very clear. Those principles could be applied just as well in Java, and often are. However, there are some things that make Clojure distinct.

Three of those distinctions are the way it deals with state change (using an STM), the persistent data structures, and the literal syntax for data with a reader (now called edn). Diving into the source code for Clojure, I realized that all three were written in Java, which means they can be used from Java. It is certainly not as easy as using them from Clojure, but all three are powerful enough to be worth the effort if you are writing Java. You simply need to add one more JAR to your project (or add a Maven dependency). I have constructed a few minimal examples of their use.

1. Persistent Data Structures

Clojure comes with several powerful and fast collection classes. The interesting thing about them is that they are immutable. If you want to add an object to a list, you actually create a new list containing the old elements and the new element. Instead of using copy-on-write, it reuses most of the internal structure of the original list, so only a small number of objects need to be allocated. It turns out that this can be done very quickly, comparable to using an ArrayList.

The following example illustrates three of the more useful data structures: Vector, HashMap, and HashSet.

package persistent;

import clojure.lang.IPersistentMap;
import clojure.lang.IPersistentSet;
import clojure.lang.IPersistentVector;
import clojure.lang.PersistentHashMap;
import clojure.lang.PersistentHashSet;
import clojure.lang.PersistentVector;

public class PersistentTest {
  public static void main(String[] args) {
    IPersistentMap m = PersistentHashMap.create("abc", "xyz");
    m = m.assoc(1, 4); // add a new key/value pair
    m = m.assoc("key", "value");
    m = m.without("abc"); // remove key "abc" 
    System.out.println(m);
        
    IPersistentVector v = PersistentVector.create(1, 2, 3);
    v = v.assocN(0, "a string"); // change index 0
    v = v.cons("should be last"); // add a string at the end
    System.out.println(v);
            
    IPersistentSet s = PersistentHashSet.create("a", "b", "c");
    s = (IPersistentSet) s.cons("d"); // add d to the set
    s = s.disjoin("a"); // remove an element (a set is not an IPersistentMap, so that cast would fail)
    System.out.println(s.contains("g")); // prints false
    System.out.println(s);
  }
}

Now, it ain't pretty. But it's actually no worse than quite a few native Java libraries I've seen. There may be a better way to do this, but this one works.
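For comparison, here is a rough sketch of the same operations written in Clojure itself, where the literal syntax hides the class names. The names m2, v2, and s2 are mine, chosen only to make it visible that the original values are untouched after each "modification".

```clojure
;; The original values are never changed; each operation returns a new value.
(def m {"abc" "xyz"})
(def m2 (-> m
            (assoc 1 4)             ; add a new key/value pair
            (assoc "key" "value")
            (dissoc "abc")))        ; remove key "abc"

(def v [1 2 3])
(def v2 (-> v
            (assoc 0 "a string")    ; change index 0
            (conj "should be last"))) ; vectors grow at the end

(def s #{"a" "b" "c"})
(def s2 (-> s
            (conj "d")              ; add d to the set
            (disj "a")))            ; remove an element

(println m m2)
(println v v2)
(println s s2 (contains? s2 "g"))
```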

2. Software Transactional Memory

Clojure uses multiversion concurrency control to provide a safe way to manage concurrent access to state shared between threads. The shared references it manages are called refs. I won't go very deep into how it works. Suffice it to say that Clojure refs give you non-blocking reads and transactional updates without having to do locking yourself. There are two caveats. First, the value you give to the ref has to be immutable. Second, you should not perform IO (or any other mutation) inside the transaction, because a transaction can be retried.

package stm;

import java.util.concurrent.Callable;

import clojure.lang.LockingTransaction;
import clojure.lang.Ref;

public class STMTest {
  public static void main(String[] args) {
    // final needed to be used in anonymous class
    final Ref r = new Ref(1);
    final Ref s = new Ref(5);

    try {
      // run this in a transaction
      // don't do IO inside
      LockingTransaction.runInTransaction(
        new Callable<Object>() {
          public Object call(){
            s.set((Integer)r.deref() + 10);
            r.set(2);
            return null;
          }
        }
      );
    } catch (Exception e) {
      e.printStackTrace();
    }
        
    System.out.println(r.deref());
    System.out.println(s.deref());
  }
}
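
For reference, this is a sketch of the same transaction in Clojure, where dosync plays the role of LockingTransaction.runInTransaction and ref-set corresponds to Ref.set. The ref names mirror the Java example.

```clojure
(def r (ref 1))
(def s (ref 5))

;; Both updates happen in one transaction: other threads see either
;; the old values of r and s or the new ones, never a mix.
(dosync
  (ref-set s (+ @r 10))
  (ref-set r 2))

(println @r) ; prints 2
(println @s) ; prints 11
```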

3. Extensible Data Notation

With Clojure 1.5, edn has become a standard part of the language. Edn is like an extensible JSON where the keys of objects can be any value (not just strings). It is based on the Clojure literal syntax, much in the same way that JSON is based on JavaScript literal syntax. It is a nice way to serialize data. And since you already have the JAR in your project, it's a no-brainer to use it.

package edn;

import java.io.PushbackReader;
import java.io.StringReader;

import clojure.lang.EdnReader;
import clojure.lang.PersistentHashMap;

public class EDNTest {
  public static void main(String[] args) {
    // reading from a string
    System.out.println(
      EdnReader.readString("{\"x\" 1 \"y\" 2}", PersistentHashMap.EMPTY));

    // reading from a Reader
    // really, you can use any Reader wrapped in a PushbackReader
    System.out.println(
      EdnReader.read(new PushbackReader(new StringReader("#{10 2 3}")),
                     PersistentHashMap.EMPTY));
  }
}
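
As a point of comparison, the same reads from Clojure go through the clojure.edn namespace. This is only a small sketch; the third string is my own example, added to show that map keys do not have to be strings.

```clojure
(require '[clojure.edn :as edn])

;; Same inputs as the Java example above.
(def config  (edn/read-string "{\"x\" 1 \"y\" 2}"))
(def numbers (edn/read-string "#{10 2 3}"))

;; Keys can be any value, here a vector and a keyword.
(def by-vector (edn/read-string "{[1 2] :pair, :name \"edn\"}"))

(println config numbers (get by-vector [1 2]))

;; Printing with pr-str and reading back gives an equal value.
(println (= config (edn/read-string (pr-str config))))
```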
Learn Functional Programming using Clojure with screencasts, visual aids, and interactive exercises
Learn more


Atom code explanation

August 28, 2014

Summary: I go over a real-world example of how atoms and immutable values allow you to compose constructs in ways that are easy to reason about and less prone to error.

The other day I was in IRC #clojure and someone asked a good question. They had code like the following, and they couldn't understand why they couldn't modify a map.

(def state (atom {}))

(doseq [x [1 2 3]]
  (assoc @state :x x))

(println @state)

What does this print? Well, the asker wanted it to print {:x 3}. But it printed {}. To understand what's happening, let's go step by step.

{} creates an empty map. It's literal syntax for a constructor for a map. This one happens to be empty.

(atom {}) takes the empty map that was just created and passes it to the function atom, which constructs a new clojure.lang.Atom. An atom is an object, and its current state is the empty map we just passed in.

(def state (atom {})) defines a new var called state in the current namespace.

At this point, we've got a variable called state whose value is an atom that holds an empty map.

(doseq [x [1 2 3]] loops over the numbers 1, 2, and 3. x will be bound to each of those numbers, in turn.

@state gets transformed into (deref state), which returns the current value of state. :x is a literal keyword, and x is a reference to the x bound inside the loop.

(assoc @state :x x) creates a new map by taking the current value of state (which happens to be {}) and associating :x with x (which will be 1, 2, and 3 as the loop happens). The value is returned by assoc, and then thrown away, since it isn't bound to anything.

Then (println @state) will print the current value of state, which still is {}.

This code shows a common problem that beginners face in Clojure: how do immutable data structures (like maps) and the concurrency primitives (like atom) work together to manage state?

The answer is quite simple (in the Rich Hickeyan sense) and elegant. By separating the ideas of value and state, Clojure has made it easy to express precisely the behavior you want in concurrent systems.

The value is the map. It is immutable. It cannot change. It is a single value, and it will always be the same. That means threads can share the value with no worries that one of them will change it.

The state is the atom. It's a mutable object. And being an object, it has methods that define its interface. In the code above, we saw that you can call deref on an atom to get its current value. deref is basically a getter.

The main way to change the value of an atom is using swap!. swap! takes an atom and a function (plus optional arguments) and calls the function on the current value of the atom. It then sets the value of the atom to the return value of the function. So let's use that to fix the code.


(def state (atom {}))

(doseq [x [1 2 3]]
  (swap! state assoc :x x))

(println @state)

swap! takes the atom (state) and a function (assoc) and some arguments (:x x). It calls assoc on the current value of state with those extra arguments and sets the value of the atom to the return value of the function.

The swap! expression is almost (but not quite) the same as this code:


(reset! state (assoc @state :x x)) ;; never do this

reset! changes the state of the atom but without regard to the current value. This new code is bad because it's not thread-safe. Use swap! if you need to use the current value to determine the new value.

So what does an atom do? What does it represent?

Atoms guarantee one very important thing: that each state is calculated from the last state. The swap! operation is atomic. No matter how many threads are trying to change the value, each change is calculated from the previous value and no previous values are lost. That's its contract as an object and it's one of the important ways that Clojure helps with concurrency.
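
One way to picture how that guarantee can be kept is a retry loop around compare-and-set!. The function my-swap! below is a hypothetical re-implementation for illustration only, not something you should use instead of swap!. It also shows why the function you pass should be free of side effects: it may run more than once.

```clojure
(defn my-swap!
  "Illustrative stand-in for swap!: retry until no other thread
  changed the atom between our read and our write."
  [a f & args]
  (loop []
    (let [old-val @a
          new-val (apply f old-val args)]
      ;; compare-and-set! only writes if the atom still holds old-val
      (if (compare-and-set! a old-val new-val)
        new-val
        (recur)))))

(def state (atom {}))

(doseq [x [1 2 3]]
  (my-swap! state assoc :x x))

(println @state)
```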

How can a value be lost?

If we have two threads, each trying to change state in the same incorrect way (using reset!), the order of evaluation will have several steps:

  1. (deref state) ;; call this value *1
  2. (assoc *1 :x x) ;; call this value *2
  3. (reset! state *2)

Because the threads are running concurrently, the operations have a chance of interleaving their steps in unwanted ways. For instance, threads A and B might interleave like this:

  1. A: (deref state) ;; call this value *1A
  2. A: (assoc *1A :x x) ;; call this value *2A
  3. B: (deref state) ;; call this value *1B
  4. B: (assoc *1B :x x) ;; call this value *2B
  5. B: (reset! state *2B)
  6. A: (reset! state *2A)

What happened? On line 6, A set the value of state to the value it calculated on line 2. So B's work is completely discarded. That's probably not what was intended. What's worse is that that is one of many possible interleavings, some of which work and some don't. Welcome to concurrency!

What you probably wanted was to make sure that no work is discarded. You want the operation to be atomic. That's why it's called an atom. swap! is atomic. A swap! to an atom occurs "all at once", instead of on three lines like the reset! example. If two threads are doing swap!, there are two possible interleavings.

  1. A: (swap! state assoc :x x)
  2. B: (swap! state assoc :x x)

And

  1. B: (swap! state assoc :x x)
  2. A: (swap! state assoc :x x)

Either ordering is usually what you want. If only one of these orderings is acceptable, or neither is, an atom is not the right construct for you.
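
To see the difference in practice, here is a small experiment using futures as the competing threads. The names counter and unsafe-counter, and the thread and iteration counts, are arbitrary choices of mine. With swap! every increment is kept; with the deref-then-reset! pattern some increments may be lost, so the second total can come out lower and can differ from run to run.

```clojure
(def counter (atom 0))
(def unsafe-counter (atom 0))

(defn run-workers
  "Runs f 1000 times on each of 10 threads and waits for all of them."
  [f]
  (let [workers (doall (for [_ (range 10)]
                         (future (dotimes [_ 1000] (f)))))]
    (doseq [w workers] @w)))

;; atomic: each increment is computed from the latest value
(run-workers #(swap! counter inc))

;; not atomic: another thread can write between the deref and the reset!
(run-workers #(reset! unsafe-counter (inc @unsafe-counter)))

(println @counter)        ; prints 10000
(println @unsafe-counter) ; at most 10000, possibly less
```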

So there you go. Atomic mutable state with immutable values gives you a nice, composable concurrency semantics. You could do it with locks but it's harder to ensure you're doing it correctly. It's slightly higher-level than locks yet it provides tremendous value. Atoms are easier to reason about and less prone to errors.

If you'd like to learn the basics of Clojure, I recommend my video course called LispCast Introduction to Clojure. I don't go over concurrency, but you will learn lots of functional programming. Go check out the description to see if it's right for you.


Clojure Gazette Looking Forward

January 19, 2015

Summary: I am looking for more sponsors for the Clojure Gazette and I need your help.

I have some big plans for the Clojure Gazette this year.

As far as I can tell, the Clojure Gazette is the only Clojure-focussed newsletter out there.[1] It's usually a collection of links to material I think would interest and educate people interested in Clojure. It's not always directly about Clojure. But Clojurists seem to be interested in a variety of things. Sometimes the issue is based around a strong theme, like Haskell or Object Oriented Programming. And sometimes it is an interview. This year, I'd like to do more themed issues and more interviews.

I think it plays an important part in the community. I have interviewed the Google Summer of Code participants for a couple of years now. The interviews let the community know what they were working on and what progress they had made. And some of the themed issues got the most positive commentary I've ever received. People seem to want to learn and to have existing information organized for them.

As you can imagine, the Clojure Gazette takes a lot of work to put out each week. There are over 3,200 subscribers now, and it's growing every day. In July 2014, I began inviting sponsors to purchase a placement in issues of the Gazette. I've had a few generous sponsors contact me, which got me a few months of sponsorship.

The sponsorships have been a success, in different ways. One job advertisement got hundreds of clicks and valuable exposure to a company that is just beginning to use Clojure. Others brought qualified visitors to developer-related services. I'm very grateful to Clojure/conj, which bought a placement. It is a big vote of confidence when the official Clojure conference asks to put their logo in your newsletter.

Now I am trying to be more active to find the sponsors. The goals for the Gazette are modest. None of them require any more money, just more time and work, and a little more organization. But the total time input is significant. With a small amount of funding, I'll be able to justify increasing the quality of the Gazette. I've always thought it would become a business one day. Putting money on the line always makes things more crisp and professional.

Now is where I ask for a favor. If you can spare a few moments, would you please think about whether you get value from the Gazette and what you can do to send some sponsors my way? It may be contacting the hiring department at your company and suggesting a paid job listing. Or it could be experimenting with selling your company's service to the thousands of smart Clojure programmers subscribed to the newsletter. Or maybe you just want to show your appreciation and get some good karma in the community. I appreciate all help. Get in touch by email, Twitter, or call me at 504-302-3742.

Here is a link to the current version of the Media Kit if you'd like more information.

Thanks

Eric

For more inspiration, history, interviews, and trends of interest to Clojure programmers, get the free Clojure Gazette.

Learn More

Clojure pulls in ideas from many different languages and paradigms, and also from the broader world, including music and philosophy. The Clojure Gazette shares that vision and weaves a rich tapestry of ideas from the daily flow of library releases to the deep historical roots of computer science.



  1. There is (def newsletter) which has not had an issue since October 2013.

Clojure is Imperative

August 22, 2014

Summary: Clojure is an imperative language. Its operations are defined in terms of concrete actions. But those actions are often the same actions available to the programmer at runtime. This makes it easy to bootstrap.

Update: أخلاق الخيميائي pointed out that I was wrong about the size of GHC. Luckily it was not salient to my point so I just removed that part of the article.

Update: After talking with several people, I've decided that my writing was really unclear. I've done some major editing to make it as clear as I can. Thanks to everyone who commented and helped me clarify my thinking and writing.

I was recently on the Cognicast and I mentioned something really important to me, but I did not go that deep into it.

Clojure, and Lisps in general, are imperative languages. Yes, they are good for doing functional programming, but their main paradigm is executing lists of commands in order.

On the podcast I mentioned the first imperative example that came to mind, which was the do form, which executes each expression in the body and returns the value of the last expression. You would only want to execute an expression and throw away its value for its side effects.
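
A tiny sketch of what that looks like, using an atom named log (my own name) so the side effects are observable: the first two expressions are run only for their effects, and only the last one supplies the value of the do form.

```clojure
(def log (atom []))

(def result
  (do
    (swap! log conj :first)   ; return value thrown away, run for its effect
    (swap! log conj :second)  ; same here, and it runs after the first
    (+ 1 2)))                 ; the value of the whole do form

(println result) ; prints 3
(println @log)   ; prints [:first :second]
```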

But why is that important to me? It got me thinking about a deeper but related idea.

Clojure is a relatively transparent layer above the JVM. I say "relatively" because languages do get quite a bit more opaque.[1] But it manages to be powerful through well-chosen abstractions.

I should be a little more specific about what I mean by "transparent" and "opaque". This should be the most controversial part of this post, so I want to get this right. These are not formal definitions. Transparency/opaqueness measures abstractions. Opaque abstractions show less of the underlying machinery. Transparent abstractions show their machinery. This is a spectrum.[2]

Clojure's functions are rather opaque. Defining a function (with fn) in Clojure creates a class and instantiates it with the values from its lexical environment. This happens without having to think about classes. You're not thinking about the machinery. The machinery leaks out sometimes, like when you're looking at stack traces. But in general, an illusion is maintained.
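
You can peek behind the illusion at the REPL. In this sketch, add-n is a name I made up; the generated class name will differ from run to run, so the example only checks what kind of object comes back.

```clojure
(def add-n
  (let [n 10]               ; n is captured from the lexical environment
    (fn [x] (+ x n))))

(println (add-n 5))                          ; prints 15
(println (class add-n))                      ; a generated class, name varies
(println (instance? clojure.lang.IFn add-n)) ; prints true
(println (instance? Runnable add-n))         ; prints true, fns are also Runnable
```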

But Clojure's def form is pretty transparent. You do have to think about what it's doing, about the current namespace, the order of the defs in a namespace, etc. There is not much of an illusion to maintain.

Haskell has a well-defined execution semantics. It's formally defined and you can step through the execution of a Haskell program by hand if you want. In that sense, it's imperative. But the execution order is obscured by the somewhat opaque abstraction of lazy evaluation. Clojure's execution order is more or less directly the execution order of the JVM it runs on--hence more transparent.

The reason this is important is that Clojure's strategy is to be transparent unless there is significant gain. This is part of what is meant by "embracing the host". Haskell's strategy is orthogonal to the transparency/opaqueness axis. Haskell aims to be formally well-defined. Formal semantics allows deep static analysis and program transformation.

Besides the strategy of being transparent, what I like even more about Clojure is that the many abstractions are defined in the same abstractions that you have available as a programmer.

This is from the docstring of def:

Creates and interns a global var with the name of symbol in the current namespace (*ns*) or locates such a var if it already exists. If init is supplied, it is evaluated, and the root binding of the var is set to the resulting value. If init is not supplied, the root binding of the var is unaffected.

Creating a var? I can do that. Interning it? I can do that, too. Setting the root binding? Easy! The core can be kept minimal because abstractions can build on each other. If you get the abstractions right, the amount of code you have to write in your implementation language is small.
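
Here is a sketch of doing those steps by hand with functions from clojure.core. The symbol answer and the name answer-var are arbitrary; this is meant to mirror the docstring, not to reproduce exactly what the compiler does for def.

```clojure
;; Create (or locate) a var named answer in the current namespace
;; and set its root binding, roughly like (def answer 42).
(def answer-var (intern *ns* 'answer 42))

(println (var? answer-var)) ; prints true
(println @answer-var)       ; prints 42

;; Setting the root binding again.
(alter-var-root answer-var (constantly 43))
(println @answer-var)       ; prints 43

;; Interning without a value locates the existing var
;; and leaves its root binding alone.
(println (identical? answer-var (intern *ns* 'answer))) ; prints true
(println @answer-var)       ; still prints 43
```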

And this gets to the heart of it: you can write a Lisp yourself. Many people have. You can write an easy Lisp compiler in a weekend and build features on top of it, almost never having to change the original compiler.

This is the magic of bootstrapped languages like Lisps. They have a small core that you need to get right, then everything else can be written in that core. It's the ultimate minimal virtual machine.

What's the relationship between bootstrapping and transparency? The more opaque the abstractions, the more the language must do to maintain the illusion. Lisps are easy to bootstrap because the abstractions chosen are either transparent and trivial to implement (like def or if) or opaque and powerful (like fn).

I like Lisps (and Clojure) because I feel that I can understand them and build them myself. I don't actually understand everything, but I could if I tried. Somewhere along the way I developed a deep interest in bootstrapping. Bootstrapping is compounded leverage. You build small abstractions on top of the previous ones, and use those to build yet grander ones.

If you like this attitude toward programming languages, you should learn a Lisp. I suggest Clojure, and I recommend the LispCast Introduction to Clojure video series. You'll learn about building up powerful abstractions, one layer at a time, in a small amount of code.



  1. There are more transparent languages as well, but they tend to be obscure.

  2. As an aside to those who read previous versions of this post, what I meant by imperative/declarative was transparent/opaque. I botched it and I'm trying to get this idea right.

Regexes in Clojure

June 03, 2014

Summary: With a few functions from the standard library, Clojure lets you do most of what you want with regular expressions with no muss.

Clojure is designed to be hosted. Instead of defining a standard Regular Expression semantics that works on all platforms, Clojure defers to the host's semantics. On the JVM, you're using Java regexes. In ClojureScript, it's JavaScript regexes. That's the first thing to know.

Other than the semantics of the regexes themselves, the API is standardized across all platforms in the core library. And the syntax is convenient because you don't need to double escape your special characters.

Literal representation

Regexes can be constructed in Clojure using a literal syntax. Strings with a hash in front are interpreted as regexes.

#"regex"

On the JVM, the above line will create an instance of java.util.regex.Pattern. In ClojureScript, it will create a RegExp. Remember, the two regular expression languages are similar but different.
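
To make the escaping point concrete, this JVM-only sketch builds the same pattern twice: once with the literal syntax and once from an ordinary string using re-pattern from clojure.core, which needs every backslash doubled. The var names are mine. Pattern objects do not compare equal to each other, so the comparison is done on their string form.

```clojure
(def literal     #"\d+\.\d+")                  ; backslashes written once
(def from-string (re-pattern "\\d+\\.\\d+"))   ; backslashes doubled in a string

(println (class literal))                       ; java.util.regex.Pattern on the JVM
(println (= (str literal) (str from-string)))   ; prints true
(println (re-matches literal "3.14"))           ; prints 3.14
```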

Matching (with groups)

There is a nice function that matches the whole string. It is called re-matches. The return is a little complex. If the whole string does not match, it returns nil, which is nice because nil is falsey.

=> (re-matches #"abc" "zzzabcxxx")
   nil

If the string does match, and there are no groups (parens) in the regex, then it returns the matched string.

=> (re-matches #"abc" "abc")
   "abc"

If it matches but there are groups, then it returns a vector. The first element in the vector is the entire match. The remaining elements are the group matches.

=> (re-matches #"abc(.*)" "abcxyz")
   ["abcxyz" "xyz"]

The three different return types can get tricky, but in general I do have groups, so it's either a vector or nil, which is easy to handle. You can even destructure it before you test it.

(let [[_ first-name last-name] (re-matches #"(\w+)\s(\w+)" full-name)]
  (if first-name ;; successful match
    (println first-name last-name)
    (println "Unparsable name")))

Matching substrings

re-matches matches the whole string. But often, we want to find a match within a string. re-find returns the first match within the string. The return values are similar to re-matches.

No match returns nil

=> (re-find #"sss" "Loch Ness")
nil

Match without groups returns matched string

=> (re-find #"s+" "dress")
"ss"

Match with groups returns a vector

=> (re-find #"s+(.*)(s+)" "success")
   ["success" "ucces" "s"]

Finding all substrings that match

The last function from clojure.core I use a lot is re-seq, which returns a lazy seq of all of the matches, not just the first. The elements of the seq are whatever type re-find would have returned.

=> (re-seq #"s+" "mississippi")
   ("ss" "ss")
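
When the regex has groups, each element of the seq is a vector, which combines nicely with destructuring. The input string and the names pairs and settings below are made up for this sketch.

```clojure
(def pairs (re-seq #"(\w+)=(\d+)" "a=1 b=22 c=333"))

(println pairs)
;; prints ([a=1 a 1] [b=22 b 22] [c=333 c 333])

;; Destructure each match vector: skip the whole match, keep the groups.
(def settings
  (into {} (for [[_ k v] pairs]
             [(keyword k) (Long/parseLong v)])))

(println settings) ; prints {:a 1, :b 22, :c 333}
```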

Replacing regex matches within a string

Well, matching strings is cool, but often you'd like to replace a substring that matches with some other string. clojure.string/replace will replace all substring matches with a new string. Let's take a look:

=> (clojure.string/replace "mississippi" #"i.." "obb")
   "mobbobbobbi"

This function is actually quite versatile. You can refer directly to the groups in the replacement string:

=> (clojure.string/replace "mississippi" #"(i)" "$1$1")
   "miissiissiippii"

You can also replace with the value of a function applied to the match:

=> (clojure.string/replace "mississippi" #"(.)i(.)"
     (fn [[_ b a]]
       (str (clojure.string/upper-case b)
            "--"
            (clojure.string/upper-case a))))
   "M--SS--SS--Ppi"

You can replace just the first occurrence with clojure.string/replace-first.
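
A quick sketch of replace-first on the same input, once with a replacement string and once with a function. With no groups in the regex, the function receives the matched string itself.

```clojure
(require '[clojure.string :as str])

;; Only the first "i.." is replaced.
(println (str/replace-first "mississippi" #"i.." "obb"))
;; prints mobbissippi

;; Only the first run of s characters is upper-cased.
(println (str/replace-first "mississippi" #"s+" str/upper-case))
;; prints miSSissippi
```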

Splitting a string on a regex

Let's say you want to split a string on some character pattern, like one or more whitespace. You can use clojure.string/split:

=> (clojure.string/split "This is a string    that I am splitting." #"\s+")
   ["This" "is" "a" "string" "that" "I" "am" "splitting."]

Nice!

Other functions

Those are all of the functions I use routinely. There are some more, which are useful when you need them.

re-pattern

Construct a regex from a String.

re-matcher

This one is not available in ClojureScript. On the JVM, it creates a java.util.regex.Matcher, which is used for iterating over subsequent matches. This is not so useful since re-seq exists.

If you find yourself with a Matcher, you can call re-find on it to get the next match (instead of the first). You can also call re-groups on it to get the groups from the most recent match. Unless you need a Matcher for some Java API, just stick to re-seq.
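
In case you do end up holding a Matcher, this JVM-only sketch shows the stateful behaviour: each call to re-find advances it, and it returns nil once the matches run out. The input string and var names are my own.

```clojure
(def matcher (re-matcher #"(\w)(\d+)" "a1 b22 c333"))

;; The matcher is mutable: every re-find moves to the next match.
(def first-match  (re-find matcher))
(def first-groups (re-groups matcher))   ; groups of the most recent match
(def remaining    [(re-find matcher) (re-find matcher) (re-find matcher)])

(println first-match)  ; prints [a1 a 1]
(println first-groups) ; prints [a1 a 1]
(println remaining)    ; prints [[b22 b 22] [c333 c 333] nil]
```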

Conclusion

Well, that's regexes as I use them. They're super useful and easy to use in Clojure once you get the hang of them.

If you're interested in learning the fundamentals of Clojure, may I suggest my own LispCast Introduction to Clojure video series. It guides you through a deep experience of the language. You'll learn REPL skills, how to set up a project, and how to develop a DSL, all in a fun, interactive way.


Clojure Test Directory

March 30, 2015

Summary: Where to put your tests is a common question. You could put them anywhere, but you want to pick a place that makes them easy to find, makes them easy to exclude from production, and works well with your tools. My recommendation is to follow what most projects do, which takes care of all of these requirements.

You want to write some tests in Clojure. Maybe they're unit tests. Maybe they're integration tests. The first question you must answer is where do you put your tests?

And you don't want them just anywhere. You actually have some important requirements for where they live: they should be easy to find, easy to exclude from production, and work well with your tools.

Here's my recommendation, which is the de facto standard of organizing your tests. It works with Leiningen, CIDER, and vim-fireplace.

First, you make a new namespace structure in a test/ directory. It should mirror the src/ directory.

If you have:

src/
  lispcast/
    core.clj
    init.clj
    util.clj

Then your test directory should look like:

test/
  lispcast/
    core_test.clj
    init_test.clj
    util_test.clj

Notice that the structure is the same but the names are slightly different, in a systematic way. It's really easy: you just add -test to the namespace name, which becomes _test in the file name.[1] Then, you put all the tests that test lispcast.core into lispcast.core-test. Now they're easy to find!
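
A minimal sketch of what test/lispcast/core_test.clj might contain. The function greeting is hypothetical and is defined inline only so the example runs on its own; in a real project it would live in lispcast.core and the test namespace would require it.

```clojure
(ns lispcast.core-test
  (:require [clojure.test :refer [deftest is]]))

;; Hypothetical stand-in for a function from lispcast.core.
;; In a real project: (:require [lispcast.core :refer [greeting]])
(defn greeting [name]
  (str "Hello, " name "!"))

(deftest greeting-test
  (is (= "Hello, Clojure!" (greeting "Clojure"))))
```

With that file in place, lein test should pick it up along with everything else under test/.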

If you need to write a test that crosses two different namespaces (like an integration test might), then you can just make a new test namespace that doesn't correspond to one or the other.

Leiningen will load the test/ directory selectively, depending on whether you're deploying to production (it won't load test/) or running the tests (it will load test/).

So, it's that easy. You are free, of course, to put the tests wherever you like. But this is my recommendation!

If you're getting into testing in Clojure, you should check out LispCast Intro to clojure.test. It's an interactive course. It has animations, screencasts, exercises, code samples, and text.



  1. Clojure file names have to replace - (hyphens) with _ (underscores) to be compatible with Java.

Clojure Web Security

April 05, 2014

Summary: Use the OWASP Top Ten Project to minimize security vulnerabilities in your Clojure web application.

Aaron Bedra gave a very damning talk about the security of Clojure web applications. He went so far as to say that Clojure web apps are some of the worst he has seen. You should watch the talk. He has some good recommendations.

One of the jobs of web frameworks is to handle security concerns inherent in the web itself. Because most Clojure programmers build their own web stack, they often fail to look at the security implications of their application. They do not protect their site from even the easiest and most common forms of vulnerabilities. These vulnerabilities are problems with the way the web works, not with the particular server technology, yet it has become the server's responsibility to mitigate the vulnerabilities. Luckily, the vulnerabilities are well-studied and there are known fixes.

The Open Web Application Security Project (OWASP) does a very good job of documenting common web vulnerabilities and providing good fixes for them. They have a project called the Top Ten Project which every web developer should refer to regularly and use to improve the security of their app. You should also run through the Application Security Verification Standard checklists to audit your code. But the Top Ten should get you to understand the basics.

Warning: I am not a security expert. You should do your own research. The code I present here is my own interpretation of the OWASP recommendations. It has not been audited by experts. Do your own research!

Also, security is an ongoing concern. If you have any comments, suggestions, or questions, please bring them up!

Here is the Top Ten 2013 with a small breakdown and a Clojure solution, if applicable.

A1. Injection

If a server accepts input from the outside and then parses and interprets that input as a scripting or query language, it is open to attack. The most common form is SQL Injection, where an input form is posted to the server, the value of that form is concatenated into a string to make a SQL statement, and then the SQL statement is sent to the database to be executed. What happens if a malicious user types in "'; DELETE FROM USERS;"?

My preferred solution to SQL Injection in Clojure is to always use parameterized SQL statements. clojure.java.jdbc supports these directly. The parameters are passed as data rather than concatenated into the SQL text, which prevents this kind of injection.
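
The difference is easy to see without even touching a database. In this sketch the table and column names are invented, and the final commented line assumes clojure.java.jdbc is on the classpath (aliased as jdbc) along with a db-spec of your own.

```clojure
(def user-input "'; DELETE FROM USERS; --")

;; Unsafe: the input becomes part of the SQL text itself.
(def unsafe-sql
  (str "SELECT * FROM users WHERE name = '" user-input "'"))

;; Safe: the SQL text is fixed, and the input travels next to it as a parameter.
(def safe-sqlvec
  ["SELECT * FROM users WHERE name = ?" user-input])

(println unsafe-sql)
(println (first safe-sqlvec))

;; Assumed usage, not run here:
;; (jdbc/query db-spec safe-sqlvec)
```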

Another problem is if you want to read in some Clojure data from the client, and you call clojure.core/read-string on it. read-string will execute arbitrary Java constructors. For instance:

#java.io.FileWriter["myfile.txt"]

This will create the file myfile.txt or overwrite if it already exists. Also, there is a form (called read-eval form) to execute code at read-time:

#=(println "Hello, vulnerability!")

Read in that string, and it will print. Any code could be in there.

The solution is to never use clojure.core/read-string on untrusted input. Use clojure.edn/read-string instead, which reads edn, a well-documented format. It does not run arbitrary constructors. It has no read-eval forms.
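
A small sketch of the edn reader turning both attack strings away. The helper safe-read and the ::rejected marker are my own; how you report a rejected input is up to you.

```clojure
(require '[clojure.edn :as edn])

(defn safe-read
  "Reads s as edn, returning ::rejected if the reader refuses it."
  [s]
  (try
    (edn/read-string s)
    (catch Exception _
      ::rejected)))

;; Plain data reads fine.
(println (safe-read "{:user \"eric\" :roles #{:admin}}"))

;; The read-eval form and the constructor form both throw in the edn reader,
;; so nothing is printed by the first and no file is created by the second.
(println (safe-read "#=(println \"Hello, vulnerability!\")"))
(println (safe-read "#java.io.FileWriter[\"myfile.txt\"]"))
```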

Summary: Always use parameterized SQL and use clojure.edn/read-string instead of clojure.core/read-string on edn input.

A2. Broken Authentication and Session Management

Authentication

This is a big topic and I can't address it all here. Clojure has the Friend library, which is the closest thing we have to a de facto standard. My suggestion is simply to read the entire Friend README and evaluate whether you should use it. This is serious stuff. Read it.

Session Management

Ring provides a session system which is fairly good. It meets many of the OWASP Application Security Verification Standard V3 requirements. But it does not handle all of them automatically. You still need code audits. For instance, if you are logging requests, OWASP recommends against logging the session key. You must ensure that the session key is added after the request is logged.

The ASVS also recommends expiring your sessions after inactivity and also after a fixed period, regardless of activity. Ring sessions do not do this automatically (the builtin mechanism has no notion of expiration) and the default implementations of session stores will store and accept sessions indefinitely. A simple middleware will do the trick of expiring them in both cases:

(defn wrap-expire-sessions [hdlr & [{:keys [inactive-timeout
                                            hard-timeout]
                                     ;; :or takes symbols, not keywords
                                     :or {inactive-timeout (* 1000 60 15)
                                          hard-timeout (* 1000 60 60 2)}}]]
  (fn [req]
    (let [now (System/currentTimeMillis)
          session (:session req)
          session-key (:session/key req)]
      (if session-key ;; there is a session
        (let [{:keys [last-activity session-created]} session]
          (if (and last-activity
                   (< (- now last-activity) inactive-timeout)
                   session-created
                   (< (- now session-created) hard-timeout))
            (let [resp (hdlr req)]
              (if (:session resp)
                (-> resp
                    (assoc-in [:session :last-activity] now)
                    (assoc-in [:session :session-created] session-created))
                resp))
            ;; expired session
            ;; block request and delete session
            {:body "Your session has expired."
             :status 401
             :headers {}
             :session nil}))
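        ;; no session yet: call the handler and, if it starts a
        ;; session, stamp it so the expiry checks can pass later
        (let [resp (hdlr req)]
          (if (:session resp)
            (-> resp
                (assoc-in [:session :last-activity] now)
                (assoc-in [:session :session-created] now))
            resp))))))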

Set the HttpOnly attribute on the session cookie. This is very important for keeping XSS attacks from stealing session ids.

Do not set the Domain attribute, and do set the Path if you want something more restrictive than / (the Ring session default).

Do not set the Expires and Max-Age attributes. Setting them makes the browser store the session id on disk, which simply expands the number of ways an attacker can get ahold of it.

Change the session cookie name to something utterly generic, like "id". You don't want to leak more information than necessary about how your sessions work.

Use HTTPS if you can and set the Secure attribute of the cookie.

Do not use in-cookie sessions. In-memory sessions are good, but they can't scale past one machine. carmine has a Redis-based session implementation.

Summary: Here's how I use Ring sessions (with carmine) based on these OWASP recommendations.

(session/wrap-session
   (wrap-expire-sessions
    handler
    {:inactive-timeout 500 ;; timeouts are in milliseconds
     :hard-timeout 3000})
   {:cookie-name "id"
    :store (taoensso.carmine.ring/carmine-store redis-db
             {:expiration-secs (* 60 60 15)
              :key-prefix ""}) ;; leak nothing!
    ;; Ring spells the HttpOnly attribute :http-only
    :cookie-attrs {:secure true :http-only true}})

A3 Cross-Site Scripting (XSS)

Whenever text from one user is shown to another user, there is the potential for injecting code (HTML, JS, or CSS) that is run in the victim's browser. Imagine if Facebook allowed any HTML in the post submission form. A malicious user could add a <script> tag with some keystroke logging code. Anybody who viewed that post in their feed would also get the key logger installed. That would be bad.

XSS is common because of how easy it is to make an app that stores user input (from a form post) in a database, then constructs the page out of stuff from the database. If you're not extremely careful, you could create a place where people can exploit each other.

The solution is to only use scrubbed or escaped values to build HTML pages. Because HTML pages can include different languages (HTML, CSS, JS), text needs to be scrubbed differently in each context. OWASP has a set of rules to follow which will guarantee XSS prevention.

hiccup.util/escape-html (also aliased as hiccup.core/h) will escape all dangerous HTML characters into HTML entities. JS and CSS still need to be handled, and rules for HTML attributes need to be followed.
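To make the idea concrete, here is a hand-rolled stand-in that replaces the characters HTML treats specially. It is a sketch for illustration only and the name escape-html-sketch is mine. In a real app, use hiccup.util/escape-html rather than this function.

```clojure
(require '[clojure.string :as str])

;; Illustration only: maps each dangerous character to an HTML entity.
(defn escape-html-sketch [text]
  (str/escape text {\& "&amp;"
                    \< "&lt;"
                    \> "&gt;"
                    \" "&quot;"
                    \' "&#39;"}))

(escape-html-sketch "<script>alert('hi')</script>")
;;=> "&lt;script&gt;alert(&#39;hi&#39;)&lt;/script&gt;"
```

The browser now shows the text of the tag instead of running it. This only covers text placed in HTML element content. JS and CSS contexts need their own rules.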

If you want to allow some HTML elements, you will need to do a complex scrub. Luckily, Google has a nice Java library that sanitizes HTML. Use it.

Summary: Validate and scrub input from the user and scrub/escape text on output.

A4 Insecure Direct Object References

This one is a biggie: each handler has to do authorization. Does the particular logged in user have access to the resources requested? There's no way to automate this with a middleware. But having some system is better than doing it ad hoc each time. Remember: an attacker can construct any URL, including URLs with a database key in it. Don't assume that just because a request contains a key, the user must have the rights to it.

Summary: Always check the authority of the requesting session before performing an action.
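One way to keep that check from becoming ad hoc is to route every lookup through a single helper. The sketch below is only an assumption about how your data looks: the :owner-id and :user-id keys, owner?, and show-document are all hypothetical, and load-document stands in for a database lookup so the example runs on its own.

```clojure
;; Hypothetical check: does this user own this document?
(defn owner? [user-id document]
  (and (some? user-id)
       (= user-id (:owner-id document))))

;; Hypothetical handler. load-document is passed in as a function.
(defn show-document [load-document req]
  (let [user-id  (get-in req [:session :user-id])
        document (load-document (get-in req [:params :id]))]
    (cond
      (nil? document)           {:status 404 :headers {} :body "Not found."}
      (owner? user-id document) {:status 200 :headers {} :body (:text document)}
      :else                     {:status 403 :headers {} :body "Forbidden."})))
```

The key in the URL only selects the document. The session decides whether the requester gets to see it.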

A5 Security Misconfiguration

This is about keeping your software up to date and making sure the settings of all your software make sense.

A6 Sensitive Data Exposure

Having data is risky. Don't let it leak out.

A7 Missing Function Level Access Control

Use an authorization system (Friend) and audit the roles used for access control.

A8 Cross-Site Request Forgery (CSRF)

Let's imagine you have a bank account at Bank of Merica. You just checked your balance and didn't log out. Then you go to some public forum, where someone has posted a cool file. There's a big download button. You click it, and the next thing you know, you're on your bank page and all of your money has been transferred out of your account.

What happened?

The download button said "Download" but it was really a form submit button. The form had hidden fields "to-account" and "amount". The action of the form was "http://www.bankofmerica.com/transfer-money". By clicking that button, the form was posted to the bank, and because you were just logged in, oops, it transferred all your money away.

The solution is that you only want to accept form posts that come directly from your site, which you control. You don't want some random person to convince people to click on other sites to be able to transfer people's money like that.

There are several possible solutions. One approach is to add a secret to the session and also insert that secret into every form. That is the approach taken by the ring-anti-forgery library.

The solution that I like is to do a double-submit. This means you submit a secret token in the cookie (sent with each web request) and in a hidden field in the form. The server confirms that the cookie and the hidden field match. But the hidden field in the form is added by a small Javascript script which reads it from the cookie. Browsers don't allow Javascript to read cookies from other sites, so you guarantee that the form was posted from your site.

There are three parts to the solution.

  1. Install a secret token as a cookie.
  2. Install a script to add the hidden field to all forms.
  3. Check that the field matches the cookie on POSTs.

Here is some code to do 1 and 3.

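(defn is-form-post? [req]
  (and (= :post (:request-method req))
       (let [ct (or (get-in req [:headers "content-type"]) "")]
         ;; prefix match: real headers carry parameters such as
         ;; "; charset=UTF-8" or "; boundary=..."
         (or (.startsWith ^String ct "application/x-www-form-urlencoded")
             (.startsWith ^String ct "multipart/form-data")))))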

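(defn csrf-tokens-match? [req]
  ;; wrap-cookies parses request cookies into {:value "..."} maps
  (let [cookie-token (get-in req [:cookies "csrf" :value])
        post-token   (or (get-in req [:form-params "csrf"])
                         (get-in req [:multipart-params "csrf"]))]
    ;; a missing token must never count as a match
    (and cookie-token (= cookie-token post-token))))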

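(defn wrap-csrf-cookie [hdlr]
  (fn [req]
    ;; reuse the token from the request cookie, or mint a new one
    (let [token (or (get-in req [:cookies "csrf" :value])
                    (str (java.util.UUID/randomUUID)))]
      ;; :path "/" makes the cookie visible to every page on the site
      (assoc-in (hdlr req) [:cookies "csrf"] {:value token :path "/"}))))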

(defn wrap-check-csrf [hdlr]
  (fn [req]
    (if (is-form-post? req)
      (if (csrf-tokens-match? req)
        ;; we're safe
        (hdlr req)
        ;; possible attack
        {:body "CSRF tokens don't match."
         :status 400
         :headers {}})
      ;; we don't check other requests
      (hdlr req))))

The Javascript should be something like this:

(def csrf-script "(function() {
  var matches = document.cookie.match(/(?:^|; )csrf=([^;]*)/);
  if (!matches) { return; }
  var token = matches[1];
  $('form').each(function(i, el) {
    var form = $(el);
    if ((form.attr('method') || '').toLowerCase() === 'post') {
      var hidden = $('<input />');
      hidden.attr('type', 'hidden');
      hidden.attr('name', 'csrf');
      hidden.attr('value', token);
      form.append(hidden);
    }
  });
}());")

You should add it to all HTML pages. Note that this example script requires jQuery. Put it right before the </body>.

[:script csrf-script]

The nice thing about this solution is that it is strict by default. If you don't include the script, form posts won't work (assuming wrap-check-csrf is in your middleware stack).

Summary: CSRF attacks take advantage of properties of the browser (instead of properties of your server), so their defense can largely be automated.

A9 Using Components with Known Vulnerabilities

Software with known vulnerabilities is easily attacked using scripts. You should ensure that all of your software is up-to-date.

A10 Unvalidated Redirects and Forwards

One common pattern for login workflow is to have a query parameter that contains the url to redirect to. Since it's a user parameter, it's open to the world and could be a doorway for attackers.

For example, let's say someone sends an email to someone asking them to log in to their bank account. In it, there's this link:

http://www.bankofmerica.com/login?redirect=http://attackersite.com

What happens when they click? They see the legitimate site of their bank, which they trust. But it redirects them to the attacker's site, which has been designed to look like the bank site. The user might miss this change of domains and unwittingly reveal private information.

What can you do?

OWASP recommends never performing redirects, which is impractical. The next best thing is to never base the redirect on a user parameter. This would work, but puts a lot of trust in the developers and security auditors to check that the policy is enforced. My preferred solution allows redirects that conform to a whitelist of patterns.

(def redirect-whitelist
  ;; dots are escaped so that . matches only a literal dot
  [#"https://www\.bankofmerica\.com/" ;; homepage
   #"https://www\.bankofmerica\.com/account" ;; account page
   ;; ...
  ])

(defn wrap-authorized-redirects [hdlr]
  (fn [req]
    (let [resp (hdlr req)
          loc (get-in resp [:headers "Location"])]
      (if loc
        (if (some #(re-matches % loc) redirect-whitelist)
          ;; redirect on our whitelist, it's ok!
          resp
          ;; possible attack
          (do
            ;; log it (warning stands in for your logging function)
            (warning "Possible redirect attack: " loc)
            ;; change redirect back to home page
            (assoc-in resp [:headers "Location"] "https://www.bankofmerica.com/")))
        resp))))

Summary: Redirect attacks can largely be avoided by checking the redirect URL against a whitelist.
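Two details of the whitelist check are worth seeing at the REPL. This sketch uses a single pattern with its dots escaped, since an unescaped . matches any character, and compares re-matches (the whole string must match) with re-find (any substring may match). The name home-pattern and the attacker URLs are made up for the example.

```clojure
(def home-pattern #"https://www\.bankofmerica\.com/")

;; re-matches only succeeds when the entire URL fits the pattern.
(re-matches home-pattern "https://www.bankofmerica.com/")
;;=> "https://www.bankofmerica.com/"
(re-matches home-pattern "https://www.bankofmerica.com/@attackersite.com")
;;=> nil

;; re-find would let an attacker smuggle the trusted URL in as a substring.
(re-find home-pattern "http://attackersite.com/?https://www.bankofmerica.com/")
;;=> "https://www.bankofmerica.com/"
```

That is why the middleware should use re-matches and not re-find.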

Conclusion

Web security is hard. It takes education and vigilance to keep our servers secure. Luckily, the main security flaws of the web are well-understood and well-documented. However, this is only half of the work. These need to be translated into Clojure either as libraries or simply as "best practices". Further, these libraries and practices need to be discussed and kept top-of-mind.

If programming the web in Clojure interests you, you might be interested in my Web Development in Clojure video series. It covers all of the basics of web development, building a foundation to understand the entire Clojure web stack.


Complex Syntax

August 19, 2014

Summary: Lisps are revered for their simple syntax, but parens are complex. They complect function calls and macro calls, which have drastically different semantics.

One of the problems that people have with Lisps is that they hate the parentheses. Clojure does a pretty good job of minimizing unnecessary parens and giving them a much clearer meaning. But there's a deeper problem that people express all the time when they're first learning. It's frustrating to watch people struggle with it, because it's not their fault. It's a problem with Lisps in general.

Parens in all Lisps I've seen, including Clojure, are complex. I'm not using the word lightly. Parens complect two similar but distinct ideas: macro application and function application.

Macros and functions are obviously different. Macros are expanded at a time just before compilation called "macro-expansion time". They typically cannot be accessed at runtime. Functions, on the other hand, are applied at runtime. And they are first-class, meaning they are runtime values. In addition, the calling semantics are different. Macros are call-by-name. The code of each argument gets passed unevaluated. Functions are call-by-value. Functions and macros are two distinct species.
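A tiny experiment makes the difference visible. The names f-arg and m-arg are made up for this sketch: one is a function that returns its argument, the other a macro that returns its argument's code as data.

```clojure
(defn f-arg [x] x)                   ; function: receives the value
(defmacro m-arg [x] (list 'quote x)) ; macro: receives the code itself

(f-arg (+ 1 2)) ;;=> 3
(m-arg (+ 1 2)) ;;=> (+ 1 2)

;; Functions are runtime values, so they can be passed around:
(map f-arg [1 2 3]) ;;=> (1 2 3)

;; Macros are not. This fails to compile with
;; "Can't take value of a macro":
;; (map m-arg [1 2 3])
```

Both calls look exactly the same on the page, which is the whole problem.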

However, despite having distinct semantics, the syntax for calling the two is identical. Parens complect applying macros with applying functions. Beginners trip up on this all the time. Their head is already spinning from the notion that some of the things they are learning are macros, called at compile time. Now add on top that the syntax of the language does not help one bit in distinguishing macro calls from function calls. You just have to memorize what's a macro and what's a function.

We learned in The Next 700 Programming Languages that our syntax should serve to elucidate the semantics. Lisp just fails at this pretty hard. The only consolation is that you actually can remember, with time and experience, what's a macro and what's a function. Every Lisp programmer is proof of that.

A simple solution would be to have a weird syntax for calling macros. You know, instead of parens, you use something else. Something that distinguishes the two to decomplect them. This would have broad and deep implications for the language that I cannot begin to fathom.

The takeaway for the beginner is that, sorry, Clojure won't help you much with this, but it's very important to know what's a macro and what's a function. You just have to keep track in your head. If you're not sure, you can call clojure.repl/doc [1] on any symbol. If it names a macro, it will tell you.
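If you'd rather ask the question in code, the var's metadata carries a :macro flag. The macro? helper below is my own name for a small sketch of that check. It is not part of clojure.core.

```clojure
;; Returns true when sym (a quoted symbol) names a macro.
(defn macro? [sym]
  (boolean (:macro (meta (resolve sym)))))

(macro? 'when) ;;=> true
(macro? 'map)  ;;=> false
```

Special forms like if are not vars at all, so they come back false as well.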

So, there you have it. Lisps complect function calls and macro calls, which have drastically different semantics, using the same notation. Common Lisp and Scheme use parens for much more than that, making the syntax complex and context-dependent [2]. Clojure removes a lot of those parens, replacing them with square brackets or removing them altogether. However, the complexity of macro and function calls remains.

Despite this, Clojure is still a great language! If you'd like to learn Clojure, I have to recommend the LispCast Introduction to Clojure video series.

Learn Functional Programming using Clojure with screencasts, visual aids, and interactive exercises
Learn more


  1. That's a macro.

  2. For instance, inside of a let, parens take on the meaning of grouping the bindings and also grouping the variable with its value.

The conj Mental Bump

March 23, 2015

Summary: conj can be confusing if you're used to other languages. It is not a commonly defined operation. Most languages define common positional adding operations. conj, however, has more useful semantics. How do you use conj usefully if you can't guarantee the position?

When I was at university, we were taught object-oriented programming as it exists in Java. We learned about interfaces, inheritance, and the Liskov Substitution Principle. It makes sense. If you're claiming that you've got a (sub)type of car, it still has to do everything a car can do. Otherwise it's not really a subtype of car.

The point of confusion

Whenever I'm teaching someone Clojure, there's a point in the journey where everyone gets at least curious, if not outright confused.

(conj '(1 2 3) 4) ;;=> '(4 1 2 3)

vs

(conj [1 2 3] 4) ;;=> [1 2 3 4]

What's the deal? Does conj add to the beginning or the end? What possible contract could allow both of these behaviors?

Then I show them that the confusion goes even deeper:

(conj #{1 2 3} 4) ;;=> #{1 4 2 3} ;; or some other random order

and

(conj {1 2 3 4} [5 6]) ;;=> {1 2 3 4 5 6}

What gives? How is this even possible? Do hashmaps and linked lists even share a common class ancestor?

The answer is, sure, if you want them to. The student must get over a tiny conceptual bump. And once that bump is surmounted, poof, a new way of seeing is discovered.

The bump

Let's get over that bump right here, right now.

Let's imagine a traditional collection interface. Let's call it List. It has two methods, addFirst and addLast, that add new elements. So you write an algorithm that adds a bunch of items to the end with addLast. It takes a List as argument, because that's the most general type you need to perform the algorithm.

You call that algorithm with an ArrayList, which has the nice property that addLast is constant time. Woohoo! Your algorithm is fast and great.

A few months later, you get a phone call from another developer. He's complaining that he used your routine and can't figure out why it's so slow. It was working fine for a while, but as the users generated more records in the database, the routine was grinding to a halt.

You check out the code and immediately see the problem: the database query was returning not an ArrayList but a LinkedList. The implementation of addLast on this LinkedList (picture a singly linked list that keeps no pointer to its tail) is actually linear. Adding a bunch of stuff to the end was turning into a quadratic operation.

Let's say that again: even though the location semantics of the operation were the same, addLast on one had constant time and on the other had linear time. They both gave equivalent lists, but one of them was too slow. Does this satisfy the Liskov Substitution Principle? In practice, can you really substitute one for the other? Algorithmic complexity matters.

Clojure avoids that mess (while swapping it for another, which I'll get to shortly). It defines conj, which means not "put this at the beginning" or "put this at the end", but "hey, collection, you know yourself better than I ever can. Please add this wherever it makes sense for you as long as you do it in constant time. Thanks." [1]

Practically, that means that conj on LinkedList adds to the front, because that's constant time. And conj on ArrayList adds to the end. But, because the operation doesn't talk about order, like addFirst and addLast do, you can now extend conj to Set and even Map if you consider key/value pairs as single items. And that means that linear algorithms using conj will remain linear regardless of which collection you use.

The mess that Clojure chooses over the other mess

Does this satisfy the Liskov Substitution Principle? Well, that depends on how you look at it. You certainly don't guarantee that you get the same or even equivalent answers out. Consider this:

(def a [1 2 3])
(def b '(1 2 3))

(= a b) ;;=> true
(= (conj a 4) (conj b 4)) ;;=> false

So, here, performing the same operation on two equal values does not give equal results. That's kind of hard to reason about. But it's a similar tradeoff that you see with other operations that don't guarantee order. For instance, imagine two sets a and b. [2]

(= a b) ;;=> true

(= (seq a) (seq b)) ;;=> could be false!

The order of most sets is not guaranteed! This means that Clojure has some operations that do not maintain equality. conj just happens to be one of them.

What's the point?

So Clojure does not provide add operations that guarantee order regardless of collection type. Fine. What's the point?

The point is that, in practice, conj is more useful than addFirst and addLast combined. By defining a function using conj, it will work on a broader range of collections. It might give different answers for each, but it won't explode on one and do fine on the rest. And often the answers it gives are just fine. A basic version of into can be defined very easily. It works on all collections (for both to and from).

(defn into [to from]
  (reduce conj to from))
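To see that definition at work, here are a few calls with different target collections. I'm using the built-in clojure.core/into here, which gives the same answers as the basic version for these inputs.

```clojure
(into []  '(1 2 3))         ;;=> [1 2 3]
(into '() [1 2 3])          ;;=> (3 2 1)
(into #{} [1 2 2 3])        ;;=> #{1 2 3}
(into {}  [[:a 1] [:b 2]])  ;;=> {:a 1, :b 2}
```

The same code runs for every target collection, and each one puts the new items wherever conj puts them. The list comes out reversed because conj on a list adds to the front.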

Common usage

One last thing before I wrap up: because the collection itself defines where the item will be added, I often find myself choosing the collection based on where I need it. A common idiom in Common Lisp was to make a new list by consing onto the front, then reversing it at the end because you really wanted them in the other order. In Clojure, there's no need, because you can just use a vector (and use conj). As long as the vector is local to the algorithm, it's not part of the contract, so it's your choice.
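Here is a sketch of both idioms side by side. The function names squares-list and squares-vec are made up for the example.

```clojure
;; Lisp-style: cons onto the front of a list, then reverse at the end.
(defn squares-list [xs]
  (loop [xs xs, acc '()]
    (if (seq xs)
      (recur (rest xs) (cons (* (first xs) (first xs)) acc))
      (reverse acc))))

;; Clojure-style: conj onto a vector, which appends at the end.
(defn squares-vec [xs]
  (reduce (fn [acc x] (conj acc (* x x))) [] xs))

(squares-list [1 2 3]) ;;=> (1 4 9)
(squares-vec  [1 2 3]) ;;=> [1 4 9]
```

Both give the items in the original order. The vector version just never needs the final reverse.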

Conclusion

Java was wrong. addFirst and addLast cannot be substituted in LinkedList and ArrayList. They have different algorithmic complexities and at some point one's performance will be totally unacceptable. The operation that does allow for substitutability in algorithmic complexity is conj, which is always constant time. But then it doesn't maintain equality. However, I find that conj is way more natural and helps algorithmic reasoning more than guaranteeing where the item is placed.

If you'd like to learn Clojure, I recommend my video course LispCast Introduction to Clojure. It's a great introduction to the language using animations, exercises, and screencasts. It's designed to give a deep dive straight to what makes Clojure interesting. It begins with syntax, goes through functional programming, and ends with data-driven programming.



  1. It's even polite!

  2. (def a (set (range 100))) (def b (apply sorted-set (range 100)))

Convince your boss to use Clojure

September 16, 2014

Summary: Clojure has been successfully adopted by many companies. There are many resources available by people who did the hard work of introducing Clojure to their team.

Do you want to get paid to write Clojure? Let's face it. Clojure is fun, productive, and more concise than many languages. And probably more concise than the one you're using at work, especially if you are working in a large company. You might code in Clojure at home. Or maybe you want to get started in Clojure but don't have time if it's not for work.

One way to get paid for doing Clojure is to introduce Clojure into your current job. I've compiled a bunch of resources for getting Clojure into your company.

Take these resources and do your homework. Bringing a new language into an existing company is not easy. I've summarized some of the points that stood out to me, but the resources are excellent so please have a look yourself.

The Strategy

Before you begin your quest to introduce Clojure, you're going to need a good strategy. By far the best presentation of a strategy is by Neal Ford. Neal Ford is a Director at ThoughtWorks and has a great strategy for introducing Clojure into an existing company. Watch this video.

  1. Spread Clojure outside of the company.
  2. Get a groundswell of people inside the company.
  3. Use Clojure for things it's great at.
  4. Get the Clojure jar file included.

If your company happens to be using Ruby, Joshua Ballanco has some great tips for How to Sneak Clojure Into Your Rails Shop.

Sean Corfield helped move a sizeable legacy application to Clojure. He's got some good, sobering advice.

  1. Be ready to explain the lack of a framework.
  2. OOP habits are ingrained.
  3. Don't underestimate the difficulty.

A lot of great advice from someone who's actually done it.

  1. Find allies.
  2. Answer the questions.
  3. Take responsibility.
  4. Get help.
  5. Be an advocate.

Some great advice from Logan Campbell, someone who convinced his coworkers to use Clojure at a Post Office.

  1. Be positive: if they say "We need static typing", say "Great! Clojure has that!" (which it does with Typed Clojure).
  2. Show them working code.
  3. Be ready for performance questions.

Material for other developers

Prismatic has been using Clojure to great success. They've written about how Clojure is used throughout their stack. This is a great introduction to answer the question "Why Clojure?". Spread this post whenever anyone asks why.

Leo Polovets polled Clojurists at Factual and summarized their answers to Why Clojure?.

A while ago, this great post was trending on Hacker News. It explains Why Clojure?.

Material for the project manager

If you'd like a high-level overview of the business advantages to using Clojure, you can do a lot worse than asking Cognitect, the company that develops Clojure itself. They've published a case study, meant for non-tech folks, to understand the implications. It's focused mainly on Datomic, but it touches on Clojure.

Though a little hyperbolic, this post is a good one for the skeptical manager, the one who wonders whether their team can really learn a new technology quickly enough to justify the cost.

For those who like to follow industry trends and what others are recommending, look no further than the ThoughtWorks Technology Radar. It's a compendium of recommendations, published regularly, that takes a realistic view of a constantly changing landscape. Clojure has been rated at Adopt since October 2012. A lot of secondary Clojure technologies are also on the radar, including core.async, om, Datomic, and ClojureScript.

Documentation, Training, Support

I mainly want to show that there's plenty out there and plenty of new stuff coming out, not recommend anything specific.

Books

There's plenty going on with documentation. There are many books available on Amazon. These are all of the same quality as any enterprise Java book.

Videos

Videos are an up-and-coming type of training, but there's plenty out there.

Training

Besides these courses, there are often Clojure courses before or after the Clojure conferences.

Support

Besides the normal IRC (#clojure on freenode), Google Group, and Jira, Cognitect offers support.

Conclusion

Clojure is gaining traction. It's fun, it's productive. But it's still a little fringe in larger companies. Though it will still take a lot of work, these resources should help you make a case for Clojure. It's my mission to help people thrive with Clojure. If you'd like to keep up to date on what's happening in the Clojure world, you may be interested in getting the Clojure Gazette for free. Sign up here.

For more inspiration, history, interviews, and trends of interest to Clojure programmers, get the free Clojure Gazette.

Learn More

Clojure pulls in ideas from many different languages and paradigms, and also from the broader world, including music and philosophy. The Clojure Gazette shares that vision and weaves a rich tapestry of ideas from the daily flow of library releases to the deep historical roots of computer science.
