August 28, 2014
Summary: I go over a real-world example of how atoms and immutable values allow you to compose constructs in ways that are easy to reason about and less prone to error.
The other day I was in the #clojure IRC channel and someone asked a good question. They had code like the following and couldn't understand why they couldn't modify a map.
(def state (atom {}))
(doseq [x [1 2 3]]
  (assoc @state :x x))
(println @state)
What does this print? Well, the asker wanted it to print {:x 3}. But it printed {}. To understand what's happening, let's go step by step.
{} creates an empty map. It's literal syntax for a map constructor. This one happens to be empty.
(atom {}) takes the empty map that was just created and passes it to the function atom, which constructs a new clojure.lang.Atom. An atom is an object, and its current state is the empty map we just passed in.
(def state (atom {})) defines a new var called state in the current namespace.
At this point, we've got a var called state whose value is an atom that holds an empty map.
(doseq [x [1 2 3]] loops over the numbers 1, 2, and 3. x will be bound to each of those numbers, in turn.
@state gets transformed into (deref state), which returns the current value of the atom. :x is a literal keyword, and x is a reference to the x bound inside the loop.
(assoc @state :x x) creates a new map by taking the current value of the atom (which happens to be {}) and associating :x with x (which will be 1, 2, and 3 as the loop runs). assoc never modifies the map it is given; it returns a new map. That new map is then thrown away, since it isn't bound to anything.
Then (println @state) will print the current value of state, which is still {}.
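The immutability that bites the asker here is easy to verify at the REPL. A minimal sketch (the names m and m2 are mine):

```clojure
;; assoc never mutates its argument; it returns a brand-new map
(def m {})
(def m2 (assoc m :x 1))

(println m)  ;; prints {}
(println m2) ;; prints {:x 1}
```

The original map m is untouched; the only way to "keep" the result of assoc is to bind it to something.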
This code shows a common problem that beginners face in Clojure: how do immutable data structures (like maps) and the concurrency primitives (like atom) work together to manage state?
The answer is quite simple (in the Rich Hickeyan sense) and elegant. By separating the ideas of value and state, Clojure has made it easy to express precisely the behavior you want in concurrent systems.
The value is the map. It is immutable. It cannot change. It is a single value, and it will always be the same. That means threads can share the value with no worries that one of them will change it.
The state is the atom. It's a mutable object. And being an object, it has methods that define its interface. In the code above, we saw that you can call deref on an atom to get its current value. deref is basically a getter.
The main way to change the value of an atom is with swap!. swap! takes an atom and a function (plus optional arguments) and calls the function on the current value of the atom. It then sets the value of the atom to the return value of the function. So let's use that to fix the code.
(def state (atom {}))
(doseq [x [1 2 3]]
  (swap! state assoc :x x))
(println @state)
swap! takes the atom (state), a function (assoc), and some arguments (:x x). It calls assoc on the current value of state with those extra arguments and sets the value of the atom to the return value of the function.
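The same shape works with any function that takes the current value as its first argument. A quick sketch with a hypothetical counter atom:

```clojure
(def counter (atom 0))

(swap! counter inc)  ;; 0 -> 1
(swap! counter + 10) ;; extra args go after the current value: (+ 1 10) -> 11

(println @counter) ;; prints 11
```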
The swap! expression is almost (but not quite) the same as this code:
(reset! state (assoc @state :x x)) ;; never do this
reset! changes the state of the atom without regard to the current value. This new code is bad because it's not thread-safe. Use swap! whenever you need the current value to determine the new value.
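In other words, reset! is only appropriate when the new value is independent of the old one. A small sketch of the distinction:

```clojure
(def a (atom 0))

(reset! a 42) ;; fine: 42 does not depend on the current value
(swap! a inc) ;; required: the new value is computed from the current value

(println @a) ;; prints 43
```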
So what does an atom do? What does it represent?
Atoms guarantee one very important thing: each state is calculated from the last state. The swap! operation is atomic. No matter how many threads are trying to change the value, each change is calculated from the previous value and no previous values are lost. That's its contract as an object, and it's one of the important ways that Clojure helps with concurrency.
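That contract is easy to demonstrate: hammer one atom from several threads and no update is lost. A sketch (the thread and iteration counts are arbitrary):

```clojure
(def n (atom 0))

;; 10 threads, each incrementing the atom 1000 times
(let [threads (doall (for [_ (range 10)]
                       (Thread. #(dotimes [_ 1000] (swap! n inc)))))]
  (doseq [t threads] (.start t))
  (doseq [t threads] (.join t)))

;; every swap! was computed from the previous value, so nothing was lost
(println @n) ;; prints 10000
```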
How can a value be lost?
If we have two threads, each trying to change state in the same incorrect way (using reset!), the evaluation happens in several steps:
(deref state) ;; call this value *1
(assoc *1 :x x) ;; call this value *2
(reset! state *2)
Because the threads are running concurrently, the operations have a chance of interleaving their steps in unwanted ways. For instance, threads A and B might interleave like this:
- A: (deref state) ;; call this value *1A
- A: (assoc *1A :x x) ;; call this value *2A
- B: (deref state) ;; call this value *1B
- B: (assoc *1B :x x) ;; call this value *2B
- B: (reset! state *2B)
- A: (reset! state *2A)
What happened? In the final step, A set the value of state to the value it calculated in its second step. So B's work is completely discarded. That's probably not what was intended. What's worse is that this is only one of many possible interleavings, some of which work and some of which don't. Welcome to concurrency!
What you probably wanted was to make sure that no work is discarded. You want the operation to be atomic. That's why it's called an atom. swap! is atomic. A swap! to an atom occurs "all at once", instead of in three steps like the reset! example. If two threads are doing a swap!, there are only two possible interleavings.
- A: (swap! state assoc :x x)
- B: (swap! state assoc :x x)
And
- B: (swap! state assoc :x x)
- A: (swap! state assoc :x x)
Either of these is usually what you want. If only one ordering is acceptable, or neither is, then an atom is not the right construct for you.
So there you go. Atomic mutable state with immutable values gives you a nice, composable concurrency semantics. You could do it with locks but it's harder to ensure you're doing it correctly. It's slightly higher-level than locks yet it provides tremendous value. Atoms are easier to reason about and less prone to errors.
If you'd like to learn the basics of Clojure, I recommend my video course called LispCast Introduction to Clojure. I don't go over concurrency, but you will learn lots of functional programming. Go check out the description to see if it's right for you.
April 04, 2015
Leon Barrett's talk at Clojure/West is about parallelism in Clojure.
Background
Clojure is well known for its parallel programming superpowers. Immutable data structures, concurrency primitives, and a few convenient constructs like future and pmap have been there since the beginning. But what's even cooler is how people have been able to build on the strong foundation Clojure established to create new parallel abstractions. Leon Barrett will talk about some of these. The description mentions reducers, tesser, and claypoole.
Rich Hickey gave a talk about reducers back in 2012, focusing on the ideas and abstractions they are based on. A more practical talk was given by Renzo Borgatti at Strange Loop 2013. Kyle Kingsbury gave a talk about tesser, a library which extends Clojure's parallel abstractions to execute in a distributed manner. And Leon Barrett himself wrote a recent blog post about Claypoole.
This post is one of a series called Pre-West Prep, which is also published by email. It's all about getting ready for the upcoming Clojure/West, organized by Cognitect. Conferences are ongoing conversations and explorations. Speakers discuss trends, best practices, and the future by drawing on the rich context built up in past conferences and other media.
That rich context is what Pre-West Prep is about. I want to enhance everyone's experience at the conference by surfacing that context. With just a little homework, we can be better prepared to understand and enjoy the talks and the hallway conversations.
Clojure/West is a conference organized and hosted by Cognitect. This information is in no way official. It is not sponsored by nor affiliated with Clojure/West or Cognitect. It is simply me (and helpers) curating and organizing public information about the conference.
October 10, 2014
Summary: There are a few conventions in core.async that are not hard to use once you've learned them. But learning them without help can be tedious. This article presents three guidelines that will get you through the learning curve.
Introduction
The more you use core.async, the more you feel like Willy Wonka. He knew how to maximize the effectiveness of the Oompaloompas. core.async comes with a lot of functions built in, and he knew exactly which ones to use at which time.
In this extremely rare glimpse into the functioning of his mysterious factory, we take a look at the guidelines Wonka himself follows when orchestrating the work of the Oompaloompas.
When to use go versus thread?
Background
Each Oompaloompa is a thread. Willy Wonka has a special group of Oompaloompas he calls a thread pool. Their assignment is simple: they manage a group of tasks that Wonka calls go blocks. Whenever Wonka has an appropriate task, he writes a go block and hands it to the Oompaloompas to work on.
As the Oompaloompas work, they take one task and do it until the task parks. When it parks, they put it down and pick up another task that isn't parked. Tasks become unparked when they get new input from the chocolate pipes. Then the Oompaloompas can continue working on them.
At one time, Wonka used to give the thread pool all sorts of tasks. He would give them very long calculation tasks, like weighing each chocolate bean in his chocolate bean mountain. He noticed that when they did this, lots of tasks were left undone, even though they were not parked, because all of the Oompaloompas were busy doing something else.
So he came up with a guideline.
Avoid long calculations and blocking inside go blocks
Does your code do significant I/O, like downloading a file or writing to the network? Are you doing a very long calculation?

Then use a thread. If it will take a long time or block, you want a dedicated thread. It can work as long as it wants, and even block. That way it doesn't slow down the work of the thread pool.

Otherwise, you can use a go block.
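The guideline translates directly into code. A sketch (the 100 ms sleep stands in for real blocking work, and assumes core.async is on the classpath):

```clojure
(require '[clojure.core.async :refer [go thread <!!]])

;; cheap, non-blocking work: a go block, run on the shared thread pool
(def quick (go (+ 1 1)))

;; long or blocking work: thread, which gets its own dedicated thread
(def slow (thread
            (Thread/sleep 100) ; stands in for I/O or a long calculation
            :done))

;; both return a channel that yields the body's result
(def quick-result (<!! quick))
(println quick-result) ;; prints 2

(def slow-result (<!! slow))
(println slow-result) ;; prints :done
```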
When to use single- versus double-bang (!)
Background
Wonka also noticed that he needed to write different instructions for his two types of Oompaloompa. When he wrote a go block, he needed to say "park while you wait for input". But for the other Oompaloompas created with thread (or for his own work), he needed an instruction that said "block while you wait for input".

So he came up with a little notation convention. If you're just parking, so you're in a go block, use one bang. If you're outside of a go block, meaning you need to block, use two bangs.
These were his versions of his basic instructions: >!, <!, and alts! versus >!!, <!!, and alts!!. The convention is easy.
Use single-bang versions in go blocks and double-bang versions outside.
The single-bang versions of these functions are meant to park a go block. Although they are defined as functions, they have special meaning to the go macro. In fact, if you actually call them outside of a go block, they will throw an exception unconditionally, telling you they are meant to be used inside a go block.

The double-bang versions are blocking. That means the thread they are running on will block if the channel is not ready. They can be used outside of a go block (anywhere) or inside a thread block. It's safe to block inside a thread block since it runs on a dedicated thread.
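Both conventions in one sketch (assumes core.async is on the classpath):

```clojure
(require '[clojure.core.async :refer [chan go thread >! >!! <!!]])

(def c (chan))

;; inside go: single-bang ops park the block instead of blocking a thread
(go (>! c :from-go))

;; outside any go block, we must use the blocking, double-bang version
(def v1 (<!! c))
(println v1) ;; prints :from-go

;; inside thread: double-bang is fine, because the thread is dedicated
(thread (>!! c :from-thread))
(def v2 (<!! c))
(println v2) ;; prints :from-thread
```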
put!
Background
Like all factories, Willy Wonka's needs deliveries. When the UPS truck comes, there's plenty of boxes to unload. But Wonka is busy. So he leaves a note outside for the delivery guy.
The note tells the guy where to put everything so the Oompaloompas know where to find it. When he says where to put a box, he spells it put!. That is, it has a bang.

It's unfortunate, because the other functions with a bang mean they park. But put! does not park. Wonka was just angry one day, and the convention stuck.
But the delivery guy knows that Wonka is eccentric, so he doesn't take it personally and does his job. He puts stuff in its places, without blocking.
Use put! to get stuff into your channels from outside.

put! is a way to get values from outside of core.async into core.async without blocking. For instance, if you're using a callback style, which is very common in JavaScript, you will want to make your callback call put! to get the value onto a channel.
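A sketch of that callback pattern (fetch-async is a made-up stand-in for a callback-based API; assumes core.async is on the classpath):

```clojure
(require '[clojure.core.async :refer [chan put! <!!]])

(def results (chan 10))

;; a hypothetical callback-based API: it computes something on another
;; thread and invokes the callback with the result
(defn fetch-async [callback]
  (future (callback :payload)))

;; the callback just put!s the value onto the channel -- no blocking,
;; and no go block needed at the call site
(fetch-async (fn [result] (put! results result)))

(def v (<!! results))
(println v) ;; prints :payload

(shutdown-agents) ;; lets the JVM exit promptly after using a future
```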
Conclusion
That's it! Now to eat some chocolate!
core.async is really cool, but it has a learning curve. Once you learn these conventions, you will begin to feel the power they give you, whether you're making chocolate or building cars. If you'd like to learn core.async and feel like Willy Wonka, I recommend the LispCast Clojure core.async videos. They build up a deep understanding of the fundamental concepts in a fun and gradual way.