Summary: Javascript's concurrency model forces code to give up control of when a callback will be called. core.async gives you back that control and hence lets you code in a more natural style.
Well, there comes a time in every programmer's life when they take a look at the ThoughtWorks Technology Radar and they realize that core.async is in the Trial circle, meaning you should see if you might want to use it.
And if you're there, right there in that phase of your programming trajectory, eyeballing core.async for your next (or current) project, Welcome. This post is for you. Here it goes.
Why core.async? Well, the short answer is that it makes concurrency much, much, much, very much easier. I mean, let's face it: concurrency is so hard by itself, it has plenty of muches to spare. Now, I haven't used core.async a lot on the JVM. I wrote some, but it wasn't really the right thing for it. I plan on writing more later, I just haven't had the right project for it.
But I have used it a lot in ClojureScript in browsers. And it is nice. It lets you do things that you could write yourself, given enough time. But you're more likely to solve the 16-ring Tower of Hanoi before you get all the kinks out. It's much better to let a machine do the hard work. That's what the 20th Century was all about: machines instead of muscle. And the 21st Century will be about computers instead of brains. Best get ahead of the curve.
I say you should let the machine do the work, but maybe that's too vague. Let's look at a concrete example. First, how do you do an ajax request then do something with the value? Easy:
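Something like this sketch, assuming an `ajax` helper that takes a URL and a callback (the helper and the URL are made up):

```clojure
;; `ajax` is an assumed helper: it fires off the request and
;; calls the callback with the parsed response when it arrives.
(ajax "http://example.com/random-number"
      (fn [result]
        (js/console.log (:n result))))
```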
We're in Javascript, so we have to pass a callback which will get the result. That was easy. A little harder is making two API calls and doing something with both results.
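A sketch of that, with the same assumed `ajax` helper. You nest the callbacks, so the second request only starts after the first one returns:

```clojure
;; Nested callbacks: request 2 doesn't start until request 1
;; has completed and called us back.
(ajax "http://example.com/random-number"
      (fn [r1]
        (ajax "http://example.com/non-random-number"
              (fn [r2]
                (js/console.log (/ (:n r1) (:n r2)))))))
```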
Alright, that wasn't too bad. A little indentation never hurt anyone. But, wait a second! We don't do the second request until the first request is already done. I've got a browser the size of a minivan and a 20 Megabit internet connection, and I'm doing one request at a time? That sucks!
We could start them both at the same time. But what order will they come back in? Welcome to the world of concurrency!!!! Things happening (maybe) at the same time, or at least you don't know what order they will happen in!
Well, let's try something. What if the first one to finish wrote its result down, then the second one to finish would know that it was second and it could do the final calculation? What would that look like?
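One possible sketch, using two atoms as the place to write results down (the names are made up, and `ajax` is the same assumed helper):

```clojure
;; Each callback writes its result into an atom, then checks
;; whether both results have arrived. Whoever finishes second
;; sees both values and does the final calculation.
(def r1 (atom nil))
(def r2 (atom nil))

(defn try-finish []
  (when (and @r1 @r2)
    (js/console.log (/ (:n @r1) (:n @r2)))))

(ajax "http://example.com/random-number"
      (fn [r] (reset! r1 r) (try-finish)))

(ajax "http://example.com/non-random-number"
      (fn [r] (reset! r2 r) (try-finish)))
```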
Ok, well, that should work. What happens if you have to do 3 AJAX requests? Not so bad, either. What about 17? Oh, man, that sucks. We could do something like make a super-promise, where you can promise many values and only call a function at the end when they're all there. Yes, you can do that. It really wouldn't be hard, even.
(defn super-promise
  "Create a promise for many values. Use `deliver`
  to add values.

  keys: all of these keys must be present before calling f
  f: the function to call. Will be passed a map."
  [keys f]
  (let [r (atom {})]
    (add-watch r :promise
               (fn [_ _ _ s]
                 (when (every? #(contains? s %) keys)
                   (f s))))
    r))

(defn deliver [promise key value]
  (swap! promise assoc key value))

(def rs (super-promise [:r1 :r2]
                       (fn [{:keys [r1 r2]}]
                         (js/console.log (/ (:n r1) (:n r2))))))

(ajax "http://example.com/random-number"
      #(deliver rs :r1 %))

(ajax "http://example.com/non-random-number"
      #(deliver rs :r2 %))
Phew! That's done. It works. It scales to many simultaneous AJAX calls. It's generic. Well, generic for this particular pattern. If we have a different pattern, we'd have to come up with a different solution.
We're looking through a small porthole into callback hell. The identifying characteristic of callback hell is that you hand control of your code, which was all nice and procedural and easy to follow, over to whatever demon is going to call that callback. You sell your virtual soul for a bit of asynchrony. But you can't cheat the Devil. When all is said and done, all of your work gets done, but you need some savior angel to help you coordinate all of the pieces back together again. In this case, it's the super-promise, which works in the first circle of hell, but even Dante can't help you if you go further.
Now that we've got a decent solution to this particular problem established pre-core.async, let's look at what it would be using core.async. We'll assume that our ajax-channel function returns a core.async channel.
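A sketch of the core.async version, assuming `ajax-channel` returns a channel that will eventually receive the response (and the usual `go` and `<!` from core.async):

```clojure
;; Both requests start immediately. The go block then reads
;; the results in whatever order we choose -- here, r1 first.
(let [c1 (ajax-channel "http://example.com/random-number")
      c2 (ajax-channel "http://example.com/non-random-number")]
  (go
    (js/console.log (/ (:n (<! c1)) (:n (<! c2))))))
```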
Let me just get it out of the way and never mention it again: it's shorter. It's shorter even than the naive solution using two atoms. And it's shorter than the super-promise solution even if you don't include the super-promise code. I'm done talking about the size, because it's only a little important.
Now that that's out there, on to the more significant stuff. First and foremost is that you never lose control. The code even reads procedurally. Start two ajax requests and remember the channels. Start a go block (which means run the code asynchronously) and log the result of dividing the first result by the second result.
Does it scale? You betcha! Imagine we need to make 192 imaginary AJAX calls before the Devil takes his due. The only way to do that is to do them all as fast as the browser fairies let you.
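A sketch of how that might look, again assuming `ajax-channel` and a made-up URL scheme:

```clojure
;; Start all 192 requests eagerly (doall forces the lazy seq),
;; keeping the channels in order. Then read the results back
;; in that same order, regardless of arrival order.
(let [cs (doall
           (for [i (range 192)]
             (ajax-channel (str "http://example.com/item/" i))))]
  (go
    (doseq [c cs]
      (js/console.log (<! c)))))
```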
The AJAX requests come back as fast as they can (meaning arbitrary order), and the results are logged in their original (numeric) order. You could do them in any order you want. That's because you're not giving up control.
How does this work? How can you have asynchrony and not give up control?
I alluded to it before: you're making the machine do the work. That go block up there is actually a powerful macro that transforms your procedural code into a mess of callbacks (like in our super-promise example) that you would never want to write yourself. I mean, maybe you want to, but maybe you're nuts. And you'll get it wrong.
The transformation in the go block is pretty easy, as things go. It's mechanical. It's easy like lifting a car with your hands. Put enough leverage (by using a jack) and you can do it. It converts an easy motion (pushing down on the lever or turning the screw) into a powerful force. The go macro converts your easy code into a bunch of callbacks and coordinates them with a powerful state machine which will angelically reassemble them without ever losing control.
It's all good-ol' callbacks and mutable state underground. But above ground, you've got code that's easy to reason about. No Devil's bargain. You've got an angel negotiating for you. That's the key thing! Channels are amazingly easy to reason about because each channel is so simple. But that's a story for another day!
I should just mention that, yes, core.async is about procedural programming. Channels are mutable state. core.async is made for the small part of your code that is procedural and side-effecting. Every program has got such a part. If you're doing concurrent things (and in Javascript, you always are), core.async might be able to help provide a first-class mechanism for communication and coordination.
That's what you might call the "core" of core.async in ClojureScript. It's about regaining control of your asynchronous calls and not smearing your logic across your code in little bits contained in callbacks. You keep your code's semantic integrity and you keep your sanity.
If staying out of callback hell is to your liking, you just might like the divine help of a LispCast video course dedicated to teaching core.async in a gentle, graceful way. Presented in a unique visual format and at just the right pace, LispCast Clojure core.async will guide you to a deep understanding of the fundamentals of core.async so you can clean up your code, get more concurrency, and get back control.
Summary: If your functions return core.async channels instead of taking callbacks, you encourage them to be called within go blocks. Unchecked, this encouragement could proliferate your use of go blocks unnecessarily. There are some coding conventions that can minimize this problem.
I've been using (and enjoying!) core.async for about a year now (mostly in ClojureScript). It has been a huge help for easily building concurrency patterns that would be incredibly difficult to engineer (and maintain and change) without it.
Over that year, I've developed some practices for writing code with core.async. I'm putting them here as an invitation for discussion.
Use callback style, if possible
A style develops when using core.async where you convert what would in regular ClojureScript be callback style into return-a-channel style. The channel will contain the result of the call when it is ready.
Using this style to keep you out of "callback hell" is overkill. "Callback hell" is not caused by a single callback. It is caused by the eternal damnation of coordinating multiple callbacks when they could be called in any order at any time. Callbacks invert control.
core.async quenches the hellfire because coordinating channels within a go block is easy. The go block decides which values to read in which order. Control is restored to the code in a procedural style.
But return-a-channel style is not exactly free of sin. If you return a channel too much, the code that calls those functions will likely end up in a go block.
go blocks will proliferate. go blocks incur extra cost, especially in ClojureScript where they happen asynchronously, meaning at the next iteration of the event loop, which is indeterminately far away.
Furthermore, go blocks might begin nesting (a function whose body is a go block is called by another function whose body is a go block, etc), which is correct semantically but probably won't give you the performance you're looking for. It's best to avoid it.
"How?" you say? The most important rule is to only use core.async in a particular function when necessary. If you can get by with just a callback, don't use core.async. Just use a callback. For instance, let's say you have an ajax function that takes a callback and you're trying to make a small API wrapper for convenience. You could make it return a channel like this:
The interesting thing to note is that core.async is not being used very well above. Yes, you get rid of a callback, but there isn't much coordination happening, so it's not needed. It's best to keep it straightforward, like this:
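The plain-callback version of the same sketch: just build the URL and pass the callback through.

```clojure
;; Callback style: no core.async needed for a simple wrapper
;; that does one bit of work (constructing the URL).
(defn search-google [query callback]
  (ajax (str "http://google.com/?q=" (js/encodeURIComponent query))
        callback))
```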
You're just doing one bit of work here (basically constructing a URL), which is a good sign. But how do you "lift" this into core.async?
<<<
There's a common pattern in Javascript (not ubiquitous, but very common) to put the callback at the end of the parameter list. Since the callback is last, you can easily write something to add it automatically.
(defn <<< [f & args]
  (let [c (chan)]
    (apply f (concat args [(fn [x]
                             (if (or (nil? x)
                                     (undefined? x))
                               (close! c)
                               (put! c x)))]))
    c))
This little function is very handy. It automatically adds a callback to a parameter list. You call it like this:
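For example, assuming the callback-style `search-google` from above, which takes its callback last:

```clojure
;; <<< appends a callback that feeds the result onto a channel,
;; so the call can be awaited inside a go block.
(go
  (js/console.log (<! (<<< search-google "core.async"))))
```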
This function lifts search-google, a regular asynchronous function written in callback style, into core.async return-a-channel style. If I always put the callback at the end, I can use my functions from within regular ClojureScript code and also from core.async code. I can also use any function (and there are many) that happens to take its callback last. This convention has two parts: always put the callback last, and use <<< when you need it. With this function, I can reserve core.async for coordination (what it's good at), not merely simple asynchrony.
The < naming convention
There are times when writing a function using go blocks and returning channels is the best way. In those cases, I've adopted a naming convention. I put a < prefix in front of functions that return channels. I tried it at the end of the name, but I like how it looks at the beginning.
(go
  (js/console.log (<! (<do-something 1 2 3))))
The left-arrow of <do-something fits right into the <!. It also visually matches (<<< do-something 1 2 3), so it makes correct code look correct and wrong code look wrong. The naming convention extends to named values as well:
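For example (with the made-up `<do-something` from above):

```clojure
;; The < prefix marks channel-valued names, too: <result holds
;; a channel, not a value.
(let [<result (<do-something 1 2 3)]
  (go
    (js/console.log (<! <result))))
```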
These conventions are a great compromise between ease of using core.async (<<<) and universality (callbacks being universal in JS). The naming convention (< prefix) visually marks code that should be used with core.async. These practices have taken me a long way. I'd love to discuss them with you here.
If you know Clojure and you are interested in learning core.async in a fun, interactive style, check out the LispCast Clojure core.async videos.
Summary: Conveyor belts are strikingly similar to Clojure core.async channels. While it could be a coincidence, there is speculation that conveyor belts were influenced by a deep understanding of core.async.
Who invented the conveyor belt? No one knows for sure. Many historians believe that it was Henry Ford, seeking to mechanize the transportation of car parts from one part of the factory to another. This conservative view is plausible. Henry Ford designed much of the assembly line in his factories.
Despite the simplicity of this explanation, a small minority of researchers believe that there was a man behind Henry Ford who was the true inventor of the conveyor belt. Not much is known about him, but historians have pieced together a different story. This article is about that story.
Henry Ford in front of his invention.
This is a classic picture of Henry Ford standing in front of his creation. Recent advances in digital imaging enhancement have allowed us to learn a little more about the people that influenced Ford who, until recently, have lived within his shadow.
Strange man who is suspected true inventor of conveyor belt.
While the identity of this man is not known, historians have reassembled a story from bits and pieces of Henry Ford's journals and quotes. It is believed (by those few, brave historians) that this man is the true inventor of the conveyor belt. He looks eerily similar to the famed inventor of Clojure, Rich Hickey. However, the chronologies do not match up, so this hypothesis is easily ruled out.
Conveyor belt similarities to core.async channels
It is well understood that the conveyor belt is a poor, physical compromise on the much more reasonable abstraction of the Clojure core.async channel. As lossy as the copy is, it is still quite useful. core.async channels and conveyor belts have many properties in common.
Things can be put onto and taken off of a conveyor belt
The similarity here is uncanny. It is almost as if the inventor of the conveyor belt had some knowledge of the core.async operations.
The conveyor belt serves as a buffer
Again, it is striking that the conveyor belt can serve nearly the same purpose as a core.async channel. On long conveyor belts, the people taking things off and putting things on might not even know each other. And for things that take an unknown amount of time, you can just point to the conveyor belt and say "Wait here for the thing you need." They don't have to know when or from where it is coming.
Things can be taken off in the same order they are put on
This is something that perhaps the conveyor belt has lost. While core.async channels always maintain their own semantics for choosing which thing is taken next, conveyor belts are basically dumb and expose all of their contents. However, it is clear that you could enforce the common "first-in-first-out" queue behavior by always taking the item that has been on the belt the longest.
When the conveyor belt is full, the "putter" has to wait
What happens when the conveyor belt fills up? Well, you wait for some room. Everyone has had this experience where you're at the supermarket and the cashier is taking too long to ring up your food so the conveyor belt fills up. You just have to wait with a dumb look while they look up the code for celery.
When the conveyor belt is empty, the "taker" waits
Conversely to the full belt situation, when the belt is empty the taker must wait. This is what happens when the cashier is faster than the shopper at the supermarket. There's nothing to do but wait for some groceries to reach the end of the belt.
Differences between conveyor belts and core.async channels
Though the evidence that core.async influenced the development of the conveyor belt is overwhelming, there are some differences due to the physical limitations of time and space.
Non-instantaneous transfer
Though not truly instantaneous, core.async channels do operate at the speed of light. When a value is put onto a channel, it is immediately available to be taken. Contrast this with conveyor belts, which slowly move objects along a physical belt.
Limited in length
Conveyor belts have finite length. But core.async channels can be passed around just like any other object. They, in fact, are not limited by space at all. Also, the length of a conveyor belt is equivalent to its size as a buffer. But core.async channels have properly decomplected buffer size from "distance" of travel.
No dropping/sliding semantics
It is quite curious that this important feature of core.async channels has not been copied to conveyor belts. core.async channels can be set up to drop either the newest value added or the oldest value added. Conveyor belts that do that are considered defective. The best explanation is that physical goods are so expensive compared to computed ephemeral values that it is uneconomical to throw away any items after they are made.
The jury is still out as to whether core.async influenced the development of conveyor belts. Was Tony Hoare, who invented Communicating Sequential Processes (on which core.async is based) given the secret to CSP by this same man? This hypothesis becomes more convincing when we see the immense similarities between conveyor belts and core.async channels.
Here is a picture of Rich Hickey being picked up by a friend after Strange Loop 2014.
Rich Hickey being picked up by his friend after Strange Loop
If you would like to explore the mysteries of the strange and tangled similarities between conveyor belts and core.async, you might want to try LispCast Clojure core.async. It's a video series that introduces the concepts using animations, code, and exercises.
Summary: Elm is an exciting FRP language. I implemented the FRP part in Clojure using core.async.
I like to read research papers. I have ever since I was in high school. I've always wanted it to be pretty easy to just translate the pseudocode for an algorithm from a paper and then have it working without any trouble.
Of course, this rarely happens. Either the pseudo-code leaves out important details, or it's expressed in terms that are not available abstractions in any language I know. So then I am left to puzzle it all out, and who has the time?
A few months ago I read the fantastic thesis by Evan Czaplicki where he introduces Elm. It includes a novel way to express asynchronous Functional Reactive Programming (which he calls Concurrent FRP) and a way to express GUI layouts functionally.
I highly recommend the thesis. It is very clear and readable, and points you to valuable resources.
The reason I was reading it was because I think FRP has a bright future. It's definitely got potential for simplifying the way we write interactive applications. I wanted to learn how it works so that I could build bigger castles in the sky.
Elm FRP is pretty cool. It is, first of all, very few parts. You build bigger abstractions from those small parts. It is built so that events trickle through the graph in order. In other words, they're synchronized.
But the other interesting thing is that sometimes you don't want things to be synchronized. What if one of the calculations takes a long time? You don't want the GUI to stop responding to mouse events while that slow thing is happening. Elm lets you segment the graph into different subgraphs that can propagate concurrently.
Seriously, just read the thesis.
While I was reading it, I got to this page where he lays out all of the FRP functions in Concurrent ML. It's not uncommon for entire Computer Science theses to be written, passed, and published without a line of runnable code. But here it was, in real code (no pseudocode). I don't know ML, but I do know Haskell, which is related. And I started puzzling through it, trying to understand it.
Elm FRP in Concurrent ML
By flipping between the text and the code, I got it. And then I realized: everything I needed to write this was in Clojure with core.async. So I got to work.
It took a while of playing with different representations, but I got something I can use and that is pretty much a direct translation of the Concurrent ML code from the thesis. I toyed with some deeper factorizations, but I think they only muddy what's going on. And it runs in both Clojure and ClojureScript. It's available on Github.
And, of course, I needed to test my creation, so I ported the Mario example from the Elm website. You have to click on it to start it. Use the arrow keys to walk left/right and up to jump. It captures your keystrokes. Click outside of the box to get your scrolling keys back.
Now, I didn't write all of the cool GUI stuff from the second part of the thesis. I was learning Om so I decided to use that. That's why that part is so long. Writing Om components is basically as long as writing HTML + CSS.
And Elm is a language built around these FRP abstractions, so a lot of the signals are built in. I had to write event handlers to capture the mouse clicks and keyboard events. But right in the middle of mario.cljs, you'll see a very straightforward translation of the Mario example from Elm.
There are a few differences between the Elm version and the Clojure version. Clojure is untyped, so I was able to eliminate a channel used only to carry an input node id. That instead is passed along with the message on a single channel.
Also, I added a node type called "pulse" which I use for emitting time deltas for calculating physics. I'm not sure if Elm internally does something similar for its timers but it seems like the correct solution. The trick is you want to emit a zero time delta while other events are firing, and an actual time delta when the timer ticks.
Finally, instead of the graph being global as in the thesis, it's all tied to an "event stream", which is where you send the events which get channeled to the correct input signal. You can have multiple event streams and hence multiple graphs.
The implementation is very straightforward. You have to know the basics of Clojure core.async, plus mult and tap. core.async brings Clojure a new set of abstractions that let us tap into more research and learn from more languages. And that makes me happy :)
If you'd like to learn core.async, I have to recommend my own video course LispCast Clojure core.async. It teaches the most important concepts from core.async using story, animation, and exercises. It is no exaggeration to say that it was the most anticipated Clojure video when it came out. Go check out a preview.
Summary: Clojure core.async is a way to manage mutable state. Isn't that against functional programming?
When core.async was first announced, there was a lot of fanfare. But among the celebration, there was some consternation about core.async. Isn't core.async against the functional principles of Clojure? Aren't channels just mutable state? Aren't the <! and >! operations mutation?
Well, it's true. core.async is about mutation. It's procedural code. Go blocks run their bodies one step at a time. It's imperative.
But that's what Clojure is all about. It makes functional programming easy (with fns, immutable data structures, and higher order functions). It also makes mutable state easy to reason about. It does not eliminate it. It simply gives you better abstractions. That's what Atoms, Refs, Vars, and Agents are: useful abstractions for dealing with state.
core.async is just another abstraction for dealing with state. But, following the Clojure philosophy, it was chosen to be easy to reason about. The hardest part about coordinating and communicating with independent threads normally is that neither of them know what the other is doing. You can make little signals using shared memory. But those signals get complicated fast once you scale past two threads.
And that's what a channel is: it's just a shared coordination point. But it has some cool properties that make it super easy to reason about:
Carry semantics: the channel carries its own coordination semantics (buffering, unbuffered, etc).
Simple interface: channels have put, take, and close. That's it.
Very scalable: any number of processes can use a single channel with no additional cost.
Decoupling: consumers don't need to know producers and vice versa.
Channels are awesome, but they're not the whole story. The other part of core.async is the go block. Go blocks are another abstraction. They allow you to write code in an imperative style that blocks on channels. You get to use loops and conditionals, as well as local let variables, global variables, and function calls -- everything you're already using, but augmented with the coordination power of channels.
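A minimal sketch of the two abstractions working together, assuming the usual core.async names (`chan`, `go`, `>!`, `<!`, `close!`):

```clojure
;; Two independent go blocks coordinate over a single channel.
;; Neither knows about the other; the channel is the only
;; shared coordination point.
(def c (chan))

(go ;; producer
  (>! c "hello")
  (close! c))

(go ;; consumer: parks until a value is available
  (js/console.log (<! c)))
```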
All of these features add up to something you can reason about locally. That's the key: the code you're looking at now can be understood without looking at other code.
But there's a downside: you now have more choices. In theory, they're easier choices. But that requires you to understand the choices. You need to understand the abstractions, the idioms, and the tradeoffs. That's the goal of the LispCast Clojure core.async video course. If you'd like to use core.async but you don't know where to start, this is a good place.
Julian Gamble: It was about 2007, I was a Java Programmer toying with Rails. I was reading Steve Yegge and his amazing but incendiary blog posts. These had a big theme on languages and expressiveness.
I got curious enough to play with Common Lisp and Racket. I eventually worked through The Little Lisper - and found it hugely inspiring. I also read SICP and Paradigms of Artificial Intelligence Programming and was amazed (coming from Java) at what was possible with functional programming and homoiconicity.
In 2008 I was looking for a way to combine the power of Lisp and my Java background. I had a look at ABCL and Kawa and found they didn't really have much of a community behind them.
Then the Rich Hickey videos started coming out, "Clojure for Java Programmers" etc. Then lots of Java/Lisp people started coming out of the woodwork. I was fascinated by the design choices made and how opinionated the language was. I was delighted by the community that seemed to have come from thin air.
In 2009 Stu Halloway's Programming Clojure book came out and I devoured it. Then I went back and worked through the Little Lisper chapters in Clojure, and found functional programming had a lot more to it than a first-year course for students in the 80s would suggest; idiomatic functional programming didn't work like that anymore. (Although the idea of building a Lisp from a few primitives is still very elegant.)
In 2011 a Clojure User group started up in Sydney run out of Thoughtworks. This was a fantastic chance to meet up with people who programmed and developed their skills just for the joy of it - rather than just to pay a mortgage.
In 2013 Dave Thomas started running LambdaJam in Australia, and it meant that the functional programming community became more social and more willing to look at each other's ideas across languages. It also meant we developed a shared appreciation for the heritage of functional programming. We also developed a deeper appreciation of academic ideas and their linkage back to our practice.
So now in 2014 when Rich Hickey gives a talk on Transducers, and explains that folds are fundamental - I go and read the linked papers from Hutton and Bird. Then I can go and talk about this with my local community of Clojure people. This is something that I wouldn't have done two years ago.
Clojure has been a great ride!
LC: In your talk description, you mention that you will relate core.async to the concurrency semantics we already know. Can you explain this briefly?
JG: The assumption here is that most Clojure people have a Java or C# background. From this background they would have an understanding of how to put process on a thread, similar to how we might use a future in Clojure. Alternatively, Clojure people may have seen how we can pass a function to an agent in Clojure.
Now it is true that core.async go blocks have some very specific queue sleep/wait semantics (CSP) that make it different to other concurrency models. In addition, core.async in JavaScript runs as part of the single-thread model of JavaScript, so it isn't semantically a thread, more of a 'process'.
Given those assumptions, the primitives we find for doing concurrency in core.async are very similar to the concurrency models we have seen in the past in OO languages or in Clojure.
In this talk we'll take a look at an example of taking a well known graphical example in Clojure, port it to ClojureScript and core.async. What we find is that many of the Clojure concurrency paradigms have a natural analogue in ClojureScript and core.async.
LC: Are there any topics to read up on or resources to look at that could help a beginner make the most of your talk?
Or they can send me an email @ and ask me what I'm up to.
(I've also been known to present at clj-syd on meetup.com - but that is probably only of interest to Sydney readers).
LC: One last question: If Clojure could eat, what would be its favorite food?
JG: Well I'm a huge fan of Dan Friedman's work on The Little Lisper/Schemer - which has frequent food references, such as a peanut butter and jelly sandwich. The book will at random reserve an empty page and state:
THIS SPACE RESERVED FOR JELLY STAINS!
So the answer would have to be PBJ sandwich!
(But for all the British, European, Australian and New Zealand Readers - we all know it as peanut butter and strawberry jam - and some might be puzzled by the combination of these two condiments, but would still appreciate Friedman's book.)
LC: Thanks for a great interview!
This post is one of a series called Pre-conj Prep, which originally was published by email. It's all about getting ready for the upcoming Clojure/conj, organized by Cognitect. Conferences are ongoing conversations and explorations. Speakers discuss trends, best practices, and the future by drawing on the rich context built up in past conferences and other media.
That rich context is what Pre-conj Prep is about. I want to enhance everyone's experience at the conj by surfacing that context. With just a little homework, we can be better prepared to understand and enjoy the talks and the hallway conversations, as well as the beautiful venue and city of Washington, DC.
Clojure/conj is a conference organized and hosted by Cognitect. This information is in no way official. It is not sponsored by nor affiliated with Clojure/conj or Cognitect. It is simply me curating and organizing public information about the conference.
Julian Gamble's talk at the conj is about core.async in ClojureScript. core.async is an implementation of CSP for Clojure and ClojureScript. It allows for concurrency and communication using an abstraction called channels. It is similar to built-in facilities in the Go programming language.
Background
core.async provides two main abstractions: go blocks and channels. Go blocks are lightweight processes that give the basic type of independent concurrency. To coordinate and communicate between the go blocks, the go blocks take values from and put values to channels.
Why it matters
core.async is important because it is a very powerful way to structure your code. Further, core.async is a library, not a core feature of the language, even though in many languages this kind of facility is a fundamental part of the language. Finally, core.async gives Javascript (through ClojureScript) a much needed asynchrony model that is more expressive than callbacks.
Summary: Token Bucket is a simple algorithm for rate limiting a resource. It's easy to understand because you can reason about it in terms of real-world objects. core.async makes this algorithm very clear and easy.
You know you've got a good abstraction when you find lots of algorithms that are easy and clear to write using it. One algorithm that I really like because it's practical and simple is called the Token Bucket. It's used for rate limiting a resource. And it's very easy to write in core.async.
Let me digress a little. Here in the US, we have a complicated voting system. Not only do your presidential votes count only indirectly, but the counting is done with machines that vary from state to state. Some of those systems are so complicated. It's a little scary because it's hard to know how those machines work.
I mean, relatively hard. It's obviously possible to know, if you study enough. But compare it to papers in a box. I know there's all sorts of room for corruption in the papers in a box system. But it's understandable by children. Well, as long as they can count.
The token bucket is similarly simple. There's a bucket. You put tokens in it at a certain rate (for instance, once per hour). If you have a token, you can take an action (ride the subway, send a packet, read a file), then you lose the token (it's good for one ride). If you don't have a token, you can wait. Everything slows down to the rate of tokens falling into the bucket.
It's so simple. It so easily corresponds to a real-world situation you can imagine. I love that type of algorithm.
An implementation
Let's build Token Bucket in core.async.
First, we need a bucket. For that, we'll use a core.async channel with a buffer. Let's just start with size 10.
(def bucket (chan 10))
Bucket is done. Now, we need something to add tokens to the bucket at a given rate.
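One way to write that, as a go loop that mirrors the parameterized version later in this post (`timeout` comes from core.async):

```clojure
(go
  (while true
    (>! bucket :token)       ;; drop a token into the bucket
    (<! (timeout 1000))))    ;; then park for one second
```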
That will add one token to the bucket every second.
We can rate limit an existing channel by forcing it to take a token before values get through.
(defn limit-rate [c]
(let [out (chan)]
(go
(loop []
(let [v (<! c)]
(if (nil? v) ;; c is closed
(close! out)
(do
(<! bucket) ;; wait for a token
(>! out v)
(recur))))))
out))
Corner cases
Ok, it's not that simple. There are two corner cases.
What happens when nobody takes a token out of the bucket? Do you keep putting tokens in?
The answer is yes. If nobody rides this hour, then in the next hour there are two tokens in the bucket, so two can come through. But then...
What do you do when the bucket gets really full and a whole bunch of people take tokens out at the same time?
Well, you let them all through. One token, one ride. There's no other coordination. And that means it's really important to choose the size of your bucket.
The number of tokens your bucket can hold is called the burstiness. It's because when the bucket is full, you could get a rampage of people trying to get on the subway. How many people should be allowed through at that point? The burstiness is the maximum that should come through at a time.
We have our two parameters: the rate and the burstiness. Let's incorporate all of that.
(defn limit-rate [c r b]
(let [bucket (chan b) ;; burstiness
out (chan)]
(go
(while true
(>! bucket :token)
(<! (timeout (int (/ 1000 r)))))) ;; rate
(go
(loop []
(let [v (<! c)]
(if (nil? v) ;; channel is closed
(close! out)
(do
(<! bucket) ;; wait for a token
(>! out v)
(recur))))))
out))
The burstiness is taken care of by the size of the buffer. The buffer will fill up if no one takes a token. Since we're using a fixed buffer, putting a token into a full bucket will park until something takes a token out--exactly as the algorithm describes.
Well, that's it. It's easy. And now we have a way to limit the rate of a channel. We can use it to limit the rates of other things, too, like a function call or access to a database. I use it to rate limit an API.
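For example, here is how it might look wrapped around a channel of API calls. Note that `api-requests` and `call-api!` are hypothetical names for illustration, not part of core.async:

```clojure
;; Allow at most 5 calls per second, with bursts of up to 10.
(def limited (limit-rate api-requests 5 10))

(go
  (loop []
    (when-some [req (<! limited)]
      (call-api! req)  ;; hypothetical function that performs the call
      (recur))))
```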
core.async makes this algorithm nice and easy to use. There's a library that does this for you in a very convenient package. It's called Throttler. Bruno Vecchi has done the work of making this work well as a library. If you'd like to learn core.async, I recommend my LispCast Clojure core.async videos. They're a gentle and fun introduction to a great topic. You will learn everything you need to write Token Bucket and more!
Summary: There are a few conventions in core.async that are not hard to use once you've learned them. But learning them without help can be tedious. This article presents three guidelines that will get you through the learning curve.
Willy Wonka, inventor of CSP.
Introduction
The more you use core.async, the more you feel like Willy Wonka. He knew how to maximize the effectiveness of the Oompaloompas. And while core.async comes with a lot of functions built in, he knew exactly which ones to use at which time.
In this extremely rare glimpse into the functioning of his mysterious factory, we take a look at the guidelines Wonka himself follows when orchestrating the work of the Oompaloompas.
When to use go versus thread?
Willy Wonka with his Thread Pool.
Background
Each Oompaloompa is a thread. Willy Wonka has a special group of Oompaloompas he calls a thread pool. Their assignment is simple: they manage a group of tasks that Wonka calls go blocks. Whenever Wonka has an appropriate task, he writes a go block and hands it to the Oompaloompas to work on.
As the Oompaloompas work, they take one task and do it until the task parks. When it parks, they put it down and pick up another task that isn't parked. Tasks become unparked when they get new input from the chocolate pipes. Then the Oompaloompas can continue working on them.
At one time, Wonka used to give the thread pool all sorts of tasks. He would give them very long calculation tasks, like weighing each chocolate bean in his chocolate bean mountain. He noticed that when they did this, lots of tasks were left undone, even though they were not parked, because all of the Oompaloompas were busy doing something else.
So he came up with a guideline.
Avoid long calculations and blocking inside go blocks
Does your code do significant I/O, like downloading a file or writing to the network? Are you doing a very long calculation?
Then use a thread. If it will take a long time or block, you want a dedicated thread. It can work as long as it wants, and even block. That way it doesn't slow down the work of the thread pool.
Otherwise, you can use a go block.
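A sketch of the guideline in code (the filename is made up):

```clojure
(require '[clojure.core.async :refer [go thread <!]])

;; Blocking I/O gets a dedicated thread. `thread` runs its body on
;; its own thread and returns a channel that delivers the result.
(def contents-ch (thread (slurp "chocolate-inventory.txt")))

;; Light coordination goes in a go block, which only parks -- it
;; never ties up a thread-pool thread while waiting.
(go (println "read" (count (<! contents-ch)) "characters"))
```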
When to use single- versus double-bang (!)
A couple of blocked Oompaloompas.
Background
Wonka also noticed that he needed to write different instructions for his two types of Oompaloompa. When he wrote a go block, he needed to say "park while you wait for input". But for the other Oompaloompas created with thread (or for his own work), he needed an instruction that said "block while you wait for input".
So he came up with a little notation convention. If you're just parking, so you're in a go block, use one bang. If you're outside of a go block, meaning you need to block, use two bangs.
These were the two versions of his basic instructions:
>!, <!, and alts! versus >!!, <!!, and alts!!. The convention is easy.
Use single-bang versions in go blocks and double-bang versions outside.
The single-bang versions of these functions are meant to park a go block. Although they are defined as functions, they have special meaning to the go macro. In fact, if you actually run the functions (outside of a go block), they will throw an exception unconditionally, telling you they are meant to be inside a go block.
The double-bang versions are blocking. That means that the thread they are running on will block if the channel is not ready. They can be used outside of a go block (anywhere) or inside of a thread block. It's safe to block inside a thread block since it's a dedicated thread.
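Side by side, the convention looks like this:

```clojure
(require '[clojure.core.async :refer [chan go >! <!!]])

(def c (chan))

;; Inside a go block: single bang. The go block parks until
;; a consumer is ready.
(go (>! c :golden-ticket))

;; Outside any go block: double bang. The calling thread blocks
;; until a value is available.
(<!! c)  ;; => :golden-ticket
```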
put!
Willy Wonka writing instructions for his mailman.
Background
Like all factories, Willy Wonka's needs deliveries. When the UPS truck comes, there's plenty of boxes to unload. But Wonka is busy. So he leaves a note outside for the delivery guy.
The note tells the guy where to put everything so the Oompaloompas know where to find it. When he says where to put a box, he spells it put!. That is, it has a bang.
It's unfortunate because the other functions with a bang mean they park. But put! does not park. Wonka was just angry one day, and the convention stuck.
But the delivery guy knows that Wonka is eccentric, so he doesn't take it personally and does his job. He puts stuff in its places, without blocking.
Use put! to get stuff into your channels from outside.
put! is a way to get values from outside of core.async into core.async without blocking. For instance, if you're using a callback-style, which is very common in Javascript, you will want to make your callback call put! to get the value onto a channel.
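For instance, adapting a callback-style API might look like this (`fetch-json!` is a made-up stand-in for whatever async API you're wrapping):

```clojure
(require '[clojure.core.async :refer [chan put! go <!]])

(def responses (chan))

;; The callback just drops the value onto the channel. put! returns
;; immediately; it never parks or blocks.
(fetch-json! "/api/candy"
             (fn [response] (put! responses response)))

;; From here on, ordinary core.async code consumes the values.
(go (println "got" (<! responses)))
```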
Conclusion
That's it! Now to eat some chocolate!
core.async is really cool, but it has a learning curve. Once you learn these conventions, you will begin to feel the power they give you, whether you're making chocolate or building cars. If you'd like to learn core.async and feel like Willy Wonka, I recommend the LispCast Clojure core.async videos. They build up a deep understanding of the fundamental concepts in a fun and gradual way.