March 13, 2015
Summary: A common failure in distributed systems is a server that has a rate limit, or that has no limit but begins failing under load. A standard solution is to retry after waiting a short time, increasing that time after each failure. We create a macro to handle this waiting and retrying.
A few days ago I wrote about a high-level way of handling intermittent errors, particularly in a distributed system. The approach was simplistic: when you get an error, try again, up to a few times. A slightly more nuanced approach is to back off before you try again. Each time there's an error, you wait longer, until some maximum time is reached.
The problem
Let's say you're hitting a service with a rate limit. That rate limit could be enforced or implicit. You've got lots of computers hitting it, and it's impossible to coordinate them. No matter how hard you try to keep under that rate limit (and you should try), you will eventually break it. Retrying immediately when the server is too busy will actually make the problem worse: you just give it yet another request to deny. At the same time, it might be hard to distinguish "I'm too busy right now" from "I'm never going to recover".
The solution
I don't know what it's really called. I call it Exponential Backoff. It's also easy to turn into a separate routine:
(defn exponential-backoff [time rate max f]
  (if (>= time max) ;; we're over budget, just call f
    (f)
    (try
      (f)
      (catch Throwable t
        (Thread/sleep time)
        ;; retry with the wait multiplied by the rate
        (exponential-backoff (* time rate) rate max f)))))
This one has the same structure as try-n-times, but it sleeps before recursing. When it recurses, the time is multiplied by the rate. Once the wait time reaches the max, it tries one last time without catching anything, so failures from that last try propagate.
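To watch the retry loop work without a real server, here is a small sketch that uses the function above together with a hypothetical `flaky-call` helper (not part of any library) that throws on its first few calls. The tiny times (10 ms, doubling, 1000 ms max) are an assumption chosen only to keep the demo fast.

```clojure
(defn exponential-backoff [time rate max f]
  (if (>= time max) ;; over budget: one last try, failures propagate
    (f)
    (try
      (f)
      (catch Throwable t
        (Thread/sleep time)
        (exponential-backoff (* time rate) rate max f)))))

;; Counts every call made to the flaky function.
(def attempts (atom 0))

;; Hypothetical stand-in for a busy server: throws on the first
;; `failures` calls, then returns :ok.
(defn flaky-call [failures]
  (fn []
    (let [n (swap! attempts inc)]
      (if (<= n failures)
        (throw (ex-info "server busy" {:attempt n}))
        :ok))))

;; Fails twice (sleeping 10 ms, then 20 ms) and succeeds on the third call.
(def result (exponential-backoff 10 2 1000 (flaky-call 2)))
```

The caller only sees the final `:ok`; the two failures are absorbed by the backoff.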
How to use it
Same as with try-n-times:
(require '[clj-http.client :as http]) ;; assuming clj-http as the HTTP client
(exponential-backoff 1000 2 10000
  #(http/get "http://rate-limited.com/resource"
             {:socket-timeout 1000
              :conn-timeout 1000}))
This will retry after waiting 1 second (1000 ms) the first time, then double the wait (the 2) each time. Once the next wait would be 10 seconds or more (the 10000), it stops backing off: it makes one final attempt and lets any failure propagate.
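It can help to see the schedule those three numbers produce. The sketch below uses a hypothetical helper, `backoff-waits`, that lists the sleeps `exponential-backoff` performs when every attempt fails: four waits totalling 15 seconds, followed by a fifth and final attempt.

```clojure
;; Hypothetical helper: the sleeps exponential-backoff performs when
;; every attempt fails. It sleeps `time` while time < max, multiplying
;; by `rate` after each failure.
(defn backoff-waits [time rate max]
  (take-while #(< % max) (iterate #(* % rate) time)))

(backoff-waits 1000 2 10000)
;; => (1000 2000 4000 8000)

;; Total time spent sleeping before the final attempt, in ms.
(reduce + (backoff-waits 1000 2 10000))
;; => 15000
```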
Slightly more useful
OK, so I don't use exactly this. What I use is slightly more complicated. I've found that I can often tell whether it's a rate-limiting problem by looking at the exception. So, let's pass it a predicate to check.
(defn exponential-backoff [time rate max p? f]
  (if (>= time max) ;; we're over budget, just call f
    (f)
    (try
      (f)
      (catch Throwable t
        (if (p? t)
          (do
            (Thread/sleep time)
            (exponential-backoff (* time rate) rate max p? f))
          (throw t)))))) ;; not a backoff-worthy error: rethrow it
This one only recurses if the predicate returns true for the exception; otherwise it rethrows. Let's say the service mentions "queue capacity" in the body of the HTTP response when it's too busy:
(exponential-backoff 1000 2 10000
  (fn [t] ;; the predicate
    (and (instance? clojure.lang.ExceptionInfo t)
         ;; clj-http puts the response map, including :body, in ex-data
         (re-find #"queue capacity" (str (:body (ex-data t))))))
  #(http/get "http://rate-limited.com/resource"
             {:socket-timeout 1000
              :conn-timeout 1000}))
This lets you be selective about which failures you back off from.
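Here is a self-contained sketch of that selectivity. `queue-full?` is a hypothetical named version of the predicate above; the assumption that the error carries the response `:body` in its `ex-data` mirrors how clj-http reports non-2xx responses, but here the errors are built by hand with `ex-info`. An error that does not match the predicate is rethrown after a single call.

```clojure
(defn exponential-backoff [time rate max p? f]
  (if (>= time max)
    (f)
    (try
      (f)
      (catch Throwable t
        (if (p? t)
          (do
            (Thread/sleep time)
            (exponential-backoff (* time rate) rate max p? f))
          (throw t))))))

;; Hypothetical predicate: only ex-info errors whose :body mentions
;; "queue capacity" are treated as "too busy, back off".
(defn queue-full? [t]
  (and (instance? clojure.lang.ExceptionInfo t)
       (boolean (re-find #"queue capacity" (str (:body (ex-data t)))))))

(def calls (atom 0))

;; A 404 does not match the predicate, so it is not retried.
(def outcome
  (try
    (exponential-backoff 10 2 1000 queue-full?
                         (fn []
                           (swap! calls inc)
                           (throw (ex-info "HTTP 404"
                                           {:status 404 :body "no such resource"}))))
    (catch clojure.lang.ExceptionInfo e
      (:status (ex-data e)))))
```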
A Macro
Well, here's an example macro. It's got a bunch of defaults.
(defmacro try-backoff [[time rate max p?] & body]
  `(exponential-backoff (or ~time 1000) ;; defaults!
                        (or ~rate 2)
                        (or ~max 10000)
                        (or ~p? (constantly true))
                        (fn [] ~@body)))
Here's how you use it:
(try-backoff []
  (println "trying!")
  (do-some-stuff))
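The defaults only kick in for the slots you leave as `nil`. A runnable sketch, assuming the predicate version of `exponential-backoff`: this call overrides the starting time and the max (10 ms and 1000 ms, chosen only to keep the demo fast) while the rate and predicate fall back to 2 and "retry everything". The `tries` counter is illustrative.

```clojure
(defn exponential-backoff [time rate max p? f]
  (if (>= time max)
    (f)
    (try
      (f)
      (catch Throwable t
        (if (p? t)
          (do
            (Thread/sleep time)
            (exponential-backoff (* time rate) rate max p? f))
          (throw t))))))

(defmacro try-backoff [[time rate max p?] & body]
  `(exponential-backoff (or ~time 1000)
                        (or ~rate 2)
                        (or ~max 10000)
                        (or ~p? (constantly true))
                        (fn [] ~@body)))

(def tries (atom 0))

;; rate and p? are nil, so the defaults apply. The body throws on its
;; first two runs (waits of 10 ms and 20 ms), then returns :done.
(def answer
  (try-backoff [10 nil 1000]
    (when (< (swap! tries inc) 3)
      (throw (RuntimeException. "busy")))
    :done))
```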
Also, add it to your Clojure Emacs config for better formatting, because this one wants the args on the first line:
(put-clojure-indent 'try-backoff 1)
This tells Emacs to make the second argument ((println "trying!")) one indentation in, instead of directly under the first ([]).
Warning
All of the try3 warnings apply. The stuff you're doing inside needs to be idempotent!
Conclusion
This pattern is another cool, reusable component to help build reliability into a distributed system. Small, intermittent failures are pervasive. And a common form of error is a server being too busy. Being able to handle this type of error quickly and systematically is going to make your life easier.
Though Clojure does not have specific solutions to distributed systems problems, coding them up is short and straightforward. If you're interested in learning Clojure, I suggest you check out LispCast Introduction to Clojure. It's a video course that uses animation, storytelling, and exercises to install Clojure into your brain.
October 08, 2013
A Lisp with a macro system is actually two languages in a stack. The bottom language is the macro-less target language (which I'll call the Lambda language). It includes everything that can be interpreted or compiled directly.
The Macro language is a superset of the Lambda language. It has its own semantics, which is that Macro language code is recursively expanded into code of the Lambda language.
Why isn't this obvious at first glance? My take on it is that because the syntax of both languages is the same, and the output of the Macro language is Lambda language code (instead of machine code), it is easy to see the Macro language as a feature of the Lisp. Macros in Lisp are stored in the dynamic environment (much as functions are) and are compiled just like functions in the Lisp language (which are also written in the Macro language), which makes it even easier to confuse the layers. It seems like a phase in some greater language which is the amalgam of the two.
However, it is very useful to see these as two languages in a stack. For one, realizing that macroexpansion is an interpreter (called macroexpand) means that we can apply all of our experience of programming language design to this language. What useful features could we add? It also makes clear why macros are typically not first-class values in Lisps: they are not part of the Lambda language, which is the one in which values are defined.
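In Clojure you can call that interpreter directly. In this small illustration, `macroexpand` expands a single form, and `clojure.walk/macroexpand-all` recursively expands nested forms, so the output contains only special forms (`if`, `do`) and ordinary calls. The symbols `ready?`, `launch!`, `a`, `b` and `c` are arbitrary placeholders; the forms are quoted, so they are never evaluated.

```clojure
(require '[clojure.walk :refer [macroexpand-all]])

;; One expansion step: the `when` macro is rewritten into `if` and `do`,
;; which are special forms of the underlying (Lambda) language.
(macroexpand '(when ready? (launch!)))
;; => (if ready? (do (launch!)))

;; Recursive expansion: the inner `when` is expanded too.
(macroexpand-all '(when a (when b c)))
;; => (if a (do (if b (do c))))
```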
The separation of these two languages reveals another subtlety: the Macro language is at once an interpreter and a compiler. The semantics of the Macro language are defined to always output Lambda language, whereas the Lambda language is defined as an interpreter (as in McCarthy's original Lisp paper) and the compiler is an optimization. We can say that the Macro language has translation semantics.
But what if we define a stack that only allows languages whose semantics are simply translation semantics? That is, at the bottom there is a language whose semantics define what machine code it translates to. We would never need to explicitly write a compiler for that language (it would be equivalent to the interpreter). This is what I am exploring now.
March 05, 2015
Summary: Distributed systems fail in indistinguishable ways. Often, retrying is a good solution to intermittent errors. We create a retry macro to handle the retries in a generic way.
Let's face it: your system is probably a distributed system. All web apps are by definition distributed. They have at least one server, probably a separate database server, and many browser clients. And now microservices are getting popular. Distributed is the current and future normal. While Clojure solves the problems of multiple cores sharing memory at the language level, distributed systems problems are left to be addressed at the application level.
The problem
One big problem that comes up all the time in distributed systems is dealing with failure. Failure happens everywhere. The problem in a distributed system is that you don't know where the failure happened. For example, let's say you make an HTTP GET request and 20 seconds later, you're still waiting for the response. Is it:
- A network failure?
- Did the message not get to the server?
- Did the message get there, but the response didn't make it back?
- The server is down?
- The server is still working?
- The response is still coming?
- An intermediate computer (proxy) has filtered the request/response?
It is literally impossible to know what the problem is. And that's ok. There's a lot of machinery between one machine and the next. Even if you could diagnose the problem, are you really going to program each error case?
Let's say you call your friend and they don't pick up. Are they asleep? Is their phone off? Did the call not go through? The phone won't tell you. And you really want to talk to them. So what do you do? You call back. You might even call back a couple of times. If they pick up when you call back, great! If not, then you get tired and give up.
That's a common approach in distributed systems as well: retry your distributed message a few times before you give up. It's easy and fixes a surprising number of problems. What's more, there's a good solution that's simple in the Hickeyan sense.
The solution
Failure in Clojure typically means an Exception. So we'll need to catch exceptions and run code multiple times.
(defn try-n-times [f n]
  (if (zero? n)
    (f)
    (try
      (f)
      (catch Throwable _
        (try-n-times f (dec n))))))
You pass it a function and the number of times to retry it. The base case is when n is 0. In that case, it will just try it (no retry). If n is greater than 0, it will wrap the function call in a try/catch, catch everything, and recurse. If it still throws an exception after n retries, try-n-times will fail and some other code will have to deal with it. The concern of retrying is separated from what is being retried.
How do you use it?
Wrap your distributed calls in this bad boy and you're good to go.
Instead of this:
(require '[clj-http.client :as http]) ;; assuming clj-http as the HTTP client
(http/get "http://somewhat-reliable.com/resource"
          {:socket-timeout 1000
           :conn-timeout 1000})
You do this:
(try-n-times #(http/get "http://somewhat-reliable.com/resource"
                        {:socket-timeout 1000
                         :conn-timeout 1000})
             2)
Remember, n is the number of retries. So that's 1 try + 2 retries.
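To check the "1 try + 2 retries" arithmetic, here is a sketch with a call that always fails (the `attempts` counter is illustrative). With n = 2 the function runs three times, and the exception from the last run is the one the caller sees.

```clojure
(defn try-n-times [f n]
  (if (zero? n)
    (f)
    (try
      (f)
      (catch Throwable _
        (try-n-times f (dec n))))))

(def attempts (atom 0))

;; Always throws, so every retry is used up and the last
;; exception propagates to this outer try.
(def error-message
  (try
    (try-n-times (fn []
                   (swap! attempts inc)
                   (throw (RuntimeException. "still down")))
                 2)
    (catch RuntimeException e
      (.getMessage e))))
```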
Macro, anyone?
Alright, yes, I made a macro for that. It does come in handy to have a macro that you can put code in instead of passing in a function.
(defmacro try3 [& body]
  `(try-n-times (fn [] ~@body) 2))
This one is used like this:
(try3
  (println "trying!")
  (do-some-stuff))
Warning
Now, a little care needs to be taken when you use this. Remember, when you get a failure, it could be a timeout. The server could still be processing your request. Or it could have failed halfway through a multi-step process. What that means practically is that your distributed message has to be idempotent. HTTP GET is idempotent, so it's OK. POST generally is not, but sometimes it is. Use your judgment! Also, give your calls a timeout, to turn long waits into errors.
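The HTTP examples get their timeouts from the client's options. For a call that has no timeout option of its own, one possible sketch (a hypothetical helper, not from the original post) runs it in a `future` and uses `deref` with a timeout, so a long wait becomes an exception that try-n-times can catch and retry. The cancelled work may still have had side effects, which is one more reason the call must be idempotent.

```clojure
;; Hypothetical helper: run f on another thread and give up after
;; ms milliseconds. If f itself throws, deref rethrows it wrapped in
;; a java.util.concurrent.ExecutionException.
(defn call-with-timeout [ms f]
  (let [fut    (future (f))
        result (deref fut ms ::timed-out)]
    (if (= ::timed-out result)
      (do
        (future-cancel fut) ;; interrupt the worker if it is still running
        (throw (ex-info "call timed out" {:timeout-ms ms})))
      result)))

(call-with-timeout 1000 #(+ 1 2))
;; => 3
```

Combined with the retry function, this could look like `(try-n-times #(call-with-timeout 1000 some-call) 2)`, where `some-call` stands in for your own function.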
Conclusion
This pattern is just one piece of a larger distributed system puzzle. The network and servers are unreliable. They might work the whole time during development, but in the fullness of time, an always-on distributed system will have some kind of failure eventually. Sometimes the failures are temporary, and in those cases, a quick retry can fix it right away.
March 07, 2014
Summary: Macros should be avoided to the extent possible. There are three circumstances where they are required.
There's a common theme in Lisp that you should only use macros when you need them. It is very common to see a new Lisper overuse macros. I did it myself when I first learned Lisp. They are very powerful and make you the king of syntax.
Clojure macros do have their uses, but why should you avoid them if possible? The principal reason is that macros are not first-class in Clojure. You cannot access them at runtime. You cannot pass them as an argument to a function, nor do any of the other powerful things you've come to love from functional programming. In short, macros are not functional programming (though they can make use of it).
A function, on the other hand, is a first-class value, and so is available for awesome functional programming constructs. You should prefer functions to macros.
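A quick illustration of the difference, using `clojure.core/when`: a function such as `inc` can be handed to `map`, but the macro cannot, and the compiler rejects the attempt ("Can't take value of a macro"). Wrapping the macro in a function literal turns it back into a value you can pass around.

```clojure
;; Functions are values: map can take one as an argument.
(map inc [1 2 3])
;; => (2 3 4)

;; (map when [true false]) would not compile, because `when` is a macro.
;; Wrapping it in a function literal gives map a real function.
(map #(when % :yes) [true false])
;; => (:yes nil)
```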
That said, macros are still useful because there are things macros can do that functions cannot. What are the powers of a macro that are unavailable to any other construct in Clojure? If you need any of these abilities, write a macro.
1. The code has to run at compile time
There are just some things that need to happen at compile time. I recently wrote a macro that returns the hash of the current git commit so that the hash can be embedded in the ClojureScript compilation. This needs to be done at compile time because the script will be run somewhere else, where it cannot get the commit hash. Another example is performing expensive calculations at compile time as an optimization.
Example:
(defmacro build-time []
  (str (java.util.Date.)))
The build-time macro returns a String representation of the time at which it is expanded, that is, when the code is compiled.
Running code at compile time is not possible in anything other than macros.
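One way to see that the macro runs at compile time is to compare it with a plain function that builds the same string. In this sketch, `built-at` and `run-time` are illustrative names: `built-at` keeps returning the moment its definition was compiled, while `run-time` produces a new string on every call.

```clojure
(defmacro build-time []
  (str (java.util.Date.)))

;; The Date is created while this defn is compiled, so the body is
;; just a string constant.
(defn built-at []
  (build-time))

;; A function does the same work every time it is called.
(defn run-time []
  (str (java.util.Date.)))

(def earlier [(built-at) (run-time)])
(Thread/sleep 1100) ;; long enough for the seconds in the Date string to change
(def later [(built-at) (run-time)])
;; (first earlier) and (first later) are identical;
;; (second earlier) and (second later) differ.
```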
2. You need access to unevaled arguments
Macros are useful for writing new, convenient syntactic constructs. And when we talk about syntax, we are typically talking about raw, unevaluated s-expressions.
Example:
(defmacro when
  "Evaluates test. If logical true, evaluates body in an implicit do."
  {:added "1.0"}
  [test & body]
  (list 'if test (cons 'do body)))
clojure.core/when is a syntactic sugar macro which transforms into an if with a do for a then and no else. The body should not be evaled before the test is checked.
Getting access to the unevaluated arguments is possible by quoting (' or (quote ...)), but that is often unacceptable for syntactic constructs. Macros are the only way to do that.
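Here is what goes wrong if `when` is written as a function instead. In this sketch, `when-fn` and `my-when` are illustrative names. A function's arguments are evaluated before the call, so the body's side effect happens even when the test is false; the macro receives the body unevaluated and places it inside an `if`.

```clojure
(def evaluations (atom 0))

;; Function version: body is already evaluated by the time we get here.
(defn when-fn [test body]
  (if test body nil))

;; Macro version, same expansion as clojure.core/when.
(defmacro my-when [test & body]
  (list 'if test (cons 'do body)))

(when-fn false (swap! evaluations inc)) ;; => nil, but evaluations is now 1
(my-when false (swap! evaluations inc)) ;; => nil, evaluations is still 1
```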
3. You need to emit inline code
Sometimes calling a function is unacceptable. That call is either too expensive or is otherwise not the behavior you want.
For instance, in JavaScript in the browser, you can call console.log('msg') to print out a message and the line number to the console. In ClojureScript, this becomes something like this: (.log js/console "msg"). Not convenient at all. My first thought was to create a function.
(defn log [msg]
  (.log js/console msg))
This worked alright for printing the message, but the line numbers were all pointing to the same line: the body of the function! console.log records the line exactly where it is called, so it needs to be inline. I replaced it with a macro, which highlights its purpose as syntactic sugar.
Example:
(defmacro log [msg]
  `(.log js/console ~msg))
The body replaces the call to log, so it is located where it is needed for the proper behavior.
If you need inline code, a macro is the only way.
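You can confirm the inlining by expanding a call. ClojureScript macros of this era are written in Clojure (in a .clj file pulled in with :require-macros), so this sketch runs on the JVM: the expansion is the interop form itself, which the ClojureScript compiler then emits at the call site.

```clojure
(defmacro log [msg]
  `(.log js/console ~msg))

;; The call is replaced by the interop form, so console.log ends up
;; on the line where log was written.
(macroexpand-1 '(log "msg"))
;; => (.log js/console "msg")
```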
Other considerations
Of course, any combination of these is also acceptable. And don't forget that although you might need a macro, macros are only available at compile time. So you should consider providing a function that does the same thing and then wrapping it with a macro.
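A minimal sketch of that function-plus-macro pattern, using a hypothetical timing utility: `timed*` does all the work and stays first-class, while the `timed` macro only saves you from writing `(fn [] ...)` around the body.

```clojure
;; The function does the work and can be passed around like any value.
(defn timed* [f]
  (let [start  (System/nanoTime)
        result (f)]
    {:result     result
     :elapsed-ms (/ (- (System/nanoTime) start) 1e6)}))

;; The macro is thin syntax over the function.
(defmacro timed [& body]
  `(timed* (fn [] ~@body)))

(:result (timed (+ 1 2)))
;; => 3

;; Because timed* is a function, it composes with map and comp.
(map (comp :result timed*) [#(* 2 3) #(str "a" "b")])
;; => (6 "ab")
```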
Conclusion
Macros are very powerful. Their power comes with a price: they are only available at compile time. Because of that, functions should be preferred to macros. The use of macros should be reserved for those special occasions when their power is needed.