Geek Rant

All talk and no action

Archive for June 2008

Some Haskell Misconceptions: Idiomatic Code, Purity, Laziness, and IO

with 5 comments

Laziness vs. strictness is a commonly misunderstood part of Haskell, as is how the IO monad is evaluated. I hope to clear some things up here.

One of the main features of Haskell is call-by-need evaluation. The idea is to avoid unnecessary computation by suspending evaluation until the result is needed. A suspended computation is called a thunk. When a thunk is evaluated, it is replaced in memory with its value, which may itself refer to more thunks. For example, say we have a Stream type.

data Stream a = Cons !a (Stream a)

Notice that this is always going to be an infinite structure. If we were to evaluate a stream strictly, we would run out of memory. Also notice the strictness annotation on the first component of the structure. It’s not required, but it makes the evaluation strategy explicit and may happen to also help GHC better optimize things in some cases. I am using strictness annotations not to make structures lazy or strict, but merely to indicate how the structures are inherently lazy or strict. Whenever I evaluate a Stream, its head will be evaluated immediately, but its tail will be suspended. This means we can use a Stream in a real program without running out of memory, which turns out to be very useful.

Contrast this with a Vector type.

data Vector a = Vector !a !a !a

Notice the strictness annotations. I placed them there because there are no circumstances where I would want to evaluate a Vector without evaluating its components. If I evaluate Vector 1 2 3 + Vector 4 5 6, then I want Vector 5 7 9, not Vector (1+4) (2+5) (3+6). Imagine what would happen if, using vectors with lazy components, I added an infinite a huge number of vectors together and only then evaluated the result’s components. I would be building three thunks which each store an infinite a huge number of values to add together for no reason, causing the computer to run out of memory. This has the same effect as strictly evaluating a stream.

Are you paying attention? Strictly evaluating a lazy structure and lazily evaluating a strict structure does the exact same thing! With strict evaluation only, we run out of memory on huge structures like streams. With lazy evaluation only, we run out of memory on huge computations like loops.

This parallel indicates pretty clearly that recursively operating on each element of a stream is an infinite loop. Stream is a control structure. It doesn’t exist to persist data across many parts of a program. It exists to feed data into a function one element at a time. In contrast, a Vector exists to persist an intermediate or final result of some 3-dimensional computation. In short, Stream is for controlling computation, and Vector is for storing data. This generalizes to lazy and strict types, respectively.

This can come as quite a shock to those who mistakenly believe that laziness is a way to save memory and computation at the expense of predictability. There may occasionally be positive performance side-effects, but that is not the most important and common use for laziness. Further, lazy evaluation is not unpredictable. It is just a model of computation, another tool for your toolbox, and if you understand it you will not be surprised by it. Haskell is not really a lazy language as much as it is a lazy-by-default language. It is simple and idiomatic to use strictness where your program calls for it. In other words, contrary to many complaints, strictness is not a low-level optimization technique. Strictness is a strategy for algorithms. As is always the case, if you change an algorithm, its time and space requirements will change, but it’s a high-level change. A strictness annotation or data structure redesign is not like dropping to C or assembly.

[Edit: Everything after this point, I would really like to just delete completely. I don't think it supports my points as well as I originally thought, and it's also just sloppy. I'm leaving it so you readers can at least see where I was going.]

The other important point I want to go over here is the way the Haskell runtime causes side-effects. [Edit: From this point on, take the things I say with a little salt. I am stretching things a bit to make a point without burdening the article with tedious details. It's practically true though.] A common misconception is that the IO monad represents impure code, that it is a hack to get around an inherent limitation in an otherwise pure language. The misconception, I believe, comes from a misunderstanding of what an IO function actually does. Many believe that IO is strictly evaluated, artificially, in order to preserve the ordering of effects, as it would be in an imperative language. In reality, the evaluation of IO is no different than of any other monad. The laziness or strictness of a monad’s evaluation depends only on the underlying data structure.

Before I delve too deeply into IO, I’d like to go on a brief tangent back to streams. I already gave one possible way to define a Stream data type above. There are other ways as well. For example, consider this one, loosely based on the definition used in the stream-fusion package:

data Stream a = forall s. Stream !(s -> (s, a)) !s

The first component of the structure is a step function which takes some state and returns a new state and a value, and the second is an initial state. This version is just as lazy as the one I gave earlier. The laziness, in this case, is captured by the step function. The next state and value are not computed until the step function is applied to the initial (or some other) state.

Back to IO. Here is a simplified version of the type of IO’s innards:

type IO a = RealWorld -> (RealWorld, a)

Notice the similarity to the definition of Stream I just mentioned. Here, the state parameter is RealWorld, and we don’t hold an initial state since it will vary from execution to execution. (RealWorld is not an actual value, of course. In this way, the IO monad may be considered a hack, but it’s a clean hack, and it works with the standard evaluation model rather than against it.) This is a lazy structure. Based on the definition, this is the expanded type of the bind function for the IO monad:

(>>=) :: (RealWorld -> (RealWorld, a)) -> (a -> RealWorld -> (RealWorld, b)) -> RealWorld -> (RealWorld, b)

And a direct function definition would look something like this:

(>>=) m k rw = let (rw', x) = m rw in k x rw'

So we pass in the RealWorld to the first action, then we pass its results to the second action. Pretty simple stuff, really.

Now to prove that it is lazy. Let’s say I have actions a and b, which I sequence with a >>= \_ -> b. What happens when we execute this? What we are looking for is whether b will be evaluated before the effect of a is performed. If it is, then we are evaluating strictly because we have to evaluate the whole action before it is useful. If it is not, then we are evaluating lazily because we are delaying computation until it is needed.

If we expand the operation a bit, we get \rw -> let (rw', x) = a rw in (\_ -> b) x rw', which we can simplify a bit to get \rw -> let (rw', _) = a rw in b rw'. Clearly, in order to evaluate b, we have to already have evaluated a. [Edit: augustss pointed out in the comments that this is not generally true unless b always uses its RealWorld parameter, so we have to assume that it does in this explanation. This is fine. If it doesn't use the parameter, then it doesn't have an effect anyway.] The only way to evaluate a, obtaining its return value and the new RealWorld state, is to perform its side-effect. Evaluation of the IO monad is indeed lazy. Also, the evaluation of the IO monad is, itself, what causes the side-effects.

Here’s the kicker: this evaluation model means that IO functions are pure. If you use the same IO function in two parts of a program, you may indeed get different return values, but that’s simply because they are parameterized on the RealWorld, which will be different every time. Referential transparency means that a function will always return the same value for the same arguments, but we will never pass the exact same argument to an IO function, so IO functions are, trivially, referentially transparent.

This brings me to my final point. There do exist some “lazy IO” functions (e.g., readFile, hGetContents, etc.), so called because they use unsafeInterleaveIO to return a lazy value before the effect that generates that value has been performed, treating the effects as the result of evaluating the value. In other words, these generate lazy effects. These values can perform effects in otherwise pure code. It is as if they have a hidden global reference to the current state of the real world which they refer to for evaluation; since this RealWorld value may be different each time the value is evaluated, we lose some predictability in pure functions. It is unfortunate that tutorials for beginners seem to encourage use of these functions because it appears to be a common source of misunderstanding and misuse.

So there you have it. Haskell is lazy by default, with optional strictness. Strictness is part of an algorithm, not a low-level optimization detail. IO actions are evaluated lazily. There is no impure hack in place for side-effects in normal IO actions. IO actions are pure. So-called “lazy IO” is not pure complicates things significantly. I hope this cleared some things up.

Written by Jake

June 23, 2008 at 9:03 pm

Posted in Haskell, Programming