Dramatic Proof that Haskell is Science Fiction
Wednesday, August 6th, 2008Amazon’s “Better Together” pairing for O’Reilly’s upcoming Real World Haskell: Neal Stephenson’s upcoming Anathem. Coincidence? I think not.

Amazon’s “Better Together” pairing for O’Reilly’s upcoming Real World Haskell: Neal Stephenson’s upcoming Anathem. Coincidence? I think not.

So a coworker got me hooked on Project Euler today, and I’ve been spending far too much of my free time on it. However, it did lead to a very pleasing moment.
One of the problems deals with Collatz sequences: For any number, divide by two if the number is evenly divisible by two, multiply by three and add one otherwise. Stop when you reach the number 1. The task at hand was to find the number between 1 and 1,000,000 with the longest Collatz sequence.
My first attempt at counting this sequence, rather predictably, failed on large input:
collatzsz 1 = 1
collatzsz n = 1 + collatzsz nxt where
nxt = if 0 == mod n 2 then div n 2 else 3 * n + 1
Duh. Stack overflows everywhere. (Note: there’s probably a flag for GHC that will make this Just Work.) I figured that Haskell would probably go ahead and optimize tail calls for me, so I gave it a go:
collatzsz 1 accum = 1 + accum
collatzsz n accum = collatzsz nxt (1+accum) where
nxt = if 0 == mod n 2 then div n 2 else 3 * n + 1
Sure enough, it works for even the most obnoxiously large values of n.
Yes, I know that this is exactly what the language is supposed to do, but sometimes it seems like a small victory when a piece of software does even that. So thanks to the good folks who make GHC. You made me smile today.
I read an OnLamp article today called A Beautiful Regex Matcher… In Haskell. Interesting little piece of code, but there was a (silly little) bug in it. This struck me as a fine opportunity to figure out how HUnit — one of Haskell’s unit testing frameworks — works. Here’s what I came up with:
import Test.HUnit
import Regex
tests = TestList [ TestLabel (makeLabel x) (makeTest x) | x <- test_data ]
where makeTest tup@(reg, str, bool) =
TestCase $ assertEqual (makeLabel tup) bool (match reg str)
makeLabel (reg, str, bool) | True == bool = reg ++ " =~ " ++ str
| False == bool = reg ++ " !~ " ++ str
test_data = [ ("foo", "foo", True),
("foo", "bar", False),
("", "foo", True),
(".", "Foo", True),
(".*", "Foo", True),
("F.", "Foo", True),
("F.*", "Foo", True),
("Fo*", "Foo", True),
("Fo*d", "foo", False),
("Fo*d", "Foooood", True) ]
main = runTestTT tests
As mentioned in the article’s comments, the case where “.*” (“match anything”) is supposed to match “Foo” fails. Here’s the test output for the original code:
% runhaskell TestRegex.hs
### Error in: 4:.* =~ Foo
user error (HUnit:.* =~ Foo
expected: True
but got: False)
### Error in: 6:F.* =~ Foo
user error (HUnit:F.* =~ Foo
expected: True
but got: False)
Cases: 10 Tried: 10 Errors: 2 Failures: 0
Counts {cases = 10, tried = 10, errors = 2, failures = 0}
%
Easy enough. Finding out which cases fail makes tracking the problem down relatively straightforward. Turns out that the “matchstar” method doesn’t deal properly with periods at the end of the match. The original code:
matchstar c [] (y:ys) = c == y
Replace it with the following:
matchstar c [] (y:ys) = if c == '.' then True else c == y
And you get the following out of the unit tests:
% runhaskell TestRegex.hs
Cases: 10 Tried: 10 Errors: 0 Failures: 0
Counts {cases = 10, tried = 10, errors = 0, failures = 0}
%
Horray!
I read two very useful articles about Haskell recently. First is Haskell IO for Imperative Programmers which explains how to do simple IO. (It’s less obvious than you’d think, especially if you’re coming from one of the more popular languages like Java or Ruby.) The second is called JSON Parser in Haskell. It covers in a brief yet very informative way the exceptionally useful Parsec library that ships with many Haskell implementations.
Each of these articles are incredibly useful. You have to know how to get data in and out of your program, and there aren’t a lot of tasks that don’t require parsing data. The two techniques together, as you might expect, are crazy useful.
To summarize perhaps too quickly, you can do simple stdin/stdout IO using the “interact” function. Here’s the Haskell implementation of the popular “tac” utility:
main = interact $ unlines . (map reverse) . lines
The “lines” method turns a large string (stdin) in to a list of strings, “unlines” turns a list of strings in to one large string (stdout), “interact” ties the whole process together, and “(map reverse)” is where you would put your code.
In place of “lines” you can usefully drop Parsec in such that you can work directly on data structures instead of fussing with lines of text. (Haskell isn’t Perl, after all.) Here’s a quick example that reads pairs of integers:
module Main where
import Text.Printf
import Text.ParserCombinators.Parsec
atoi :: String -> Int
atoi = read
intsList :: Parser [(Int,Int)]
intsList = do
intsList' <|> (eof >> return [])
where intsList' = do
a <- many1 digit
many1 space
b <- many1 digit
newline
r <- ( many space >> (intsList' <|> return []))
return ((atoi a, atoi b) : r)
parseInput :: String -> [(Int,Int)]
parseInput s = case parse intsList "stdin" s of
Left err -> []
Right cs -> cs
rpt :: (Int,Int) -> String
rpt (a,b) = let c = maximum $ map ( length . cyc ) [a..b]
in printf "%d %d %d" a b c
where cyc n | 1 == n = [1]
| 1 == mod n 2 = n : cyc (1 + 3 * n)
| otherwise = n : cyc (div n 2)
main = interact $ unlines . (map rpt) . parseInput
And just like that you can move input parsing in to a parser where it belongs and not worry about it in the rest of your system. Certainly a lot more work than the same thing in, for instance, Ruby:
STDIN.each { |l| a,b = l.split.map {|x| x.to_i} ; puts whatever(a, b) }
However, dropping Parsec works the same way for arbitrarily complex data structures. This same code structure workspairs of integers, HTTP, JSON, whatever. Needless to say, I find this very pleasing.