PDA

View Full Version : Performance problem in Haskell


Sang-drax
12-25-2003, 01:11 PM
I'm a Haskell newbie and I'm trying to learn the language.

I've written a K-approximate string matching function in Haskell.
Information about the algorithm can be found at:
http://www.csam.iit.edu/~cs535/dymamic_p3.pdf


--K-approximate string matching
--
--takes two strings and return the minimum number
--of changes needed to make the strings equal
--http://www.csam.iit.edu/~cs535/dymamic_p3.pdf

kCompare :: String -> String -> Int

--Boundary conditions
kCompare [] s = length s
kCompare s [] = length s
--Recursive case
kCompare (x:xs) (y:ys) = min3 ((kCompare xs (y:ys)) + 1) --Remove one character from x
((kCompare (x:xs) ys) + 1) --Remove one character from y
((kCompare xs ys) + equal) --Change a character (if needed)
where
equal = if x == y then 0
else 1
min3 a b c = min a (min b c)

But the performance is extremly bad. When used with strings of 17 characters, the function doesn't finish in 20 minutes, whereas the C++ version finishes in one second using 1000 character strings.
The time complexity of the function is O(nm) where n and m are the length of the two strings.
The Haskell function works perfectly for strings less than 10 characters.

I really like the syntax of Haskell, the readability of the above code is very high and the code is brief, but the performance makes the function rather useless.

slidewinder
12-28-2003, 08:24 PM
Could you post the c++ function that you are comparing this to? That Haskell looks to me like it's exponential according to the length of the shorter string. I.e., O(3^17) in your example. Each call to kCompare creates three new, unshared, calls to kCompare. Each of those three creates three more, etc., until one of the strings is empty.

It's not tail recursive, either, so the space usage will be similarly explosive.

It's true Haskell will almost always (or maybe just "always") be slower than c or c++, but it should be more like the difference between, say c++ and perl, than the difference between c++ and "impossible".

I'm just a hobbyist, with no formal background in this stuff, so if I'm wrong about the complexity of the algorithm you posted, I'd appreciate a brief explanation as to how.

Sang-drax
12-29-2003, 02:36 AM
You are correct. The C++ function implements the algorithm as it was intended to be implemented (with a matrix and for-loops).
I never realized that my attemt at Haskell changed the complexity until now. :(

jemfinch
12-31-2003, 07:09 AM
Given the behavior you experienced, your first guess should've been that the Haskell version got the complexity wrong.

As a note, you'll also (more commonly and accurately) hear this algorithm referred to as "Levenshtein distance."

Jeremy