Sang-drax
12-25-2003, 01:11 PM
I'm a Haskell newbie and I'm trying to learn the language.
I've written a K-approximate string matching function in Haskell.
Information about the algorithm can be found at:
http://www.csam.iit.edu/~cs535/dymamic_p3.pdf
--K-approximate string matching
--
--takes two strings and return the minimum number
--of changes needed to make the strings equal
--http://www.csam.iit.edu/~cs535/dymamic_p3.pdf
kCompare :: String -> String -> Int
--Boundary conditions
kCompare [] s = length s
kCompare s [] = length s
--Recursive case
kCompare (x:xs) (y:ys) = min3 ((kCompare xs (y:ys)) + 1) --Remove one character from x
((kCompare (x:xs) ys) + 1) --Remove one character from y
((kCompare xs ys) + equal) --Change a character (if needed)
where
equal = if x == y then 0
else 1
min3 a b c = min a (min b c)
But the performance is extremly bad. When used with strings of 17 characters, the function doesn't finish in 20 minutes, whereas the C++ version finishes in one second using 1000 character strings.
The time complexity of the function is O(nm) where n and m are the length of the two strings.
The Haskell function works perfectly for strings less than 10 characters.
I really like the syntax of Haskell, the readability of the above code is very high and the code is brief, but the performance makes the function rather useless.
I've written a K-approximate string matching function in Haskell.
Information about the algorithm can be found at:
http://www.csam.iit.edu/~cs535/dymamic_p3.pdf
--K-approximate string matching
--
--takes two strings and return the minimum number
--of changes needed to make the strings equal
--http://www.csam.iit.edu/~cs535/dymamic_p3.pdf
kCompare :: String -> String -> Int
--Boundary conditions
kCompare [] s = length s
kCompare s [] = length s
--Recursive case
kCompare (x:xs) (y:ys) = min3 ((kCompare xs (y:ys)) + 1) --Remove one character from x
((kCompare (x:xs) ys) + 1) --Remove one character from y
((kCompare xs ys) + equal) --Change a character (if needed)
where
equal = if x == y then 0
else 1
min3 a b c = min a (min b c)
But the performance is extremly bad. When used with strings of 17 characters, the function doesn't finish in 20 minutes, whereas the C++ version finishes in one second using 1000 character strings.
The time complexity of the function is O(nm) where n and m are the length of the two strings.
The Haskell function works perfectly for strings less than 10 characters.
I really like the syntax of Haskell, the readability of the above code is very high and the code is brief, but the performance makes the function rather useless.