PDA

View Full Version : anagram generator


Strike
06-28-2002, 04:05 PM
The discussion started - http://www.coderforums.net/showthread.php?s=&threadid=633 - there, but I didn't want to hijack the thread, and since GnuVince (rightly) suggested that I post my proposed algorithm here, I thought it'd best go in this forum.

Anyway, the algorithm assumes (not unreasonably) that you have a word file to use as a benchmark of "what is a word?" for your anagrams. Anyway, the basic algorithm goes like this:

(this is assuming you work on an in-memory copy of the file, so if you are dealing with the actual file, make a copy of it at the beginning and use that copy for this)
1. You get the word/phrase to be made into other words
2. Step through each word in the word file (sequentially), doing the following:
3. If the length of the word is longer than the phrase, delete it from the word list (copy) and go to the next word (step 2)
4. If the first letter isn't even in the phrase, delete the word and go to the next (back to step 2)
5. Otherwise, iterate over the word, popping letters off a copy of the phrase string until you get one that doesn't match.
6. If you iterate over the whole word, keep it. Otherwise, delete it.
7. Go to the next word (back to step 2)
8. When all the words in the file are done, you will have a list of words that contain letters in your phrase. From here, I'll leave the algorithm up to you. It definitely shrinks the search space though. :)

Strike
06-28-2002, 04:37 PM
Okay, here's the rest (my proposal, anyway):

Each time, you will want to be working with a copy of the list, but you loop through more than once, so be careful about that.

For the first word in the list, simply remove all the letters common to this word and the input phrase. Then delete this word, and add it onto an empty list (which will contain the anagram words). Obviously, if the remaining phrase is "" we are done and can iterate again.

Then, iterate over each word in the list, comparing lengths with what you have left. If the word is longer than what you have left, delete it.
Continuall iterate over the list until: a) you match all the letters, leaving "" in your phrase, or b) the remaining letter list doesn't change after iterating over each word. If we hit a, we have a new list of words that is an anagram. If we hit b, the current starting word sucks :)

Then, do the same thing for each starting word.

Okay, so maybe this isn't the most efficient algorithm :)

jemfinch
06-30-2002, 06:32 PM
Oh, duh. I can't believe I didn't remember this in the other thread.

Anyway, you take your dictionary of "real words" and and empty hash table of strings to string lists. Then you take each word in the dictionary, sort the characters, and add the word (unsorted) to the hashtable using the sorted word as the key.

Then you just iterate over the hashtable, printing all the lists of strings, each list on its own line. Every word on a line will be an anagram of the rest of the words on the line.

Or you can keep the hashtable around (perhaps serialized) to look up all the anagrams of a given word -- to do so, just sort the characters in the word and lookup that key in the hashtable.

This is straight from "Programming Pearls," by Jon Bentley. I knew it from a long time ago, but forgot to mention it in that thread.

Jeremy

Strike
07-04-2002, 03:01 AM
Ooooooh, very cool. Truly O(1) ;) I like it.

jemfinch
07-15-2002, 10:40 AM
Originally posted by Strike
Ooooooh, very cool. Truly O(1) ;) I like it.

It's still O(n) in the size of the dictionary, of course.

Jeremy

Strike
07-15-2002, 02:44 PM
Right, I meant assuming the same dictionary for all inputs.