View Full Version : An implementation question
Bender
07-03-2002, 02:27 AM
Hi, this is my first post here, though I've been lurking a lot, so be kind.
Anyway, I recently wrote a permutation generator, got to thinking about it, and realized that I could use it for all sorts of useful (maybe) pattern matching programs.
My mom does this game in the paper every day called "Jumble" in which letters are jumbled up, and you need to un-jumble them to find a word.
She would occassionally ask me for help, and I figured this would be a neat little application to write in Java (to practice and improve my skills).
What the program does is take the letters given it, and cycle through each permutation of the word. Each permutation is checked against a dictionary text file on the hard drive, and if it finds that word in there, it saves that word as a valid English word.
Anyway, the question is this... how can I improve it's speed. It's very inefficient now, checking each permutation against every word in the dictionary (or until a word matches)... so basically, every nonsensical permutation of letters is checked against all 25,000 odd words in this dictionary text file.
Is there a way to improve this program's efficiency? I thought about some sort of binary searching algorithm (a binary search tree maybe), but I figure that loading all 25,000 word Strings into memory isn't going to happen...
I hope this doesn't sound like a "Do work for me" post. It's really not. I'm just hoping someone has a direction they could point me in.
Thanks a lot.
Well, you could seperate your single dictionary file into 26 files based on letter, and only check the one that has the proper first letter; and since you're looking at the word in the dictionary, and you know what your permutation is, you should be able to tell when you've alphabetically passed the word, correct?
Just some thoughts.
darelf
07-03-2002, 11:13 AM
Here's a hint (I've worked on something like this in a class I had)...
Don't compare the whole word, compare one letter at a time. If your dictionary file is in alphabetical order, you will greatly reduce your search time......
You just have to keep track of where you are...
sicarius
07-04-2002, 01:17 AM
Well, this may sound like a funy solution, but I am sure it is pretty simple. Implement the anagram generator as per jem's post on this thread: http://www.coderforums.net/showthread.php?s=&threadid=636
Then just feed it the jumbled word as input. Since the hash key is the sorted list of characters in the string, the value that is returned will be a list of all the possible words that can be made from the jumbled set of letters.
As an example lets take the word pots. the jumbled version of pots might be spto. so our input to the program will be spto.
the program applies the hash function to spto and gets opst as the hash key. the program looks up the hash key, and gets a list of words that can be made from opst. the output might be: pots, spot, stop, tops, etc.
should be simply enough to implement.
Bender
07-04-2002, 01:24 AM
Originally posted by sicarius
Well, this may sound like a funy solution, but I am sure it is pretty simple. Implement the anagram generator as per jem's post on this thread: http://www.coderforums.net/showthread.php?s=&threadid=636
Then just feed it the jumbled word as input. Since the hash key is the sorted list of characters in the string, the value that is returned will be a list of all the possible words that can be made from the jumbled set of letters.
As an example lets take the word pots. the jumbled version of pots might be spto. so our input to the program will be spto.
the program applies the hash function to spto and gets opst as the hash key. the program looks up the hash key, and gets a list of words that can be made from opst. the output might be: pots, spot, stop, tops, etc.
should be simply enough to implement. THanks for the suggestion, but I already have the permutation generator set up. I can get all the different combinations simply enough.
It's the comparison to the dictionary file that's making the program inefficient.
Oh, and thanks to the other suggestions in this thread. I'm thinking about cycling through each word in the dictionary file, checking each word's first character to the permutations generator, and then moving to the right, comparing letters.
If I hit a point that I can no longer keep comparing, then it's not a word. Ugh... it's hard to picture this stuff in my head. :)
sicarius
07-05-2002, 05:56 AM
regardless of how fast you can search the dictionary for each permutation the anagram thingy would still be more efficient. Especially if you stored the resultant hash table in a serialized object. then you would be able to get your list of words in constant time.
To improve your lookup time, of course hashing is the fastest, being O(1), but other possible ways are:
Binary search (since a dictionary is ordered, right?) which is O(log(n))
Index it ... directories of A-Z, AA-AL and AM-AZetc. ?
You could also use a method of "bounding" combined with string searching, rather than searching for entire words
for instance if you have a permutation of "hellxyuei"
your algorithm would search each letter at a time, ie. "h" then "he" then "hel", "hell", and would stop at "hellx" since there is no word in the dictionary beginning with "hellx"
this on average would be faster because (without any calculations) most of your permutations will be nonsensical words.
ChefNinja
07-06-2002, 08:03 AM
By the by, what is O(1) ?
Sorry, its order 1, or "Big-Oh" 1
essentially when modelling algorithms its not useful to count the number of operations they do, since each operation is variable on different computers and operating systems and such, so you measure it as a function of the work done. (Er, this description sucks, its been a while since i learned why we use it :P)
as an example:
x = x + 1 is O(1) -- said "big-oh of one" or "order 1"
meaning that for every operation (only one) we do a constant amount of work
if we had
for (int i = 0; i < n; i++)
{}
where n is some abritrary integer greater than 0, we iterate through the loop N times, and thus the for loop is O(n), no matter what is in the for loop. So we do N work.
As an example, in a sequential search (1...n), we check each n elements once, so we have order 1 time. In binary search we halve our workspace until we find the right element, the maximum number of times we can divide something by 2 is log base 2(n), so binary search is O(log2(n)) etc.
This previous concept can be explained with the use of the polynomial ordering rule. The polynomial rule is as follows,
Any polynomial (for example 3x^2 + 2x + 2) is bound above by some constant multiplied by n to the power of the highest indice in the polynomial. ie our polynomial is O(n^2) (or n squared)
To show this is true, think of a graph, the graph of X^2 is bound above by X^3 for sufficiently large x. that is, for any value of x big enough, x cubed will always be bigger than x squared.
And, following from this, we can show that 3x^2 + 2x + 2 is always less than some integer multiplied by x^2
mathematically:
3x^2 + 2x + 2 <= 3x^2 + 2x^2 + 2x^2 (change all to same power x)
3x^2 + 2x + 2 <= 7x^2
<= M.(x^2)
where M is some sufficiently large integer.
Or maybe you should go look up some webpage for a better explanation than mine.
This stuff is also taught in most university computer science courses - so look for lecture notes on algorithm complexities, asymptotic bounds etc. at university websites or on google.
ChefNinja
07-06-2002, 02:51 PM
Thanks :)
fenris
07-11-2002, 11:02 AM
This may be a little late, but I did something like this for a game called word whomp @ www.pogo.com.
The game generated 6 random letters and you had to find all the valid words between 3 and 6 characters long. I tried generating all the permutations and comparing them to a dictionary. This was way to slow.
So what I did was to sort my dictionary so that a binary search would work on it. I also generated an index array containing all the possible permutations and stored it in a flat file(i.e. a flat file containing all the 3 letter permutaitons, then the 4 letter etc.). That way I only need to calculate the permutations once. When the program started, it loaded the text file into an array which significantly increased performance.
My reasoning for creating the file that contained all the permutations was why calculate something more then once when it will produce the exact same results every time.
Bender
07-20-2002, 05:55 AM
Well, I finally started working on this today, and finished up a pretty slick (and very fast) jumble solver using jemfinch's algorithm mentioned above.
You can look at the surprisingly short code here:
Source Code (in .txt format, should open in your browser) (http://excelsior.uccs.edu/j/jmorriso/JumbleGUI.txt)
The methods of interest are the makeHashtable() function and the solveButton_actionPerformed function.
If you want to play around with it, you can download it here:
Zip file containing jar, the dictionary.txt file, and the source. (http://excelsior.uccs.edu/j/jmorriso/Jumble.zip)
The file's somewhat large size (~131KB) is because of the included dictionary file, which holds over 20,000 words and is almost half a MB unzipped.
Note that the dictionary.txt file has to be in the same directory as the .jar file, because I'm lazy.
Tell me what you think.
jemfinch
07-20-2002, 03:06 PM
Originally posted by Bender
THanks for the suggestion, but I already have the permutation generator set up. I can get all the different combinations simply enough.
It's the comparison to the dictionary file that's making the program inefficient.
You can write the fastest permutation generator and dictionary lookup in world in the most efficient assembly language for the speediest processor available, but the method I posted in that thread will still, always be faster than your method for dictionaries greater than some size N because the algorithm you have is O(N) in the size of the dictionary, and the method I posted is O(1) in the size of the dictionary.
In short, if you want to do what you want to do quickly and efficiently, you're going to have to use the method I posted. "Optimizing" your method is really just a waste of time because it's a slow, inefficient algorithm. No amount of good coding will make slow algorithms fast.
Jeremy
jemfinch
07-20-2002, 03:09 PM
Originally posted by Bender
Well, I finally started working on this today, and finished up a pretty slick (and very fast) jumble solver using jemfinch's algorithm mentioned above.
Good :) Note, however, that it's not my algorithm, it's an algorithm I read about in Jon Bentley's Programming Pearls, an excellent book that anyone who codes for fun or profit should read.
You can look at the surprisingly short code here:
Source Code (in .txt format, should open in your browser) (http://excelsior.uccs.edu/j/jmorriso/JumbleGUI.txt)
The methods of interest are the makeHashtable() function and the solveButton_actionPerformed function.
If you want to play around with it, you can download it here:
Zip file containing jar, the dictionary.txt file, and the source. (http://excelsior.uccs.edu/j/jmorriso/Jumble.zip)
The file's somewhat large size (~131KB) is because of the included dictionary file, which holds over 20,000 words and is almost half a MB unzipped.
Strip out the dictionary and attach it to a post in the competition thread I posted to see implementations of this program in different languages :) And if you want a far bigger dictionary, I have a link to one much bigger in that thread, too.
Jeremy
vBulletin® v3.7.0, Copyright ©2000-2009, Jelsoft Enterprises Ltd.