problem with this sorting thing...
inkedmn
08-31-2002, 10:46 PM
ok, i've been working on that anagram thing for the long-since-dead competition thread...
here's my code so far (yes, it's incomplete)
#anagram getter
import string
def sortWord(word):
    wordList = list(word)
    wordList.sort()
    sortedWord = string.join(wordList, '')
    return sortedWord.strip().lower()
file = open('dictionary.txt', 'r')
words = file.read()
wordlist = words.split('\n')
sortList = []
for word in wordlist:
    if len(word) > 0:
        sorted = sortWord(word)
        sortList.append(sorted)
sortList.sort()
print sortList
print 'done'
now, here's a snippet of the output:
'zep', 'zesu', 'zghllmopuyy']
done
now, in a perfect world, each word's letters would be sorted alphabetically, the result added to this list, and then the list sorted. but, as you can see, the letters in these words aren't sorted (e.g. "zep" should be "epz").
can somebody tell me why my code isn't doing this right?
thanks
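For comparison, here is a minimal sketch of that intended pipeline in Python 3 (the code in this thread is Python 2). The function name anagram_keys is hypothetical, and a hard-coded word list stands in for dictionary.txt. The one change in logic from the code above is that each word is lowercased before its letters are sorted:

```python
def anagram_keys(words):
    """Return the sorted list of lowercased, letter-sorted keys for words."""
    keys = []
    for word in words:
        word = word.strip()
        if word:  # skip blank lines
            # lowercase first, then sort the letters, then glue them back together
            keys.append(''.join(sorted(word.lower())))
    keys.sort()
    return keys

print(anagram_keys(['Zep', 'listen', 'Silent', '']))  # ['eilnst', 'eilnst', 'epz']
```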
inkedmn
08-31-2002, 11:59 PM
nevermind, Vince rules :)
i wasn't lowercase-ing the words before alphabetizing their letters. SOOO, uppercase Z's were sorting ahead of lowercase a's (every uppercase letter comes before every lowercase letter in ASCII)...
here's the working sort code:
#anagram getter
import string, sys
def sortWord(word):
    wordList = list(word)
    wordList.sort()
    sortedWord = string.join(wordList, '')
    return sortedWord
file = open('dictionary.txt', 'r')
words = file.read()
wordlist = words.split('\n')
sortList = []
for word in wordlist:
    if len(word) > 0:
        sortList.append(sortWord(word.strip().lower()))
sortList.sort()
print sortList
print 'done'
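The ordering problem is easy to see in isolation. This Python 3 snippet is not from the thread; 'Zep' is just an assumed capitalized dictionary entry:

```python
# In ASCII (and Unicode), every uppercase letter sorts before every lowercase one.
print(ord('Z'), ord('a'))              # 90 97
print(''.join(sorted('Zep')))          # 'Z' < 'e' < 'p', so nothing moves: Zep
print(''.join(sorted('Zep')).lower())  # lowercasing afterwards is too late: zep
print(''.join(sorted('Zep'.lower())))  # lowercasing first gives the real key: epz
```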
inkedmn
09-01-2002, 12:46 AM
ok, i started an anagram class...
just posting this because i'm odd :)
import string, sys
class Anagram:
    def __init__(self):
        self.sortList = []
        self.dict = {}
    def sortWord(self, word):
        wordList = list(word.strip().lower())
        wordList.sort()
        sortedWord = string.join(wordList, '')
        return sortedWord
    def makeList(self):
        file = open('dictionary.txt', 'r')
        words = file.read()
        wordlist = words.split('\n')
        print "Sorting words..."
        for word in wordlist:
            if len(word) > 0:
                self.sortList.append(self.sortWord(word))
        self.sortList.sort()
    def createDict(self):
        print "Creating dictionary..."
        print "Adding dictionary entries..."
        for word in self.sortList:  # was self.sortedList, which is never defined
            self.dict[word] = []
        print "Dictionary creation complete."
    def populateDict(self, file):
        print "Populating dictionary values with words from", file
        sourceFile = open(file, 'r')
        data = sourceFile.read()
        words = data.split()  # split on any whitespace, not just spaces
        for word in words:
            sorted = self.sortWord(word)
            if self.dict.has_key(sorted):
                self.dict[sorted].append(word)
it's not done, but it works so far (i think) :)
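The key-to-list-of-words structure that createDict and populateDict build can also be filled in a single pass with collections.defaultdict. This is a hypothetical Python 3 sketch, not inkedmn's class; group_anagrams is an illustrative name, and the words come from an in-memory list rather than a file:

```python
from collections import defaultdict

def group_anagrams(words):
    """Map each lowercased, letter-sorted key to the words that produce it."""
    groups = defaultdict(list)  # a missing key starts out as an empty list
    for word in words:
        word = word.strip()
        if word:
            groups[''.join(sorted(word.lower()))].append(word)
    return dict(groups)

groups = group_anagrams(['listen', 'Silent', 'enlist', 'zep'])
print(groups['eilnst'])  # ['listen', 'Silent', 'enlist']
```

Unlike the class above, this keeps every key that shows up in the input; inkedmn's design only creates keys for words in the dictionary file and then fills them from other files.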
supa cool inked man. :)
<unrequested advice>
I'm not sure I'd go about it the way you do, though. :) You seem to have to read the dictionary file in twice.. that seems rather inefficient. On top of that, you read the whole file into memory at once.. that has the potential to greatly increase the footprint of your program. One reason you have to read the file twice is that you're breaking the code up into separate functions; this often makes sense, but sometimes it becomes counterproductive. Like ChefNinja said on IRC yesterday, sometimes it doesn't make sense to use a class. While I think a class is okay in this case, I do think you're breaking the code up a little too much.

Consider the problem: go through a list of words and determine whether each one is an anagram of a given word. Okay, what do we have to do? Basically, open a file, and for each word, determine if it is made up of exactly the same letters as the given word.. if so, remember it. That's really all you have to do. With that in mind:
import string

def sortWord(word):
    # list.sort() sorts in place and returns None, so sort first, then join
    wlist = list(word.strip().lower())
    wlist.sort()
    return string.join(wlist, '')

def anagram(dictionary_file, word):
    """ return a list of words in 'dictionary_file' that are an anagram for 'word' """
    sInWord = sortWord(word)
    infile = open(dictionary_file, 'r')
    matchList = []
    # read one line at a time instead of the whole file; looping over the
    # file object also doesn't stop early at a blank line
    for line in infile:
        line = line.rstrip()
        if line and sInWord == sortWord(line):
            matchList.append(line)
    infile.close()
    return matchList
My point is that this would be much more efficient. If you wanted to encapsulate these two functions in a class (AnagramFinder) or something, that'd be fine; but don't modularize your program to the point where it's causing inefficiency, ya know?
</unwanted advice>
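Following that suggestion, the two functions can be wrapped in a small class. This is a Python 3 sketch; AnagramFinder is only the name floated above, and the one-word-per-line dictionary format is an assumption. Each call to find() still streams the file line by line instead of reading it all into memory:

```python
import os
import tempfile

class AnagramFinder:
    """Hypothetical wrapper around the streaming anagram lookup."""

    def __init__(self, dictionary_file):
        self.dictionary_file = dictionary_file

    @staticmethod
    def sort_word(word):
        # lowercase before sorting so 'Z' and 'z' are treated alike
        return ''.join(sorted(word.strip().lower()))

    def find(self, word):
        """Return the words in the dictionary file that are anagrams of word."""
        key = self.sort_word(word)
        matches = []
        with open(self.dictionary_file) as infile:  # closed automatically
            for line in infile:  # one line at a time, not the whole file
                line = line.rstrip()
                if line and self.sort_word(line) == key:
                    matches.append(line)
        return matches

# Usage with a throwaway dictionary file
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write('listen\nSilent\n\nenlist\nzep\n')
    path = f.name
result = AnagramFinder(path).find('tinsel')
print(result)  # ['listen', 'Silent', 'enlist']
os.remove(path)
```

Since nothing is cached, every lookup rereads the file; that keeps memory use small but makes repeated lookups slow, which is the trade-off discussed further down the thread.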
inkedmn
09-01-2002, 03:08 AM
actually, the first "read the dict in" is for the dictionary file (that creates all the keys in the dict), the second is for adding another file or files that would create the values...
and, as i said, work in progress :)
jemfinch
09-02-2002, 06:53 PM
I think this problem is best solved by having an AnagramDictionary that has a way to add anagrams, delete anagrams, and look up anagrams (__setitem__, __delitem__, and __getitem__, respectively). For efficiency's sake, this should be persistent (that is, you should be able to serialize the state to disk so you don't have to read your whole dictionary of words every time you want to look up an anagram).
Since I haven't gotten around to writing this competition in Python (even though I'm the one who submitted it!) I'll write my AnagramDictionary using my cdb.py module later this evening.
And I'll link it into my bot :D
Jeremy
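To illustrate the persistence idea: build the anagram index once, write it to disk, and reload it later without rereading the word list. jemfinch planned to use his own cdb.py module, whose API isn't shown in this thread, so this Python 3 sketch swaps in the standard library's pickle module instead. build_index, save_index and load_index are hypothetical names:

```python
import os
import pickle
import tempfile
from collections import defaultdict

def build_index(words):
    """Group words by their lowercased, letter-sorted key."""
    index = defaultdict(list)
    for word in words:
        word = word.strip()
        if word:
            index[''.join(sorted(word.lower()))].append(word)
    return dict(index)  # plain dict, so looking up a missing key won't insert it

def save_index(index, path):
    with open(path, 'wb') as f:  # pickle needs a binary file
        pickle.dump(index, f)

def load_index(path):
    with open(path, 'rb') as f:
        return pickle.load(f)

path = os.path.join(tempfile.mkdtemp(), 'anagrams.pickle')
save_index(build_index(['listen', 'Silent', 'enlist', 'zep']), path)
index = load_index(path)  # later runs can start here and skip the word list
print(index.get('eilnst', []))  # ['listen', 'Silent', 'enlist']
```

Unlike a constant database such as cdb, which can look keys up directly on disk, pickle loads the entire index into memory on every start, so this only demonstrates the serialize-once idea.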
inkedmn
09-02-2002, 07:23 PM
/me is still learning all this "design" stuff ;)
jemfinch
09-03-2002, 12:49 AM
Upon further consideration (well, a few seconds' more thought and actual implementation) __getitem__, __setitem__, and __delitem__ aren't the proper methods to get/set/delete anagrams. So I just wrote .add, .remove, and .find methods. It's all in my plugins/anagrams.py module.
The anagram database, btw, is 11mb large, uncompressed. It's inspired me to move data/ to a different package :)
Jeremy
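An in-memory sketch of that add/remove/find interface follows. It is not jemfinch's plugins/anagrams.py, whose code isn't in this thread: only the method names come from his post, and the rest (no duplicates, dropping empty groups, leaving the query word out of find()) are assumptions:

```python
class AnagramDictionary:
    """Hypothetical in-memory anagram store with add/remove/find methods."""

    def __init__(self):
        self._groups = {}  # letter-sorted key -> list of words

    @staticmethod
    def _key(word):
        return ''.join(sorted(word.lower()))

    def add(self, word):
        group = self._groups.setdefault(self._key(word), [])
        if word not in group:  # don't store duplicates
            group.append(word)

    def remove(self, word):
        key = self._key(word)
        group = self._groups.get(key, [])
        if word in group:
            group.remove(word)
            if not group:  # drop keys with no words left
                del self._groups[key]

    def find(self, word):
        """Return the stored anagrams of word, leaving out word itself."""
        return [w for w in self._groups.get(self._key(word), []) if w != word]

d = AnagramDictionary()
for w in ['listen', 'silent', 'enlist', 'zep']:
    d.add(w)
d.remove('enlist')
print(d.find('listen'))  # ['silent']
```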
recluse
09-03-2002, 12:29 PM
just posting this because i'm odd
Yes, this is true.
Originally posted by recluse
Yes, this is true.
recluse! This is how it starts... a couple posts in the Python forum, commenting on a member's weirdness... then asking a question about the code... next thing ya know you write a line or two yourself.. then you're hooked. :)
inkedmn
09-03-2002, 02:22 PM
LOL, and perl is a distant memory... ;)