
View Full Version : regex sucks...


inkedmn
11-25-2002, 01:50 PM
:)

ok, i'm doing a thesaurus thing, here's my code:


import urllib2, re

class Thesaurus:

    def __init__(self, word):
        self.url = "http://thesaurus.reference.com/search?q="
        self.word = word

    def lookup(self):
        html = urllib2.urlopen(self.url + self.word).read()
        match = re.search(".*Synonyms:[/b]</td><td>(?P<synonyms>.*?) <b", html, re.DOTALL)
        synonym_list = match.group("synonyms").split(",")
        print match.group("synonyms")
        print synonym_list


the problem is, it's only returning 4 matching words when i know there are more. for example, here's the output for "bottle":

C:\brett>python thesaurus.py
alembic, bag, beaker, bin,
['alembic', ' bag', ' beaker', ' bin', '']


eh?

[edit]
vBulletin seems to have processed part of the html in my regex, but i don't think it will make a difference

kmj
11-25-2002, 04:04 PM
Um, dude; look at the source for that page... after bin comes bottle, which is bold because it's the search word. Your non-greedy (.*?) stops at the first " <b" it finds, so the match ends right there.
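
Here's a rough sketch of one way around it, not tested against the live page. I'm assuming the cell really looks like the SAMPLE_HTML string below (I made that markup and the extra words up for illustration) and that the [/b] in the posted regex was </b> before the forum mangled it. The idea is to capture up to the closing </td> instead of the first " <b", then strip any inline tags before splitting. extract_synonyms is just a name I picked.

```python
import re

# Hypothetical markup imitating the synonyms row; the real page may differ.
SAMPLE_HTML = (
    "<tr><td><b>Synonyms:</b></td><td>alembic, bag, beaker, bin, "
    "<b>bottle</b>, canteen, flask, jar</td></tr>"
)

def extract_synonyms(html):
    """Return the synonyms in the table cell that follows 'Synonyms:'."""
    # Stop at the end of the cell rather than at the first bold tag.
    match = re.search(
        r"Synonyms:</b></td><td>(?P<synonyms>.*?)</td>", html, re.DOTALL
    )
    if match is None:
        return []
    # Remove inline tags such as <b>...</b> around the search word.
    text = re.sub(r"<[^>]+>", "", match.group("synonyms"))
    # Split on commas and drop surrounding whitespace and empty entries.
    return [word.strip() for word in text.split(",") if word.strip()]

if __name__ == "__main__":
    print(extract_synonyms(SAMPLE_HTML))
```

Checking for None also keeps it from blowing up with an AttributeError when the page has no synonyms row at all.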