problem parsing some html...
inkedmn
04-24-2002, 12:08 AM
ok, here's what i'm doing:
i'm connecting to a url and reading some cgi-generated html into memory. i'm looking for the screenname, whether they have AIM/ICQ, and their handle (if applicable).
here's where i'm having the problem. if one of these two lines appears anywhere in the html, the handle will be exactly 5 lines below:
AIM ID:
ICQ ID:
so i need to be able to tell the code to grab the line that's 5 lines below either of those lines...
here's what i have so far:
import urllib, htmllib
lnourl = "http://www.linuxnewbie.org/cgi-bin/ubbcgi/ultimatebb.cgi?ubb=get_profile&u="
getsource = urllib.urlopen(lnourl + "00002769")
html = getsource.read()
lines = html.split('\n')
for line in lines:
    words = line.split(' ')
    if words[0] == "Profile":
        user = words[2]
    if line == "AIM ID:"
help! :)
Benny
04-24-2002, 02:21 AM
Couldn't you do something like ......
import urllib, htmllib
lnourl = "http://www.linuxnewbie.org/cgi-bin/ubbcgi/ultimatebb.cgi?ubb=get_profile&u="
getsource = urllib.urlopen(lnourl + "00002769")
html = getsource.read()
lines = html.split('\n')
x = -1
for line in lines:
    x = x+1
    words = line.split(' ')
    if words[0] == "Profile":
        user = words[2]
    if line == "AIM ID:":
        aimID = lines[x+5]
Wouldn't that get you the 5th line below? This is untested but it should work in theory....:D
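For anyone reading along later, the lookup suggested above can be sketched as a small helper. This is only an illustration, not the posters' code: the name `line_below`, the `offset` parameter and the sample lines are all made up. It strips whitespace before comparing, on the assumption that the page may carry trailing spaces or `\r` characters that would make an exact `line == "AIM ID:"` test fail, and it checks bounds so a marker near the end of the page does not raise an IndexError.

```python
def line_below(lines, marker, offset=5):
    """Return the line `offset` lines below the first line equal to `marker`.

    Returns None if the marker is missing or the target line does not exist.
    """
    for index, line in enumerate(lines):
        # Compare the stripped line so stray spaces or '\r' do not matter.
        if line.strip() == marker:
            target = index + offset
            if target < len(lines):
                return lines[target].strip()
            return None
    return None


# Made-up sample data standing in for the downloaded page.
sample = ["Profile for someuser", "ICQ ID:\r", "a", "b", "c", "d", "12345 ", "end"]
icq = line_below(sample, "ICQ ID:")   # marker present
aim = line_below(sample, "AIM ID:")   # marker absent
```

Returning None for a missing marker leaves it to the caller to decide what "not listed" should look like.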
inkedmn
04-24-2002, 06:35 PM
no, i tried that too :)
thanks for the suggestion though
Benny
04-24-2002, 11:14 PM
Hmmm I just tried it and it seemed to work...
I did this:
import urllib, htmllib
lnourl = "http://www.linuxnewbie.org/cgi-bin/ubbcgi/ultimatebb.cgi?ubb=get_profile&u="
getsource = urllib.urlopen(lnourl + "00002769")
html = getsource.read()
lines = html.split('\n')
x = -1
for line in lines:
    x = x+1
    words = line.split(' ')
    if words[0] == "Profile":
        user = words[2]
    if line == "ICQ ID:":
        print lines[x+5]
and it printed back the icq number ......... ??
Am I missing something?
inkedmn
04-24-2002, 11:19 PM
well my goodness... that does work!
thanks very much for the help Benny!!
one question though: why initialize x as -1 just to add one to it two lines down?
Strike
04-24-2002, 11:24 PM
He could have just started it at 0 and then added the one on at the end, or he could have done it this way. It's really a matter of preference - no real "best practice"
Benny
04-24-2002, 11:27 PM
I just do it like that out of habit, because the first item in a list is '0'
So if I start 'x' at '0' and then in my 'for' loop add one to 'x', 'x' will then refer to the first item as '1' ..... rather than '0'..
But yeah, like Strike said I could have just had the "x=x+1" bit at the end...... you can do it either way, I just like to do it like that.....:)
Happy to help out.
Yeah; the thing to note is that you have to add one to it each time through the for loop, and obviously it has to be initialized to the appropriate value before you do that.
(look at me putting in my useless two cents :) )
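To make the counter discussion concrete, here is a small sketch (the sample list and variable names are made up) showing that both placements of the increment produce the same indices. It also shows `enumerate()`, which Python 2.3 and later provide and which does the counting for you.

```python
lines = ["first", "second", "third"]

# Style 1: start at -1 and increment at the top of the loop body.
seen_top = []
x = -1
for line in lines:
    x = x + 1
    seen_top.append((x, line))

# Style 2: start at 0 and increment at the bottom of the loop body.
seen_bottom = []
x = 0
for line in lines:
    seen_bottom.append((x, line))
    x = x + 1

# Style 3: let enumerate() hand back the index alongside each item.
seen_enumerate = list(enumerate(lines))

print(seen_top == seen_bottom == seen_enumerate)  # True
```

Whichever style is used, the increment has to run on every pass through the loop, otherwise `x` stops matching the position of `line`.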
inkedmn
04-25-2002, 12:23 PM
ok, here's my code at this point. it appears to be working, but it's not recording anything in the text file like i want it to. i'm sure it's something to do with the if/else statements near the end...
import urllib, htmllib, sys
def getInfo(number):
    number = str(number)
    lnourl = "http://www.linuxnewbie.org/cgi-bin/ubbcgi/ultimatebb.cgi?ubb=get_profile&u="
    num_zeros = 8 - len(str(number))
    zeros = "0" * num_zeros
    lnourl += zeros + number
    try:
        getsource = urllib.urlopen(lnourl)
    except Exception, message:
        print message
        sys.exit()
    html = getsource.read()
    lines = html.split('\n')
    x = -1
    dbfile = open('lno.txt', 'a')
    for line in lines:
        words = line.split(' ')
        if words[0] == "Profile":
            user = words[2]
            print "found user: " + user
        if line == "ICQ ID:":
            icq = lines[x+5]
        else:
            icq = "Not Listed"
        if line == "AIM ID:":
            aim = lines[x+5]
        else:
            aim = "Not Listed"
    if icq != "Not Listed" and aim != "Not Listed":
        print "adding to file"
        dbfile.write("Username: "+ user + "; ICQ: " + icq + "; AIM: "+aim)
    dbfile.close()
total_users = 10
for number in range(total_users):
    getInfo(number)
any ideas?
Strike
04-25-2002, 06:53 PM
Well, you probably want that to be "or" and not "and" because as long as either one is something, you want it.
inkedmn
04-25-2002, 08:50 PM
no, if they both = "Not Listed", i don't want it to append the name to the file. if one or both are there, i want it to append the info.
Strike is saying the same thing that I said to you via IM.
Having 'and' only allows writing to the file if both exist.
Having 'or' will allow writing to the file if one, the other, or both exist, which is what you want.
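A rough sketch of how the `or` test could fit into the parsing, for later readers. It is not the original poster's final code: `parse_profile`, `should_record`, `format_record`, `padded_id` and the sample lines are hypothetical names and data, and the profile page layout is assumed from the posts above. Besides the `and`/`or` change, it works around two things that look like problems in the last posted version: the counter `x` is never incremented inside the loop there, and the `else` branches appear to reset `icq` and `aim` to "Not Listed" on every line that is not the marker. Here the defaults are set once before the loop and only overwritten on a match.

```python
NOT_LISTED = "Not Listed"


def padded_id(number):
    """Zero-pad a user number to 8 digits, e.g. 2769 -> '00002769'."""
    return str(number).zfill(8)


def parse_profile(lines, offset=5):
    """Collect user, ICQ and AIM values from the lines of one profile page."""
    # Defaults are set once, before the loop, and only replaced on a match.
    info = {"user": None, "icq": NOT_LISTED, "aim": NOT_LISTED}
    for index, raw in enumerate(lines):
        line = raw.strip()
        words = line.split(" ")
        if words[0] == "Profile" and len(words) > 2:
            info["user"] = words[2]
        for marker, key in (("ICQ ID:", "icq"), ("AIM ID:", "aim")):
            if line == marker and index + offset < len(lines):
                info[key] = lines[index + offset].strip()
    return info


def should_record(info):
    """True when at least one of the two handles is listed."""
    return info["icq"] != NOT_LISTED or info["aim"] != NOT_LISTED


def format_record(info):
    """Build one line of output; the trailing newline keeps records apart."""
    return "Username: %s; ICQ: %s; AIM: %s\n" % (
        info["user"], info["icq"], info["aim"])


# Made-up sample pages: one with an ICQ number, one with neither handle.
with_icq = parse_profile(
    ["Profile for someuser", "ICQ ID:", "", "", "", "", "12345"])
with_none = parse_profile(["Profile for nobody"])

if should_record(with_icq):
    print(format_record(with_icq))
```

With `and` in `should_record`, the first sample would be skipped because only one handle is present. With `or`, only profiles where both values are "Not Listed" are skipped.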