View Full Version : Grabbing HTML code from the Web


calvarer
07-03-2003, 03:59 PM
Hi!
I'm just running into a problem. I want to copy and paste html code from another page on the web. Since the code of that page is always changing, I wanna see if there is a way to do it dynamically.
I'm open to all suggestions. Thanks
:D

NoahK1ng
07-04-2003, 01:01 AM
Very vague answer:
Yes, it should be possible to dynamically parse the html from another web site via asp.

Questions that would lead to a better answer:

1) What exactly is it you want to do? More information makes responses better.

2) What do you need from the site? Does the data you need follow a specific pattern?

3) Can you simply contact the site admin and ask him to generate an XML document that you can parse into your site?

4) U.R.L. of your site?

calvarer
07-07-2003, 12:26 PM
What I want to do is to get some HTML code from a sports page (specifically, their headlines). I want to have a copy of their headlines with links to their complete articles. I know they keep an SQL database with the information, and they update it in the database periodically.
Maybe I could contact the webmasters of that site and ask them for some form of restricted access to their database, but I wanted to see if it is possible to do it myself in my ASP page without bothering them.
Their code does follow a specific pattern: each headline is always inside a span tag with a specific CSS class (that class is not used anywhere else in the page). I wonder if my script can get their page, parse it in some way and get the text in that specific span tag.
If possible, I would greatly appreciate it if someone could post an example of how to access an external webpage, parse it, and then include the result in the displayed contents of my ASP page. Also, if someone can think of a better way to do that, I am open to any suggestions (I am still sort of a newbie VBScript programmer). :D

gish
07-07-2003, 02:38 PM
Well... no webmaster or IT manager would ever allow any type of access (from outside sources) to their DB for any reason. You simply need to parse the page. That is the only way to get the information.

NoahK1ng
07-10-2003, 07:11 PM
Well, first off, to access a remote webpage via ASP, here is a link to get you started: ASP 101 - ASP HTTP Request (http://www.asp101.com/samples/viewasp.asp?file=http%2Easp)

That will load the HTML returned into a single string, and from there it's just a matter of extracting the text from all the <span class="whatever">This Is The Text I want </span> tags.
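
For reference, here is a rough sketch of what that request step might look like in server-side JScript. It assumes the MSXML2.ServerXMLHTTP component is installed on the server; fetchPage and createRequest are names I made up for illustration (createRequest only exists so the function can be tried with a stand-in object outside of ASP).

```javascript
// Sketch only: fetchPage and createRequest are illustrative names, not part of ASP.
function fetchPage(url, createRequest) {
    // Assumption: on the ASP server, the MSXML server-side HTTP component is available.
    var http = createRequest
        ? createRequest()
        : new ActiveXObject("MSXML2.ServerXMLHTTP");

    http.open("GET", url, false); // false = synchronous, wait for the reply
    http.send();

    if (http.status != 200) {
        throw new Error("Request for " + url + " failed with status " + http.status);
    }
    return http.responseText; // the whole page as one string
}
```

In an ASP page you would then do something like var SportsPage = fetchPage("http://www.example.com/sports.html") (placeholder URL) and carry on from there.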

Next (assuming javascript, though I'm sure that vbscript has roughly equivalent methods)

var headlines = new Array() //to hold the headlines
var counter = 0 //how many headlines are there?
var allHeadlinesFound = 0 //have we found all the headlines in the string?
var stringStart = 0 //start position of current headline
var stringEnd = 0 //end position of current headline
var currentPosition = 0 //current position in the string

while (allHeadlinesFound == 0) {
    currentPosition = SportsPage.indexOf('<span class="whatever">', currentPosition)
    //starting at current position, find the position of the next occurrence of our span tag.

    if (currentPosition == -1) allHeadlinesFound = 1
    else {
        stringStart = currentPosition + 23 //+23 puts us right after the span tag (it's 23 chars long)
        stringEnd = SportsPage.indexOf('</span>', stringStart)
        if (stringEnd == -1) break //unclosed span tag: stop rather than loop forever
        headlines[counter] = SportsPage.substring(stringStart, stringEnd)
        counter = counter + 1
        currentPosition = stringEnd //continue searching after this headline
    }
}

//write out the headlines to the page
//(in server-side ASP, use Response.Write instead of document.write):

for (var a = 0; a < headlines.length; a++) {
    document.write(headlines[a])
    document.write("<br>")
}
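
The same extraction could also be sketched with a regular expression instead of a manual index loop. This is only an illustration: extractHeadlines is a made-up name, "whatever" is still a placeholder class, and it assumes the class name is plain letters/digits and that the headline itself contains no nested span tags.

```javascript
// Sketch only: extractHeadlines is an illustrative name, not an existing function.
function extractHeadlines(html, className) {
    // Match <span class="className"> ... </span>, ignoring case.
    // [\s\S]*? is a non-greedy "any character, including newlines",
    // so each match stops at the first closing </span>.
    var pattern = new RegExp(
        '<span\\s+class="' + className + '"\\s*>([\\s\\S]*?)</span>', 'gi');
    var headlines = [];
    var match;
    while ((match = pattern.exec(html)) !== null) {
        headlines.push(match[1]); // keep only the text between the tags
    }
    return headlines;
}
```

Calling extractHeadlines(SportsPage, "whatever") would give back an array you can loop over the same way as above. Anything inside the span (for example the link to the full article) comes back untouched.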

... now I haven't tested this, so it may be extremely buggy, but it's a concept that should work.
I hope this helps!