fetch returns cut-off html

nduvnjak
Posts: 40
Joined: Wed Feb 06, 2008 3:45 pm

Post by nduvnjak »

Hi,
I'm trying to crawl this link:
http://bbs.my0511.com/forumdisplay.php?fid=282

but the <fetch> in my Vortex script returns HTML that is obviously cut off. It happens kind of randomly: about 1 time out of 5 it returns the complete HTML, but mostly it's cut off.

I tried setting various user-agents. Is there some other <urlcp> setting I should look into?

Thanks a lot.
Nenad
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

nduvnjak
Posts: 40
Joined: Wed Feb 06, 2008 3:45 pm

Post by nduvnjak »

It must have been the timeout kicking in, because that description matches what actually happened: partial HTML was returned.

I don't think any error messages were returned, because my program would have terminated in that case (it only ignores CHARSET errors; on any other error it stops).
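
Since no error was reported, a truncated page has to be detected from the content itself. Below is a minimal sketch of such a check, written in Python rather than Vortex purely for illustration. The helper name looks_truncated and its parameters are hypothetical and not part of Texis or Vortex. It assumes the page is a full HTML document that ends with a closing </html> tag, and that the server's Content-Length value, if one was sent, is available to the caller.

```python
import re
from typing import Optional


def looks_truncated(body: bytes, content_length: Optional[int] = None) -> bool:
    """Heuristically decide whether a fetched HTML body was cut off.

    body           -- raw bytes received from the server
    content_length -- value of the Content-Length header, if the server sent one
    """
    # Fewer bytes than the server announced means the transfer stopped early.
    if content_length is not None and len(body) < content_length:
        return True

    # Assumption: a complete page ends with </html> somewhere in its last 1 KB.
    tail = body[-1024:].lower()
    return re.search(rb"</html\s*>", tail) is None
```

A crawler could call this after each fetch and retry (perhaps with a longer timeout) when it returns True. The closing-tag test is only a heuristic: pages that omit </html>, or that append more than 1 KB after it, would be flagged incorrectly.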

Btw, the website seems to be working much better now: it never returned cut-off HTML in today's testing, so I can't reproduce the error from the other day. What confused me then was that I could always get the whole page and see the complete HTML source in a browser, while the Vortex <fetch> was failing most of the time. So I thought I was missing some <urlcp> setting.

Thank you for your answer.