fetch returns cut-off html

Post by **nduvnjak** » Sat Dec 06, 2008 2:27 pm

Hi,
I'm trying to crawl this link:
http://bbs.my0511.com/forumdisplay.php?fid=282

but the <fetch> in my Vortex script returns HTML that is obviously cut off. It happens kind of randomly: about 1 time in 5 it returns the complete HTML, but mostly it's cut off.

I tried setting various user agents. Is there some other <urlcp> setting I should look into?

Thanks a lot.
Nenad

Post by **mark** » Mon Dec 08, 2008 11:12 am

Perhaps maxpgsize or timeout. See http://www.thunderstone.com/site/vortex ... imits.html

Are you getting any error messages in your output or in vortex.log? If not, have you suppressed or written your own putmsg?
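
As a rough illustration of the two settings mentioned above, raising the limits before the fetch might look something like the sketch below. The values 120 and 4000000 are arbitrary assumptions, not recommendations, and the units (seconds and bytes) are also assumed; the documentation linked above is the place to confirm the defaults and exact semantics.

```
<!-- Hypothetical values for illustration only; check the urlcp docs for defaults and units -->
<urlcp timeout 120>
<urlcp maxpgsize 4000000>
<fetch "http://bbs.my0511.com/forumdisplay.php?fid=282">
<!-- the fetched page is returned in $ret -->
<$html = $ret>
```

If the page still comes back truncated with generous limits, the error messages in vortex.log are probably the next thing to check.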

Post by **nduvnjak** » Mon Dec 08, 2008 1:46 pm

It must have been the timeout kicking in, because that description matches what actually happened: partial HTML was returned.

I don't think any error messages were returned, because my program would have terminated in that case (it only ignores CHARSET errors; on any other error it stops).

Btw, it looks like the website is working much better now. It never returned cut-off HTML in today's testing, so I can't replicate the error from the other day. What confused me then was that I was always able to get the whole page and see the complete source HTML in a browser, but the Vortex <fetch> was failing most of the time. So I thought I was missing some urlcp setting.

Thank you for your answer.