fetch and timeouts

Fippy
Posts: 11
Joined: Fri Feb 09, 2007 7:15 pm

Post by Fippy »

I have <urlcp timeout 120>
and am trying to fetch 1000 URLs in a <fetch>
My fetch always exits after about 80 or so, so I used urlinfo to print a whole host of diagnostics, like each url fetched, the http code, error code, time taken etc. Each took 1-9 seconds to fetch.

Question: Is the urlcp timeout for ALL URLs or each one? The docs say it is per fetch, but totalling up the time taken to fetch all my URLs it looked like the timeout is for ALL fetches, i.e. the whole fetch loop.

If one url times out does it quit the entire fetch loop?
Will it enter the loop with the timed out url and set the http and error codes accordingly or will it just skip the loop for that url?

What I am seeing does not make sense if urlcp timeout is per each url. With 1000 urls it ought to take the script 1000*120 seconds to quit, right?
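
The arithmetic in that question can be checked with a small sketch. This is not Vortex code, just a Python illustration of the worst case under the assumption that the timeout applies to each URL separately and that every URL actually times out. The function name `worst_case_seconds` and the `parallel` parameter are made up for the example.

```python
import math

def worst_case_seconds(num_urls, per_url_timeout, parallel=1):
    """Upper bound on wall-clock time if every URL hits its own timeout.

    Assumes `parallel` fetches run at once, so the URLs are worked
    through in ceil(num_urls / parallel) rounds.
    """
    rounds = math.ceil(num_urls / parallel)
    return rounds * per_url_timeout

# 1000 URLs, 120 second timeout each, fetched one at a time
print(worst_case_seconds(1000, 120))              # 120000
# The same list with two fetches running in parallel
print(worst_case_seconds(1000, 120, parallel=2))  # 60000
```

So if the timeout really is per URL, a loop that gives up after only a couple of minutes cannot be explained by every URL timing out.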

Thanks to anyone who can clear up the confusion.
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

How many are you fetching in parallel? Are you doing a lot of processing in the fetch loop? Is it the overall script timeout you are hitting, or exiting the fetch loop?
John Turnbull
Thunderstone Software
Fippy
Posts: 11
Joined: Fri Feb 09, 2007 7:15 pm

Post by Fippy »

Parallel is 2, and I'm not doing a great deal in the loop and there are no exits.
It really looks as if the timeout is for ALL urls not each.
I just upped the timeout value and it's reading more records now, but it doesn't make sense to me that the timeout has to be so high for a web url.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

After how long does it actually time out?
What's the exact message you're getting?
What's your overall script <timeout> set to?
Fippy
Posts: 11
Joined: Fri Feb 09, 2007 7:15 pm

Post by Fippy »

<urlcp maxpagesize -1>
<urlcp reparentmode abs>
<urlcp timeout 1500>
<vxcp timeout 11000>

<fetch parallel=2 $adlinks>

It seems to run for about 20 minutes (which is a little short of 1500 seconds). I don't get any messages or errors or anything. The fetch loop just exits and I have only processed, say, 200 of the 1000 URLs in $adlinks.

As I said before, if I print urlinfo diagnostics, I never get a report of a timed out url. I can print all the urls that trigger another trip around the fetch loop and I get say 200 of them, never the 1000.

Most odd. All I know is the more I increase <urlcp timeout> the more records I get to receive.
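
To make the two readings of the timeout concrete, here is a toy model in Python. It is only a sketch and says nothing about how Vortex schedules fetches internally. It assumes every fetch takes 5 seconds (roughly the middle of the 1-9 seconds reported above), and the names `urls_completed`, `per_url_timeout` and `total_budget` are invented for the illustration.

```python
import heapq

def urls_completed(durations, parallel, per_url_timeout=None, total_budget=None):
    """Count fetches that finish successfully with `parallel` concurrent slots.

    per_url_timeout: a single fetch longer than this is abandoned.
    total_budget:    no fetch may finish after this wall-clock deadline.
    """
    slots = [0.0] * parallel          # time at which each slot becomes free
    heapq.heapify(slots)
    done = 0
    for d in durations:
        start = heapq.heappop(slots)  # earliest free slot takes the next URL
        if total_budget is not None and start >= total_budget:
            break                     # deadline passed before this URL could start
        timed_out = per_url_timeout is not None and d > per_url_timeout
        busy = per_url_timeout if timed_out else d
        finish = start + busy
        heapq.heappush(slots, finish)
        if not timed_out and (total_budget is None or finish <= total_budget):
            done += 1
    return done

durations = [5] * 1000  # assumed: every URL takes 5 seconds

# Reading 1: the timeout applies to each URL on its own
print(urls_completed(durations, 2, per_url_timeout=120))   # 1000
print(urls_completed(durations, 2, per_url_timeout=1500))  # 1000

# Reading 2: the timeout is a budget for the whole loop
print(urls_completed(durations, 2, total_budget=120))      # 48
print(urls_completed(durations, 2, total_budget=1500))     # 600
```

Under the per-URL reading the count does not depend on the timeout value at all, while under the whole-loop reading it grows with it. The exact numbers will not match a real run, but only the second pattern resembles "the more I increase the timeout, the more records I get".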
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Do you have a putmsg function trapping messages?
Are there any related messages in vortex.log?
How are you executing this script? Are you running it from a command line or accessing it via a browser? If by a browser the browser may be timing out the connection due to no data for too long. Or the webserver may be timing it out for running too long.
Fippy
Posts: 11
Joined: Fri Feb 09, 2007 7:15 pm

Post by Fippy »

I don't see any relevant errors in the log nor anything regarding a putmsg in the script.
It is being executed from a command line on a Linux machine with plenty of free memory and CPU.
I really am seeing a direct correlation between the size of the urlcp timeout and the number of urls I get to fetch, just as if the timeout was for the entire fetch, not for each url.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Can you give your "texis -version" output and the shortest complete example script that demonstrates the behavior?
Fippy
Posts: 11
Joined: Fri Feb 09, 2007 7:15 pm

Post by Fippy »

Texis Web Script (Vortex) Copyright (c) 1996-2004 Thunderstone - EPI, Inc.
Commercial Version 5.00.1086121238 20040601 (i686-unknown-linux2.4.9-64-32)

Creating a test script may take significant effort unfortunately. There are several modules involved. I'll see what I can come up with.
Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

If you count the URLs before the opening <fetch> (with <count $adlinks> $ret), and then print $loop after the closing </fetch>, is $loop also short of the number of URLs? Are you calling <fetch> or <submit> inside the <fetch> loop?
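
The check suggested here (count the inputs first, then compare with the number of loop iterations afterwards) looks roughly like this in Python. It is an analogy, not Vortex syntax: `loop_coverage` and `truncated_loop` are hypothetical helpers, and `truncated_loop` only simulates a loop that stops early, using the figure of 200 mentioned earlier in the thread.

```python
def loop_coverage(urls, fetch_loop):
    """Return (expected, processed) for one pass of fetch_loop over urls."""
    expected = len(urls)                              # count before the loop
    processed = sum(1 for _ in fetch_loop(urls))      # iterations actually run
    return expected, processed

def truncated_loop(urls):
    """Hypothetical stand-in for a fetch loop that exits early."""
    yield from urls[:200]

urls = [f"http://example.com/{i}" for i in range(1000)]
expected, processed = loop_coverage(urls, truncated_loop)
print(f"processed {processed} of {expected}")  # processed 200 of 1000
```

If the two numbers differ, the loop itself ended early. If they agree, the missing URLs were never in the list to begin with.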