I have <urlcp timeout 120>
and am trying to fetch 1000 URLs in a <fetch> loop.
My fetch loop always exits after about 80 or so, so I used urlinfo to print a whole host of diagnostics: each URL fetched, the HTTP code, the error code, the time taken, etc. Each URL took 1-9 seconds to fetch.
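The loop is roughly shaped like this (a trimmed-down sketch from memory, not the exact script; the urlinfo item names may not be exactly what I use, so check the urlinfo docs for the real ones):

    <urlcp timeout 120>
    <fetch $adlinks>
      <!-- per-URL diagnostics; item names here are illustrative -->
      <urlinfo actualurl> url fetched: $ret
      <urlinfo errnum>    error code:  $ret
      <!-- the HTTP code and elapsed time get printed the same way -->
      <!-- ...a small amount of processing on the fetched page... -->
    </fetch>
    iterations completed: $loop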
Question: Is the urlcp timeout for ALL URLs or for each one? The docs say it is per fetch, but totalling up the time taken to fetch all my URLs, it looks as if the timeout is for ALL fetches, i.e. the whole fetch loop.
If one URL times out, does it quit the entire fetch loop?
Will it enter the loop body with the timed-out URL and set the HTTP and error codes accordingly, or will it just skip that iteration?
What I am seeing does not make sense if the urlcp timeout is per URL. With 1000 URLs, it ought to take the script up to 1000 * 120 seconds to quit in the worst case, right?
How many are you fetching in parallel? Are you doing a lot of processing in the fetch loop? Is it the overall script timeout you are hitting, or is the script just exiting the fetch loop early?
Parallel is 2. I'm not doing a great deal in the loop, and there are no exits.
It really looks as if the timeout is for ALL URLs, not each one.
I just upped the timeout value and it is fetching more records now, but it doesn't make sense to me that the timeout has to be so high for a single web URL.
It seems to run for about 20 minutes (which is a little short of 1500 seconds). I don't get any messages or errors or anything. The fetch loop just exits, and I have only processed, say, 200 of the 1000 URLs in $adlinks.
As I said before, when I print urlinfo diagnostics I never see a timed-out URL reported. I can print every URL that triggers another trip around the fetch loop, and I get, say, 200 of them, never the full 1000.
Most odd. All I know is that the more I increase <urlcp timeout>, the more records I receive.
Do you have a putmsg function trapping messages?
Are there any related messages in vortex.log?
How are you executing this script? Are you running it from the command line or accessing it via a browser? If via a browser, the browser may be timing out the connection because no data has been sent for too long, or the web server may be killing the script for running too long.
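On the putmsg question: if the script (or anything it includes) defines a putmsg function, errors and warnings get routed through it, and an empty one will hide timeout messages entirely. It looks roughly like this (the argument list here is from memory, so check the docs for the exact one):

    <A NAME=putmsg msgnum fn msg>
      <!-- an empty body silently swallows every message -->
      <!-- at minimum echo them, so fetch timeouts become visible: -->
      ($msgnum) in $fn: $msg
    </A>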
I don't see any relevant errors in the log, nor any putmsg function in the script.
It is being executed from the command line on a Linux machine with plenty of free memory and CPU.
I really am seeing a direct correlation between the size of the urlcp timeout and the number of URLs I get to fetch, just as if the timeout applied to the entire fetch loop rather than to each URL.
If you count the URLs before the opening <fetch> (with <count $adlinks> $ret), and then print $loop after the closing </fetch>, is $loop also short of the number of URLs? Are you calling <fetch> or <submit> inside the <fetch> loop?
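That is, something along these lines around your existing loop (a sketch, assuming the URL list is in $adlinks as above):

    <count $adlinks> URLs before the loop: $ret
    <fetch $adlinks>
      <!-- ...existing per-URL processing... -->
    </fetch>
    iterations after the loop: $loop

If $ret is 1000 but $loop ends up around 200, the loop itself is stopping early rather than skipping individual URLs.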