Indexing a directory

Post by **michaelbarton** » Mon May 23, 2005 3:30 pm

I have a directory of files (Word, PDF, ...). There are no HTML files. I can view the contents of the directory through a browser, but I can't get Webinator to index it. Is there a setting in the tool that lets it spider the contents of a directory? Is there a specific format for the baseurl field?

Post by **mark** » Mon May 23, 2005 3:45 pm

The base URL should be the same one you use in your browser (make sure it has the standard, required trailing / for a directory). A directory listing looks like any other HTML page to a browser or web indexer. Check the walk status to see what it said about that page: did it fetch the page or not? If not, what was the error? If it did, check list/edit URLs for that URL to see what content and children were found on the page. Try setting verbosity to 4 and doing a new (not refresh) walk to get more information about discarded URLs.
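To illustrate why the trailing / matters and how a listing page is just HTML with links, here is a minimal Python sketch using only the standard library. It is not how Webinator works internally; the host `intranet.example.com` and the sample listing are hypothetical, and the filtering rules (skip `?` sort links and anything outside the base directory) are assumptions modeled on a typical Apache-style index page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def ensure_trailing_slash(base_url: str) -> str:
    """Append '/' to the URL path if it is missing (directory-style base URL)."""
    parsed = urlparse(base_url)
    if parsed.path.endswith("/"):
        return base_url
    return parsed._replace(path=parsed.path + "/").geturl()


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, the way a crawler reads any HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def child_urls(base_url: str, listing_html: str) -> list[str]:
    """Resolve the links in a directory listing to absolute child URLs."""
    base = ensure_trailing_slash(base_url)
    collector = LinkCollector()
    collector.feed(listing_html)
    children = []
    for href in collector.hrefs:
        if href.startswith("?"):  # column-sort links such as "?C=N;O=D"
            continue
        absolute = urljoin(base, href)
        # Keep only URLs below the base directory (drops "Parent Directory").
        if absolute.startswith(base) and absolute != base:
            children.append(absolute)
    return children


# Hypothetical Apache-style listing, for illustration only.
SAMPLE_LISTING = """<html><body><h1>Index of /docs</h1>
<a href="?C=N;O=D">Name</a>
<a href="/">Parent Directory</a>
<a href="report.pdf">report.pdf</a>
<a href="notes.doc">notes.doc</a>
</body></html>"""

if __name__ == "__main__":
    # Without the slash, relative links resolve one directory too high:
    print(urljoin("http://intranet.example.com/docs", "report.pdf"))
    # -> http://intranet.example.com/report.pdf
    print(child_urls("http://intranet.example.com/docs", SAMPLE_LISTING))
    # -> ['http://intranet.example.com/docs/report.pdf',
    #     'http://intranet.example.com/docs/notes.doc']
```

The first print shows the failure mode: a relative link like `report.pdf` on the page `/docs` resolves against `/`, so the indexer would request files that don't exist there.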

Make sure your robots.txt and meta robots values or walk settings allow indexing of that page.
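As a rough way to check the two robots mechanisms outside the appliance, the sketch below uses Python's standard `urllib.robotparser` for robots.txt and a small `HTMLParser` subclass for meta robots tags. The user-agent string `ExampleIndexer` and the sample rules are hypothetical; substitute your real robots.txt lines and whatever user-agent your walk actually sends. Note that a listing page marked `nofollow` would stop the walker from reaching the files even if the page itself is fetched.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def robots_txt_allows(robots_lines: list[str], user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules that were already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


class MetaRobots(HTMLParser):
    """Gather directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        values = dict(attrs)
        if (values.get("name") or "").lower() == "robots":
            content = values.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def meta_robots_allows(html: str) -> tuple[bool, bool]:
    """Return (may_index, may_follow) for a page's meta robots tags."""
    parser = MetaRobots()
    parser.feed(html)
    d = parser.directives
    may_index = not ({"noindex", "none"} & d)
    may_follow = not ({"nofollow", "none"} & d)
    return may_index, may_follow


if __name__ == "__main__":
    rules = ["User-agent: *", "Disallow: /private/"]  # hypothetical robots.txt
    agent = "ExampleIndexer"  # hypothetical user-agent string
    print(robots_txt_allows(rules, agent, "http://intranet.example.com/docs/"))
    # -> True
    print(robots_txt_allows(rules, agent, "http://intranet.example.com/private/a.pdf"))
    # -> False
    print(meta_robots_allows('<meta name="robots" content="index, nofollow">'))
    # -> (True, False)
```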

Post by **michaelbarton** » Mon May 23, 2005 3:50 pm

I get a timeout message:
Timeout sending data to xxxx.xxxx.xxxx.com:80

Post by **mark** » Mon May 23, 2005 5:19 pm

That sounds like a connectivity issue. Is the appliance on the same network as the workstation that's able to reach the server? Does the server or any firewall between it and the appliance allow the appliance the same kind of access as your workstation?

Or if the server's just slow to respond, increase the page time under all walk settings.
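One way to separate "can't reach the server" from "server is slow" is a plain TCP connection test run from a machine on the appliance's network (not from the workstation that already works). The sketch below is a hypothetical helper built on Python's `socket.create_connection`; the result labels are this example's own, and a successful connect only proves the TCP handshake, not how fast the web server answers.

```python
import socket
import sys


def diagnose_connection(host: str, port: int = 80, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt from this machine to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.gaierror:
        return "dns-failure"  # name does not resolve from this network
    except socket.timeout:
        return "timeout"  # packets silently dropped: routing or firewall
    except ConnectionRefusedError:
        return "refused"  # host reachable, but nothing accepts on that port
    except OSError:
        return "unreachable"  # e.g. no route to host


if __name__ == "__main__":
    # Hypothetical default host; pass the real server name as an argument.
    host = sys.argv[1] if len(sys.argv) > 1 else "intranet.example.com"
    print(diagnose_connection(host, 80))
```

If this reports `timeout` or `unreachable`, look at the network path and firewall rules; if it reports `connected` but the walk still times out, a slow server and a longer page time are the more likely explanation.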