crawling filesystem

Posted: Thu Feb 12, 2009 4:49 pm
by mrouch
i'm trying to crawl a windows share, testing with one URL. in attempting to pick up just one url, which resolves from the browser, i get

Document not found: file:// document from file N:\BrandManagement\business.pdf: The system cannot find the path specified

for the url file:///N:/BrandManagement/business.pdf

i have checked File in Protocols

suggestions? thanks.

Posted: Thu Feb 12, 2009 5:51 pm
by mark
Your browser is using your local machine's mapping for the "N:" drive. The appliance doesn't know that mapping.

Go to Maintenance->Network filesystems and shares to tell the appliance about your network file system(s) and to get the proper base URL.

Posted: Thu Feb 12, 2009 9:38 pm
by mrouch
i'm running Webinator 5.1.74-Windows-w/plugin and don't see Maintenance on the Admin menu. am i missing something?

Posted: Fri Feb 13, 2009 10:57 am
by John
You will still need to use the UNC path to the file to get it to crawl, e.g. file:\\SERVER\SHARE\BrandManagement ...

There may be permission issues depending on how the crawl is started. You may start the crawl from the command prompt, in which case it will run with your credentials, e.g.

texis profile=Profile dowalk/dispatch.txt
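
If it helps to see how a drive-letter URL relates to a UNC path, here is a rough Python sketch of the conversion. It is not part of Webinator. The function name drive_url_to_unc and the DRIVE_MAP table are made up for illustration, and \\SERVER\SHARE is only the placeholder used above. On Windows, `net use` lists what each drive letter really points to.

```python
from urllib.parse import unquote, urlparse

# Hypothetical mapping for illustration; fill in from `net use` output.
DRIVE_MAP = {"N:": r"\\SERVER\SHARE"}


def drive_url_to_unc(url, drive_map=DRIVE_MAP):
    # Convert a drive-letter file URL to the UNC path of the same file.
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError("not a file URL: " + url)
    # For file:///N:/a/b the parsed path is "/N:/a/b".
    path = unquote(parsed.path).lstrip("/")
    drive, _, rest = path.partition("/")
    root = drive_map.get(drive.upper())
    if root is None:
        raise KeyError("no share known for drive " + drive)
    if not rest:
        return root
    return root + "\\" + rest.replace("/", "\\")


if __name__ == "__main__":
    print(drive_url_to_unc("file:///N:/BrandManagement/business.pdf"))
```

This only gives the UNC path itself. The URL form to hand to the crawler is the file: style shown above.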

Posted: Fri Feb 13, 2009 11:30 am
by mark
file:///N:/BrandManagement/business.pdf is OK syntactically, but the web server process may not have access to network resources like shared drives. Try a local drive to see if that's the issue. Running from the command prompt as John suggested should get around the web server's permission restrictions.
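
One way to run the local-versus-network comparison outside the crawler is a small Python check like the one below. This is only a sketch. The helper can_read is made up for illustration, and C:\Windows\win.ini is just an assumed example of a local file. The result reflects whichever account runs the script, so it only tells you about the web server if it is run as the web server's account.

```python
def can_read(path):
    # True if the current process can open the file and read from it.
    try:
        with open(path, "rb") as handle:
            handle.read(1)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed example paths: one local file, one on the mapped drive.
    for path in (r"C:\Windows\win.ini", r"N:\BrandManagement\business.pdf"):
        print(path, "readable" if can_read(path) else "NOT readable")
```

If the local file is readable and the N: file is not, the drive mapping or the account's permissions are the likely cause, not the URL syntax.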

Posted: Fri Feb 13, 2009 11:55 am
by mrouch
perfect. that was it, the command prompt worked. thanks for the help.