Problem with -j option

neil.munro
Posts: 22
Joined: Fri Nov 09, 2001 1:13 am


Post by neil.munro »

Hi, I have a problem restricting Webinator to keep only the URLs that begin with a certain string.
OK, the manual clearly says that -j is the answer. However, it does not work for me. I am not sure where the problem lies, but when a user visits our website they are allocated an anonymous sessionid that gets plugged into the URL just after the domain name. I don't think this is the problem, but I thought I should mention it just in case. So, an example:
I want to only be concerned with URLs that begin with:
http://myReallyGreatWebSite.com/it/

When I run Webinator from the command line, I use a small file of options, one of which is:
jhttp://myReallyGreatWebSite.com/it/
(i.e. without the dash for the j option)
I have allowed the .htm extension explicitly with fhtm (though I don't think it is required).
There are no other options in the cfg file that either restrict or enable anything, just settings for verbosity, breadth-first, etc.

I run it like this:
gw -dmydatabase -mmyConfigfile http://myReallyGreatWebSite.com/it/index.htm

I get the following output:
Adding todo: http://myReallyGreatWebSite.com/it/index.htm
Saving options and URLs to lastrun
http://myReallyGreatWebSite.com/it/index.htm
0: TotLinks: 0, Links: 0/ 0, Good: 0, New: 0 Retrieving
0: TotLinks: 0, Links: 0/ 0, Good: 0, New: 0 Disallowed path(i)
0: TotLinks: 0, Links: 0/ 0, Good: 0, New: 0 Disallowed MIME type
0: TotLinks: 0, Links: 0/ 0, Good: 0, New: 0

So I get a disallowed path as well as a disallowed MIME type, even though the HTTP header says that the Content-Type is text/html (yes, I know that isn't quite the same as the MIME type).

Do you have ANY idea what I, the program, or the system is doing wrong?

I am running GW Version 2.56 (Commercial) on Sun Solaris 8.

Neil.
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm


Post by mark »

The server redirect with the sessionid is the problem. When the top page is fetched, the web server redirects to
http://myReallyGreatWebSite.com/RANDOMSESSIONID/it
with a MIME (content) type of magnus-internal/webengine. Both the redirected path and that content type are disallowed by your settings.
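
To illustrate why the redirect trips the path restriction, here is a minimal Python sketch. It assumes the restriction behaves like a plain "URL must start with this string" test; that is an assumption made for illustration, not a description of how gw implements -j. The function name allowed_by_prefix is hypothetical.

```python
def allowed_by_prefix(url: str, required_prefix: str) -> bool:
    """Hypothetical stand-in for a 'URL must begin with' rule.

    Assumption: the rule is a simple string-prefix comparison.
    """
    return url.startswith(required_prefix)


# Prefix from the options file, and the two URLs from the thread.
required = "http://myReallyGreatWebSite.com/it/"
requested = "http://myReallyGreatWebSite.com/it/index.htm"
redirected = "http://myReallyGreatWebSite.com/RANDOMSESSIONID/it"

print(allowed_by_prefix(requested, required))   # True: the URL as typed matches
print(allowed_by_prefix(redirected, required))  # False: the sessionid sits before /it
```

Under that assumption, the URL the server actually hands back no longer begins with the required string, which would be consistent with the "Disallowed path" line in the output.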

Visit the site once by hand to get a sessionid, then use that sessionid in both your gw -j option and your starting URL.
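
As a rough sketch of that workaround, the Python snippet below builds the option-file line and the starting URL from a URL copied out of the browser after the redirect. It assumes the sessionid is always the first path segment and that index.htm is reachable under the session path; both are assumptions, and session_prefix is a hypothetical helper, not part of Webinator.

```python
from urllib.parse import urlsplit


def session_prefix(observed_url: str, section: str = "it") -> str:
    """Return 'scheme://host/SESSIONID/section/' for a redirected URL.

    Assumption: the first path segment of observed_url is the sessionid.
    """
    parts = urlsplit(observed_url)
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        raise ValueError("no path segment to use as a sessionid")
    session_id = segments[0]
    return f"{parts.scheme}://{parts.netloc}/{session_id}/{section}/"


# URL as seen in the browser after visiting the site by hand.
observed = "http://myReallyGreatWebSite.com/RANDOMSESSIONID/it"
prefix = session_prefix(observed)

print("j" + prefix)          # line for the options file (no dash, as in the original post)
print(prefix + "index.htm")  # starting URL to pass to gw
```

A sessionid obtained this way will presumably expire at some point, so the prefix would need to be regenerated before each run.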