Rewalk problems...

Ned23
Posts: 7
Joined: Thu Aug 30, 2001 4:08 pm


Post by Ned23 »

I recently tried to add .asp pages to the Webinator index, but every time I rewalk the database, I get the following error:

rewalk Disallowed protocol, visted 0 pages total

It does, however, create a new database every time, named _db. Since no pages are visited, it never replaces the old database with the new one, which left me with many extra databases that I have since deleted.

Here are the commands and steps I went through to create the database and get to this point. First, I wiped the database using the gw -wipe -d- command. It appeared to work fine.

Then I created the database with the following command:
gw -fasp -d- -mOptions.txt http://www.mysite.com
The database was created with .asp pages included, the pages and directories I excluded in the Options.txt file were skipped as intended, and searching worked perfectly.

Then I tried to rewalk the site to make sure it would work correctly, so that I could run it as a scheduled task to update the index daily. To do this I used the following command:
gw -d- -rewalk
This is where the "Disallowed protocol" error appears.

Before I added .asp pages, when I was indexing only .html pages, I had no problems at all: Webinator ran as a batch job under the NT scheduler, rewalking my site every night. Now that job no longer works.
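For the nightly scheduled job, a small wrapper could run the same rewalk command and flag the failure reported here, instead of letting the old database silently stay in place. This is a minimal sketch in Python, assuming `gw` is on the PATH and that the failure can be recognized by the error text quoted above; the names `REWALK_CMD`, `FAILURE_MARKERS`, `rewalk_failed` and `run_rewalk` are hypothetical, and since gw's exit codes are not described in this thread, they are simply passed through.

```python
import subprocess
import sys
from typing import List

# Command copied from the post; assumes gw is on PATH and the batch file
# runs in the directory that holds the database (-d-).
REWALK_CMD: List[str] = ["gw", "-d-", "-rewalk"]

# Text taken verbatim from the error reported in this thread,
# including gw's own "visted" spelling.
FAILURE_MARKERS = ("Disallowed protocol", "visted 0 pages total")


def rewalk_failed(output: str) -> bool:
    """Return True if gw's output contains any known failure marker."""
    return any(marker in output for marker in FAILURE_MARKERS)


def run_rewalk() -> int:
    """Run the rewalk, echo its output, and return a status for the batch file."""
    try:
        result = subprocess.run(REWALK_CMD, capture_output=True, text=True)
    except FileNotFoundError:
        print("gw not found on PATH", file=sys.stderr)
        return 2
    output = result.stdout + result.stderr
    print(output)
    if rewalk_failed(output):
        # Per the thread, a failed rewalk leaves the old database in place.
        print("rewalk failed; old database not replaced", file=sys.stderr)
        return 1
    # gw's exit codes are not described in the thread, so pass them through.
    return result.returncode
```

If the sketch were saved as a hypothetical `rewalk_check.py`, the batch file could run `python -c "import sys, rewalk_check; sys.exit(rewalk_check.run_rewalk())"` and then test the result with `if errorlevel 1`.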

Any help you can provide would be greatly appreciated. Thanks.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm


Post by mark »

What's in the Options.txt file? Anything besides x options?

Which URLs were remembered? Find out with:
gw -st "select Name,String from options where Name='URL'"

It may also help to turn up the verbosity on the rewalk:
gw -d- -v9 -rewalk
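To keep the output of both diagnostics for posting back to the thread, they could be run from one script that appends everything to a single log file. This Python sketch uses only the two gw commands suggested above; the log file name and helper names are hypothetical. Note that the verbose command performs a real rewalk, not a dry run.

```python
import subprocess
from pathlib import Path
from typing import List

# The two diagnostics suggested above. Passed as an argument list, the SQL
# statement is one argument, so the shell's double quotes are not needed.
DIAGNOSTICS: List[List[str]] = [
    ["gw", "-st", "select Name,String from options where Name='URL'"],
    ["gw", "-d-", "-v9", "-rewalk"],  # performs a real rewalk, verbosely
]


def format_section(cmd: List[str], output: str) -> str:
    """Prefix one command's output with a header naming the command."""
    return "=== " + " ".join(cmd) + " ===\n" + output + "\n"


def run_diagnostics(log_path: str = "gw_diagnostics.log") -> None:
    """Run each diagnostic in order and append its stdout and stderr to log_path."""
    with Path(log_path).open("a", encoding="utf-8") as log:
        for cmd in DIAGNOSTICS:
            result = subprocess.run(cmd, capture_output=True, text=True)
            log.write(format_section(cmd, result.stdout + result.stderr))
```

Calling `run_diagnostics()` from the directory that holds the database appends both outputs, each under a header naming its command, to `gw_diagnostics.log`.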
Ned23
Posts: 7
Joined: Thu Aug 30, 2001 4:08 pm


Post by Ned23 »

In the Options.txt file, there are only x options listed in the following format:

xhttp://www.mysite.com/jump
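To check whether an options file holds anything besides x options, a short script can separate the x lines from everything else. This Python sketch assumes one option per line and treats the text after the leading `x` as a URL prefix to exclude; that prefix-match reading is an assumption for illustration, not a statement of how gw itself parses the file, and the function names are hypothetical.

```python
from typing import List, Tuple


def split_options(options_text: str) -> Tuple[List[str], List[str]]:
    """Split an options file into x-option URL prefixes and all other lines.

    Assumption: one option per line, written without a leading dash,
    with an x line holding a URL prefix to exclude from the walk.
    """
    prefixes: List[str] = []
    others: List[str] = []
    for raw in options_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("x") and len(line) > 1:
            prefixes.append(line[1:])  # text after the leading 'x'
        else:
            others.append(line)  # anything that is not an x option
    return prefixes, others


def is_excluded(url: str, prefixes: List[str]) -> bool:
    """Assumed prefix-match semantics: excluded if url starts with any prefix."""
    return any(url.startswith(prefix) for prefix in prefixes)
```

With the single line shown above, `split_options` returns `http://www.mysite.com/jump` as the only prefix and no other lines, so under the prefix assumption `http://www.mysite.com/jump/index.asp` would be excluded while `http://www.mysite.com/default.asp` would still be walked. A plain prefix match would also exclude a page such as `/jumpstart.asp`, which is worth keeping in mind when choosing exclusion strings.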

When I check which URLs were remembered, it lists only the main site, www.mysite.com, but that is the only site I'm indexing, so this is probably correct.

Now for the really weird part: when I tried the rewalk a third time, it seemed to work fine and gave no errors. So I guess I'll wait until tomorrow morning and make sure my scheduled batch file runs correctly.

I don't think I did anything differently this time than in my first couple of attempts, but it now seems to work, so thanks for all your help.