Why are links being followed?

online2008
Posts: 4
Joined: Fri Apr 18, 2008 3:20 pm

Post by online2008 »

I have a text file with under 600 URLs. I put the URL of this file in the Page URL field and set the Max Depth to 0.

When I check the walk status, it shows 19,999 pages in the index. Shouldn't only about 600 pages be indexed?

More to the point, the search results are turning up irrelevant information. How do I limit the search to just those 600 pages?

I also tried putting the 600 URLs in the Single Page field, and got the same result (19,999 pages in the index).
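
For comparing the walk status against the list itself, it can help to know exactly how many distinct URLs the text file holds. The following is only a rough sketch in Python, not part of Webinator: the function name count_unique_urls is hypothetical, and it assumes the file has one absolute http or https URL per line.

```python
import sys
from urllib.parse import urlsplit


def count_unique_urls(lines):
    """Count distinct absolute http(s) URLs, assuming one URL per line."""
    seen = set()
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        parts = urlsplit(url)
        # Keep only entries that look like absolute web URLs.
        if parts.scheme in ("http", "https") and parts.netloc:
            seen.add(url)
    return len(seen)


if __name__ == "__main__":
    # Usage (the script name is illustrative): python count_urls.py urls.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as handle:
            print(count_unique_urls(handle))
```

If the number printed is close to 600 while the walk status reports far more pages, the extra pages are coming from the walk settings rather than from the list.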
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

What do you have in Base URL? Extra domains?

Is rewalk mode set to new or refresh? You probably want new.
online2008
Posts: 4
Joined: Fri Apr 18, 2008 3:20 pm

Post by online2008 »

Hello Mark,

The Base URL is the same as the server on which Webinator was uploaded. Nothing is in Extra domains.

I've changed the rewalk mode to new to see what different results this will produce.

--C
online2008
Posts: 4
Joined: Fri Apr 18, 2008 3:20 pm

Post by online2008 »

Hello Mark,

We've made the following changes and now have the following setup:

--the Base URL, Watch URL, and Page URL are all the same, i.e., the URL of the text file with the list of URLs to be searched

--Max Depth is 0

--nothing is in Extra domains

--Rewalk is New


Happily, Webinator is now working exactly as required, in that it searches and returns results from only the pages whose urls we've listed in the text file.

Now that the old, 19,999-page walk has been purged, we'll change the Rewalk to Refresh.

Is there anything in this setup that will cause problems down the road?


--C
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Base URL and Page URL should not be the same. They mean different things. In your case, you probably just want to use Base URL with max depth 0 and no Page URL.

Watch URL is only meaningful if you've set rewalk schedule to on change.
online2008
Posts: 4
Joined: Fri Apr 18, 2008 3:20 pm

Post by online2008 »

Okay.

Thank you very much, Mark, for the sterling support.

--C