Rewalk Type Refresh: Removing pages not linked

andrea.schneider
Posts: 21
Joined: Fri Dec 04, 2009 8:38 am

Post by andrea.schneider »

Hi
We use "Rewalk Type Refresh" to index our website. One page used to be referenced in a list and was found by Webinator without problems. Later a user removed the page from the list, so nothing on the website links to it anymore. The page itself is still accessible to anyone who knows its URL.
When I search, the page still appears in the results and I can open it. When I look for its parents, I get an empty result, because it is no longer linked or referenced anywhere.

Is there a setting that makes "Rewalk Type Refresh" remove such pages from Webinator's index, or is this only possible with "Rewalk Type New"?

Thanks
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Refresh will not delete such pages as long as they exist on the server, whether they are linked or not. To clear them automatically you have to do a new walk. Alternatively, you can delete them manually with Profile Tools -> List/Edit URLs.
andrea.schneider
Posts: 21
Joined: Fri Dec 04, 2009 8:38 am

Post by andrea.schneider »

thanks mark...
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

Since you have Webinator, you could change the script: either add a delete at the end of the walk where Pop = 0, or delete pages that have 0 links in storepage/pageunchanged.
John Turnbull
Thunderstone Software
andrea.schneider
Posts: 21
Joined: Fri Dec 04, 2009 8:38 am

Post by andrea.schneider »

Sounds a little tricky. I'm not sure exactly where to add such a routine.

Do you have an example? I'm using this script:
Webinator 5.1.87
$Id: dowalk.src,v 2.732 2010/02/02 17:25:14 jason Exp $
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Right before
<if $SSc_removecommon eq Y>
Removing commonality from fetched pages...<flush><removecommon>

</if>
you could add
<sql row "delete from html where Pop=0">
<sql novars "delete from refs where Url=$Url"></sql>
</sql>
Deleted $loop unreferenced pages.
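
Before running the delete, it may be worth checking which pages it would remove. The following is a minimal, untested sketch, assuming the same html table and the same Pop and Url columns used in the delete above. It could be placed temporarily at the same spot in dowalk.src in place of the delete, so that a refresh lists the candidates without removing anything.

```
<!-- Dry run (assumption: same html table and Pop/Url columns as the delete above) -->
<!-- Lists the URLs that the delete would remove; nothing is deleted here -->
<sql row "select Url from html where Pop=0">
$Url
</sql>
Found $loop unreferenced pages.
```

If the list includes pages you want to keep, narrow the where clause before switching back to the delete.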