don't crawl untitled pages

Posted: Mon Sep 19, 2005 1:50 pm
by KMandalia
I am happy to throw out pages that don't have any title. How do I do that in dowalk?

The catch is that I only want to throw away untitled web pages; PDF and Doc files should be kept.

I guess I have to also delete all the untitled pages from the database and re-index it for this to take effect. Am I right?

Posted: Mon Sep 19, 2005 2:05 pm
by John
The simplest method currently would be to add the following after the <metafromfield> call:

<if $title eq ''><$exfield_index = "N"></if>

which will tell it not to index the page, as if the page had been excluded through "Exclude by Field". To get rid of the untitled pages already in the database, you would need to either delete them or rewalk the database.