adding exclusions in refresh

KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm

adding exclusions in refresh

Post by KMandalia »

I understand that refresh will only revisit what is already in the database. BUT if I pause the walk and add some exclusions (excluding via REX), will it stop refreshing the matching pages that are already in the database, and will it stop bringing in URLs similar to what is in the existing database? I missed one of the sorttypes to exclude when crawling a website, and that mistake is now costing me some 18,000 pages; the refresh walk keeps bringing in more pages even after I added the exclusions.

Also, how can I add URL patterns that are not directory structures but query patterns?

Instead of http://www.somesite.com/somedir/*

I want to add http://www.somesite.com?somequery?somevar=*

If I have multiple categories, and if you can suggest some way to accomplish the query pattern, then what happens to URLs that match none of the categories?
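As a side note on the wildcard question above: this is a minimal sketch (not Webinator's actual pattern matcher) of how a `*` wildcard pattern over a full URL, query string included, could be translated into a regular expression. The pattern and URL below are illustrative stand-ins, not the exact strings from this thread.

```python
import re

def wildcard_to_regex(pattern):
    # Escape regex metacharacters, then turn the '*' wildcard into '.*'.
    return re.compile(re.escape(pattern).replace(r"\*", ".*") + "$")

# Hypothetical query-style pattern (names are illustrative only).
pat = wildcard_to_regex("http://www.somesite.com/somepage.asp?somevar=*")
print(bool(pat.match("http://www.somesite.com/somepage.asp?somevar=123")))  # True
print(bool(pat.match("http://www.somesite.com/otherpage.html")))            # False
```

The point is that nothing stops a pattern from matching into the query string; the `?` just needs to be treated as a literal, which escaping handles.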
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

adding exclusions in refresh

Post by mark »

You should use List/Edit URLs to find and delete URLs you don't want in the database. Exclusion rules mainly apply to newly discovered URLs.

Your example pattern should work. The entire literal URL is considered for category matching.

URLs with no category will still be returned when searching in "everything".
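To make the two points above concrete, here is a hedged sketch (the category names and regex syntax are assumptions for illustration, not Webinator's own configuration) of matching the entire literal URL against category patterns, with uncategorized URLs still visible under "everything":

```python
import re

# Illustrative category patterns; real category rules may use a different syntax.
categories = {
    "products": re.compile(r"somevar="),
    "docs": re.compile(r"/docs/"),
}

def categorize(url):
    # The full literal URL, query string included, is tested against each pattern.
    matched = [name for name, rx in categories.items() if rx.search(url)]
    # A URL matching no category is not dropped; it still turns up when
    # searching in "everything".
    return matched or ["everything"]

print(categorize("http://www.somesite.com/somepage.asp?somevar=1"))  # ['products']
print(categorize("http://www.somesite.com/other.html"))              # ['everything']
```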
KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm

adding exclusions in refresh

Post by KMandalia »

That's the problem. If something like

http://www.somesite.com/somepage.asp?q1=.... is already in the database, and I exclude

somepage.asp\?q1\=
and even \somepage.asp,

then a refresh walk should no longer bring in pages with this pattern, but it looks like it does. List/Edit URLs is always an option; I just wanted to know whether I am doing something wrong.
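For what it's worth, the first exclusion expression above does match the URL in question when read as an ordinary regex, as this small sketch shows (Python's `re` here, which may not behave identically to REX; the URL is the illustrative one from the post). That suggests the issue is the refresh semantics the admin described, not the pattern itself.

```python
import re

# The first exclusion expression from the post, as a Python regex.
exclusion = re.compile(r"somepage\.asp\?q1=")

url = "http://www.somesite.com/somepage.asp?q1=abc"
print(bool(exclusion.search(url)))                              # True
print(bool(exclusion.search("http://www.somesite.com/dir/")))   # False
```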