Excluding dirs...

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Post by Thunderstone »




Hi,
How can I exclude directories from being indexed? I want a specific directory to be indexed, and the web server has permission to access it. But all the subdirectories are getting indexed too. Please note that I do want to index all the HTML files under these subdirectories. When I run a search, I get the results shown below.

What could be the problem?
I searched for "Asthma". There are a lot of articles under Healthtopics that are about asthma.

Any help would be greatly appreciated.

Thanks in advance
Sumant




1: File Index
http://www.PersonalMD.com/healthtopics/ ... dits.shtml http://www.PersonalMD.com/healthtopics/AIDS.shtml http://www.PersonalMD.com/healthtopics/ ... prai.shtml http://www.PersonalMD.com/healthtopics/ ... erio.shtml http://www.PersonalMD.com/healthtopics/art/asthma.sht ...
http://www.personalmd.com/... ealthtopics/healthtopics1.shtml 89%
Size: 5K
Depth: 1
[Find Similar]
Match Info

--------------------------------------------------------------------------

2: Index of /healthtopics/
Index of /healthtopics/ Name Last Modified Size Up to higher level directory Wed Aug 02 18:31:11 PDT 2000 0 bytes AIDS.htm Mon Jul 17 22:17:09 PDT 2000 13 kb AIDS.shtml Mon Jul 17 22:17:10 PDT 2000 17 kb Arthritis.htm Mon Jul 17 22:17:10 PDT 2000 5 kb Arthri ...
http://www.personalmd.com/healthtopics/ 88%
Size: 3K
Depth: 6
[Find Similar]
Match Info

--------------------------------------------------------------------------

3: Index of /healthtopics/experts/
Index of /healthtopics/experts/ Name Last Modified Size Up to higher level directory Wed Aug 02 18:30:44 PDT 2000 0 bytes AIDS.html Mon Jul 17 22:23:19 PDT 2000 13 kb AIDS.shtml Mon Jul 17 22:23:20 PDT 2000 17 kb Arthritis.html Mon Jul 17 22:23:20 PDT 2000 5 ...
http://www.personalmd.com/healthtopics/experts/ 88%
Size: 2K
Depth: 7
[Find Similar]
Match Info

--------------------------------------------------------------------------

4: Index of /healthtopics/art/
Index of /healthtopics/art/ Name Last Modified Size Up to higher level directory Wed Aug 02 18:30:19 PDT 2000 0 bytes CVS/ Wed Aug 02 18:30:18 PDT 2000 0 bytes anksprai.gif Mon Jul 17 22:17:41 PDT 2000 48 kb anksprai.htm Mon Jul 17 22:17:41 PDT 2000 1 kb ank ...
http://www.personalmd.com/healthtopics/art/ 88%
Size: 5K
Depth: 7
[Find Similar]
Match Info

--------------------------------------------------------------------------

5: File Index
http://www.PersonalMD.com/healthtopics/ ... sspr.shtml http://www.PersonalMD.com/healthtopics/crs/strnos.shtml http://www.PersonalMD.com/healthtopics/crs/stye.shtml http://www.PersonalMD.com/healthtopics/crs/subabu.shtml http://www.PersonalMD.com/healthtopics/crs/subabuse ...
http://www.personalmd.com/... ealthtopics/healthtopics6.shtml 86%
Size: 5K
Depth: 1
[Find Similar]
Match Info

--------------------------------------------------------------------------

6: Index of /healthtopics/crs/
Index of /healthtopics/crs/ Name Last Modified Size Up to higher level directory Wed Aug 02 18:30:43 PDT 2000 0 bytes CVS/ Wed Aug 02 18:30:26 PDT 2000 0 bytes aaa.htm Mon Jul 17 22:18:34 PDT 2000 6 kb aaa.shtml Mon Jul 17 22:18:34 PDT 2000 9 kb aadiarrh.htm ...
http://www.personalmd.com/healthtopics/crs/





Post by Thunderstone »



So you want everything indexed except the pages that just list
a bunch of other pages. You'll need to delete the listing pages
after the crawl is complete.
See http://www.thunderstone.com/gw25man/node101.html
Something like:
gw -s "delete from html where Title = 'File Index'"
gw -s "delete from html where Title matches 'Index of /'"
gw -index
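
To see what a cleanup like this removes, here is a minimal sketch that simulates it on a tiny in-memory SQLite table. It is not Texis or Webinator: the table is a stand-in with only two columns (the Url column name is an assumption; Title is the field used in this thread), SQLite's LIKE stands in for matches, and the content page title is hypothetical. The listing titles and URLs are copied from the search results in the question.

```python
import sqlite3


def simulate_cleanup():
    """Delete listing pages from a stand-in table and return the URLs left."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for the crawl table; the real table has more fields.
    conn.execute("CREATE TABLE html (Url TEXT, Title TEXT)")
    conn.executemany(
        "INSERT INTO html VALUES (?, ?)",
        [
            # Listing pages, as they appear in the search results above.
            ("http://www.personalmd.com/healthtopics/",
             "Index of /healthtopics/"),
            ("http://www.personalmd.com/healthtopics/experts/",
             "Index of /healthtopics/experts/"),
            ("http://www.personalmd.com/healthtopics/art/",
             "Index of /healthtopics/art/"),
            # URL is truncated in the thread and kept as-is here.
            ("http://www.personalmd.com/... ealthtopics/healthtopics1.shtml",
             "File Index"),
            # A content page; its title is hypothetical.
            ("http://www.PersonalMD.com/healthtopics/AIDS.shtml",
             "AIDS (hypothetical title)"),
        ],
    )
    # Exact-title delete, then a prefix delete (LIKE needs the trailing %).
    conn.execute("DELETE FROM html WHERE Title = 'File Index'")
    conn.execute("DELETE FROM html WHERE Title LIKE 'Index of /%'")
    rows = conn.execute("SELECT Url FROM html ORDER BY Url").fetchall()
    conn.close()
    return [url for (url,) in rows]


remaining = simulate_cleanup()
print(remaining)
```

Only the content page survives, which is the intended outcome: the articles stay searchable and the listing pages disappear from the results. The index rebuild step (gw -index) has no counterpart in this sketch.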




Post by Thunderstone »



Hi:

I need to exclude some pages from my database, or keep them from being indexed.
All of the pages to remove look like this:

http://www.lacompu.com/xxxxxxxx?template=imprimir
All of them end with: ?template=imprimir

Thanks.

Christian Ruggeri
LaCompu.com

----- Original Message -----
From: Mark <mark@thunderstone.com>
To: <cristian@depot.com.ar>
Sent: Friday, August 04, 2000 2:09 PM
Subject: Re: Excluding dirs...





Post by Thunderstone »




gw -s "delete from html where Url matches '%?template=imprimir'"

See http://www.thunderstone.com/texisman/node61.html for how to use matches.

Also, the prefix match given earlier in this thread should have been:
gw -s "delete from html where Title matches 'Index of /%'"
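
Why the trailing % matters can be sketched in a few lines of Python. This is only an emulation under two assumptions: that % stands for any run of characters, and that the pattern has to cover the whole field value (which is what the correction above implies). The helper name is made up for this illustration; the texisman page linked above is the authority on what matches really does.

```python
import re


def wildcard_match(value: str, pattern: str) -> bool:
    """Emulate a %-wildcard match (assumed semantics, not the Texis code).

    % matches any run of characters, including none; every other
    character is literal; the pattern must cover the whole value.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


title = "Index of /healthtopics/"
url = "http://www.lacompu.com/xxxxxxxx?template=imprimir"

# Without the trailing %, the pattern only equals the literal string.
print(wildcard_match(title, "Index of /"))          # False
# With it, the pattern becomes a prefix match.
print(wildcard_match(title, "Index of /%"))         # True
# A leading % gives a suffix match.
print(wildcard_match(url, "%?template=imprimir"))   # True
```

Under these assumptions, a pattern with no % behaves like a plain equality test, so the first version of the delete would have removed nothing.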



