Duplicate pages

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm


Post by Thunderstone »



Dear colleagues,

I am using Windows NT 4.0 Server as the OS on my computer. One of the
differences between NT and UNIX is the file naming convention. UNIX
distinguishes uppercase letters from lowercase ones; NT does not. (NT
only preserves the letter case of a name.)

As a result, the URL http:/www.mysite/My_Folder/document.html returns the
same page as http:/www.mysite/my_folder/DOCUMENT.HTML. Webinator does
not treat them as the same page, so the search results contain many
links to the same page. Is it possible to get rid of these duplicate
links?

Thank you for your time,

Karlis




Post by Thunderstone »




Unfortunately, NT web servers don't seem to follow the HTTP spec, which
says the path part of a URL should be case-sensitive. You'll need to
use the -unique option to prevent these duplicates (or fix the links on
your web server to be consistent).

You need to specify -unique on an empty database, then walk the site.
See http://www.thunderstone.com/gw2man/node20.html .
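
If you go the other route and make the links consistent, it helps to
know which URLs differ only in letter case. The following is a minimal
Python sketch of that check. It is not part of Webinator, and the
function name group_case_variants and the example.com URLs are
hypothetical, chosen only for illustration. It assumes the server treats
host and path as case-insensitive, as described above.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_case_variants(urls):
    """Group URLs that differ only in the letter case of host or path."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        # Assumption: the server ignores case in host and path, so fold
        # both to lower case to build a comparison key. The query string
        # is left untouched because it may be case-sensitive.
        key = (parts.scheme, parts.netloc.lower(), parts.path.lower(), parts.query)
        if url not in groups[key]:
            groups[key].append(url)
    # Report only pages that are reachable through more than one spelling.
    return [variants for variants in groups.values() if len(variants) > 1]


# Hypothetical list of links collected from a site.
links = [
    "http://www.example.com/My_Folder/document.html",
    "http://www.example.com/my_folder/DOCUMENT.HTML",
    "http://www.example.com/other.html",
]
duplicates = group_case_variants(links)
for variants in duplicates:
    print(variants)
```

Each printed group lists the spellings that point at one page, so you
can pick a single form and correct the links that use the others.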

