extensions

KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm

Post by KMandalia »

Does case matter? Do we have to specify both .HTM and .htm, in case some websites use .HTM?

What does the ignore-case setting actually do?

Is .pdf the same as .PDF? It doesn't seem to index pages ending in .PDF (the site I am crawling uses .PDF for all of its PDF files).

How can I make it ignore case when matching web pages?
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

By default it ignores the case of the extension. You can change that in the dowalk script by editing the <$SSc_respectextcase=N> line.
John Turnbull
Thunderstone Software
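To illustrate what ignoring extension case means, here is a minimal Python sketch. It is not Webinator's implementation: the function `has_allowed_extension` and its `respect_case` parameter are hypothetical, with `respect_case=False` standing in for `<$SSc_respectextcase=N>`. It takes the extension from the URL path only (so query strings are ignored) and lowercases both sides before comparing when case is not respected.

```python
from urllib.parse import urlsplit
import posixpath


def has_allowed_extension(url, allowed, respect_case=False):
    """Return True if the URL path ends in one of the allowed extensions.

    Hypothetical helper for illustration only; respect_case=False plays
    the role of SSc_respectextcase=N (ignore extension case).
    """
    path = urlsplit(url).path          # drop scheme, host, query and fragment
    ext = posixpath.splitext(path)[1]  # e.g. ".PDF" for "/CUJCLASS.PDF"
    if not respect_case:
        # Compare case-insensitively by folding both sides to lowercase.
        ext = ext.lower()
        allowed = {a.lower() for a in allowed}
    return ext in allowed


allowed = {".htm", ".html", ".pdf"}
url = "http://www.cujournal.com/CUJCLASS.PDF"
print(has_allowed_extension(url, allowed))                     # True
print(has_allowed_extension(url, allowed, respect_case=True))  # False
```

With case ignored, `.PDF` matches `.pdf` in the allowed list, so listing only the lowercase spelling should be enough; only when case is respected would both spellings need to be listed.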

Post by KMandalia »

Thanks. But if extension case is not respected, then .PDF or .pdf shouldn't matter, and it should walk .PDF files when .pdf is specified, right?

Post by John »

Yes. What is the reason listed for the link not being followed?

Post by KMandalia »

None. I started from the base URL and checked its children. The link below and similar ones are not even listed there (I would expect at least an "unwanted extension" entry if they had been rejected; I have respectcase=N).

http://www.cujournal.com/CUJCLASS.PDF

Post by John »

Which page has the link to CUJCLASS.PDF?