Indexing Document that has Next 30 option
Posted: Thu Oct 22, 2009 12:32 pm
by greg.householder
Hello,
I have a page that shows our Article Archive. You start out by going to archive.asp. If you click the Next 30 button, you go to a link like this:
Archive1.asp?NAV=2
Is there a way to have our Thunderstone appliance index this page with the NAV= switch? Currently these archive pages are the only things that reference all of our article files. The folders that the archive files are in do not allow directory browsing.
Thanks,
Posted: Thu Oct 22, 2009 1:18 pm
by jason112
Try removing "?" from exclusions and set "Strip Queries" to "N". This should cause the appliance to properly index the pages that differ only by NAV.
Posted: Thu Oct 22, 2009 1:26 pm
by greg.householder
I don't see the ? in the exclusions right now, but Strip Queries was set to yes.
This appliance is something that I have inherited, and I'm trying to make it usable, because honestly it's been pretty crappy since I started here. I think it's mostly configuration, and I've already tweaked it to make it better.
My confusion on this part is that you start on an Archive.asp page, and when you choose Next 30 it takes you to archive1.asp. Does the Strip Queries option let the appliance virtually press that Next 30 button?
Thanks,
Posted: Thu Oct 22, 2009 2:01 pm
by mark
The appliance follows all the links found on the page,
e.g.
http://somesite/archive1.asp
http://somesite/archive1.asp?NAV=2
http://somesite/archive1.asp?NAV=3&foo-bar
But with "Strip Queries" on, the ? and everything after it is removed first, so it goes after
http://somesite/archive1.asp
http://somesite/archive1.asp
http://somesite/archive1.asp
but only once since they are now identical.
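The effect described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the appliance's actual code: the function names strip_query and crawl_queue are made up for the example, and the URLs are the ones from this post.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Drop the '?' and everything after it (query string and fragment)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def crawl_queue(links, strip_queries):
    """Return the distinct URLs a crawler would fetch, in first-seen order.

    Hypothetical helper: it only models the de-duplication step,
    not fetching or indexing.
    """
    seen = set()
    queue = []
    for link in links:
        url = strip_query(link) if strip_queries else link
        if url not in seen:
            seen.add(url)
            queue.append(url)
    return queue


links = [
    "http://somesite/archive1.asp",
    "http://somesite/archive1.asp?NAV=2",
    "http://somesite/archive1.asp?NAV=3&foo-bar",
]

# With stripping on, all three links collapse into a single URL.
print(crawl_queue(links, strip_queries=True))
# With stripping off, each NAV value stays a separate page to fetch.
print(crawl_queue(links, strip_queries=False))
```

With stripping on, only one URL is left in the queue, which is why the NAV=2, NAV=3, ... pages never get indexed until "Strip Queries" is set to "N".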
Posted: Thu Oct 22, 2009 2:02 pm
by jason112
> My confusion on this part is that you start on a
> Archive.asp page and when you choose next 30 it changes
> you to an archive1.asp
That part isn't an issue, however...
> Does the Strip Queries option let the appliance
> virtually press that next 30 button?
The crawler does not press buttons to submit forms when it is crawling. These are usually for actions that have repercussions, like "submit order" or "create account", things you don't want an automatic crawler doing.
If there's a regular link that goes to "Next 30", then it should be followed.
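To make the link-versus-button distinction concrete, here is a rough Python sketch of a link extractor that collects only ordinary anchor hrefs and never submits forms. It is an assumption about how a typical crawler behaves, not the appliance's implementation; the sample HTML and the names LinkExtractor and extract_links are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect href targets of <a> tags; forms and buttons are ignored."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # <form>, <input>, <button> are never "pressed"
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page they were found on.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up page: the same "Next 30" offered once as a link, once as a form.
page = """
<a href="Archive1.asp?NAV=2">Next 30</a>
<form action="Archive1.asp" method="post">
  <input type="hidden" name="NAV" value="2">
  <input type="submit" value="Next 30">
</form>
"""

print(extract_links(page, "http://somesite/archive.asp"))
```

Only the anchor version is picked up. If the real Next 30 control on archive.asp is a form button rather than a link, a crawler working this way would never reach the NAV pages from it.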
Posted: Thu Oct 22, 2009 3:16 pm
by greg.householder
Thanks a lot. That answers my question and hopefully gets me to a workable fix.