Hi, folks.
Is there a way to get gw to use a URL that looks like this:
https://www.foo.com/cgi-bin/connect/aut ... ument.html
where home2.html is the actual page to be indexed?
Almost all the pages on a secure site I maintain have their links in the
form:
https://www.foo.com/cgi-bin/connect/aut ... ument.html
which is changed on the fly (by auth.pl) to:
https://www.foo.com/cgi-bin/connect/aut ... ument.html
and delivered to the browser. The user then clicks on this URL to get the
next page. It's an old (sooo old) way to carry the authentication around;
in fact, it'll be obsoleted by the new site in a few months.
So far, I haven't managed to convince gw that any of these links are places
to go, so it only indexes the login page and a couple of support pages. I
do have a filter (a pretty slow sed script, for this purpose) that I could
put in front and tell gw to pipe everything through (as a plugin), but I'm
not sure that would help.
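
A filter of this kind might look something like the sketch below. It is
Python rather than sed, and the @AUTH@ marker, the rewrite_links name and
the token argument are all made up for illustration; the real link format
isn't shown in this post, and how the filter gets hooked into gw is a
separate question.

```python
import re
import sys

# Hypothetical marker standing in for whatever auth.pl replaces on the fly.
# The real link format is not shown in the post, so this is an assumption.
AUTH_MARKER = "@AUTH@"

# Matches href="..." or href='...' and captures the prefix, quote and URL.
HREF_RE = re.compile(r'(href\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE)


def rewrite_links(html: str, token: str) -> str:
    """Replace the marker inside every href value with the given token."""
    def fix(match):
        prefix, quote, url = match.group(1), match.group(2), match.group(3)
        return f"{prefix}{quote}{url.replace(AUTH_MARKER, token)}{quote}"
    return HREF_RE.sub(fix, html)


# Only act as a stdin-to-stdout filter when input is actually piped in.
if __name__ == "__main__" and not sys.stdin.isatty():
    token = sys.argv[1] if len(sys.argv) > 1 else ""
    sys.stdout.write(rewrite_links(sys.stdin.read(), token))
```

Text outside href attributes is left alone, so only the links change.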
Hmm. I just realized that I set it up to use http://www.foo.com rather
than https://www.foo.com.
I tried running gw with different options, and geturl with a bunch of
different approaches to this problem, without success. My best (most
creative) attempt so far was:
gw -y -C -v10 -n"text/html,html,./filter"
Is there a way to do this?
GEB
-----------------------------------------
The axiomatic basis of political science:
1. Something must be done.
2. This is something.
3. Therefore, we must do it.
- adapted from http://www.javaworld.com/javaworld/jw-0 ... undon.html