A problem crawling login-secured pages

Posted: Fri Sep 27, 2002 11:41 am
by rluan
I used 'gw' with the '-U' and '-P' options to crawl login-secured pages, but ran into a problem: the body content of each crawled page is that of the login page rather than that of the actual page, though the URLs seem to be correct.

Is there a trick to crawling password-secured pages so that the correct page content comes back? Thanks.

Posted: Fri Sep 27, 2002 1:12 pm
by mark
-U and -P are for standard HTTP authentication only. Other methods are not supported. If authentication is via an HTML form, you may be able to log in by giving the username and password in the query string of the initial URL. (Make sure you edit the URL in your database after the walk so the login info is not evident.)
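
To make the distinction concrete, here is a rough Python sketch, independent of gw itself. It shows what standard HTTP (Basic) authentication actually sends, how a start URL carrying form fields in its query string could be built, and how the login info could be stripped from a stored URL afterwards. The host, the script name and the field names "username" and "password" are assumptions for illustration; use whatever names the site's login form really has. The query-string approach only works if the form handler accepts GET requests, and many accept only POST.

```python
import base64
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def basic_auth_header(user, password):
    """Value of the Authorization header sent by standard HTTP Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def login_url(base_url, fields):
    """Append login form fields to a URL as a query string.

    The field names are hypothetical; take them from the site's login form.
    """
    parts = urlsplit(base_url)
    query = parse_qsl(parts.query, keep_blank_values=True) + list(fields.items())
    return urlunsplit(parts._replace(query=urlencode(query)))


def strip_login_info(url, secret_names):
    """Remove the named query parameters so credentials are not stored."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in secret_names
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    # example.com and login.cgi are placeholders, not a real login endpoint.
    start = login_url(
        "http://example.com/login.cgi",
        {"username": "rluan", "password": "secret"},
    )
    print(start)
    print(strip_login_info(start, {"username", "password"}))
    print(basic_auth_header("rluan", "secret"))
```

A site that uses a login form never looks at the Authorization header, which would explain why a walk with -U and -P gets the login page back for every URL.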