A problem to crawl login-secured pages

rluan
Posts: 29
Joined: Mon Jul 23, 2001 1:22 pm

Post by rluan »

I used 'gw' with the '-U' and '-P' options to crawl login-secured pages, but ran into a problem: the body content of each crawled page is that of the LOGIN page rather than of the actual page, even though the URLs seem to be correct.

Is there a trick to crawling password-secured pages so that the correct page content is retrieved? Thanks.
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

-U and -P are for standard HTTP authentication; other methods are not supported. If authentication is via an HTML form, you may be able to do it by giving the username and password in the query string of the initial URL. (Make sure you edit the URL in your database after the walk so the login info is not evident.)
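
To make the difference concrete, here is a rough Python sketch, not anything gw itself does. It shows what standard HTTP (Basic) authentication sends in a header, how a login URL with credentials in the query string could be built, and how those credentials could be stripped from a stored URL afterwards. The field names `user` and `pass` and the `example.com` URL are made up for illustration; the real names depend on the site's login form, and the query-string approach only works if the login script accepts credentials sent via GET.

```python
import base64
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def basic_auth_header(user, password):
    """Build the Authorization header value used by HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def add_login_params(url, params):
    """Append login fields to the query string of a start URL."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True) + list(params.items())
    return urlunsplit(parts._replace(query=urlencode(query)))


def strip_login_params(url, names):
    """Remove the named query fields so the login info is not left in a stored URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in names]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    # Hypothetical start URL and form field names; adjust to the actual login form.
    start = add_login_params("http://example.com/login.cgi?page=1",
                             {"user": "u", "pass": "p"})
    print(start)
    print(strip_login_params(start, {"user", "pass"}))
    print(basic_auth_header("u", "p"))
```

The header built by `basic_auth_header` is never looked at by a form-based login page, which would explain getting the login page back for every URL.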