Indexing Java servlets and JSP pages?

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm


Post by Thunderstone




I have downloaded Webinator for use as a search engine on a website. This website will be composed primarily of JSP pages. The only way to access these JSP pages is for the user to log in to the site. A user logs in by typing a username and password into an HTML form. The action of this form is a Login Servlet, which processes the username and password, checks whether they are a valid pair, sets a cookie (or session), and redirects the user to a menu of JSP pages that the user is allowed to view.

If I index the login page, let's call it "login_form.html", by typing "gw -y http://servername.com/login_form.html", Webinator cannot get past the form because I did not provide a username and password, correct?

If I index explicitly using the query string, say 'gw -y "http://servername.com/servlet/LoginServ ... SWORD=pass"', Webinator accepts this URL as valid, but now I have compromised the security of the site because the URL displays the username and password.
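A minimal Python sketch of why the query-string approach exposes the credentials, assuming the login servlet also accepts a POST submission: with GET the form fields become part of the URL that is fetched, logged, and indexed, while with POST they travel in the request body. The URL, field names, and credentials below are hypothetical, and neither request is actually sent. POST keeps the password out of the URL but does not encrypt it; only HTTPS does that.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical login URL, field names, and credentials (illustration only).
LOGIN_URL = "http://example.com/login"
FORM = {"username": "user", "password": "pass"}


def get_login_request(url: str, form: dict) -> Request:
    """GET: the form fields are appended to the URL as a query string."""
    return Request(url + "?" + urlencode(form))


def post_login_request(url: str, form: dict) -> Request:
    """POST: the form fields go in the request body; the URL stays clean."""
    body = urlencode(form).encode("ascii")
    return Request(url, data=body, method="POST")


if __name__ == "__main__":
    print(get_login_request(LOGIN_URL, FORM).full_url)   # credentials visible
    print(post_login_request(LOGIN_URL, FORM).full_url)  # no credentials
```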

Q: If I want to index the menu of JSP pages, and the only path to this information is to log in via the form, what would I have to type at the Webinator prompt to index the information correctly and securely?

Another Q: In conjunction with this, is Webinator able to properly index a JSP page if that page checks for a cookie or an HTTP session?
[Clarification about the JSP page: the code within the JSP page checks whether a cookie is present. If it is, the page displays its content; if not, it redirects the user back to login_form.html.]

Thanks, in advance, for any help you can provide.







Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm


Post by Thunderstone



Webinator supports standard HTTP authentication with the -U and -P
options, which would be the best way to securely index the pages. Gw
does not currently keep track of cookies during the walk. With the
full Texis version you could use the Vortex language to write a simple
walker that would maintain cookies.
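For reference, a minimal Python sketch of what standard HTTP authentication puts on the wire, assuming the server uses the common Basic scheme (RFC 7617): the credentials travel in an Authorization request header rather than in the URL. This only helps if the pages are protected by HTTP authentication rather than by the form login described in the question. Basic credentials are base64-encoded, not encrypted, so they should be sent over HTTPS. The user name and password below are hypothetical.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header (RFC 7617)."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    print(basic_auth_header("user", "pass"))  # prints: Basic dXNlcjpwYXNz
```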

John Turnbull
-------------
Thunderstone Software

Pam Paulino said:


Mark W
Posts: 1
Joined: Tue May 22, 2001 11:22 am


Post by Mark W

Hi,

I know it's odd to follow up on such an old topic, but here goes.

According to everything that I have read, gw does not keep track of cookies, but it is possible to use Vortex to write a script to maintain them in conjunction with the full version of Texis.

Is it possible to add such cookie management to the dowalk_beta scripted walker that has been put out for Webinator? Has anyone attempted this?

--Mark W
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm


Post by mark

Yes, it's possible. After each fetch, use <urlinfo header Set-Cookie>, <urlinfo metaheader Set-Cookie>, and <urlinfo metaname Set-Cookie> to find all possible cookies. Then, before the next fetch, use <urlcp clearheaders> followed by <urlcp header Cookie ...> to send the cookies back.
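The same two steps can be sketched outside Vortex, for example in Python: after a fetch, merge every Set-Cookie value found in the response headers and in <meta http-equiv="Set-Cookie"> or <meta name="Set-Cookie"> tags into a jar, then build a single Cookie header for the next request. This is a simplified illustration, not Vortex code: the regular expression and the function names are hypothetical, and cookie domain, path, and expiry rules are ignored.

```python
import re
from http.cookies import SimpleCookie

# Hypothetical, simplified pattern for <meta http-equiv|name="Set-Cookie" content="...">;
# a real walker would use an HTML parser instead of a regular expression.
META_SET_COOKIE = re.compile(
    r'<meta\s+(?:http-equiv|name)=["\']?set-cookie["\']?\s+content=["\']([^"\']*)["\']',
    re.IGNORECASE,
)


def collect_cookies(jar: dict, set_cookie_headers: list, html: str = "") -> dict:
    """After a fetch: merge cookies from headers and meta tags into jar (name -> value)."""
    for value in list(set_cookie_headers) + META_SET_COOKIE.findall(html):
        parsed = SimpleCookie()
        parsed.load(value)  # attributes such as Path are parsed and then ignored here
        for name, morsel in parsed.items():
            jar[name] = morsel.value  # a newer cookie replaces an older one
    return jar


def cookie_header(jar: dict) -> str:
    """Before the next fetch: build the Cookie request header value."""
    return "; ".join(f"{name}={value}" for name, value in jar.items())


if __name__ == "__main__":
    jar = collect_cookies(
        {},
        ["JSESSIONID=abc123; Path=/"],
        '<meta http-equiv="Set-Cookie" content="lang=en; Path=/">',
    )
    print(cookie_header(jar))
```

In plain Python, urllib.request.HTTPCookieProcessor with an http.cookiejar.CookieJar does this bookkeeping automatically, and it should also capture cookies set on a redirect response such as the one the login servlet sends.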