Does anyone out there have any experience walking a Drupal site that requires authentication?
We are attempting to do this, but we keep running into roadblocks. At first, we thought we had an issue authenticating against our Single Sign-On (CAS), even though we are able to walk other sites hooked up to our CAS installation. However, even with CAS out of the picture, using only the default Drupal authentication, we still can't walk the site. It appears that Drupal thinks the TS box is not logged in, which is the same behavior we saw when we had CAS hooked up.
Has anyone else experienced this? If so, what did you do to work around the issue? Also, if you are crawling Drupal and haven't had any issues, I'd appreciate knowing that as well.
Assuming a "Base URL" like http://www.mysite.com/drupal/
Replace the "Exclusions" with
/drupal/logout
/drupal/user
and any other areas you don't want indexed.
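Excluding /drupal/logout matters because a crawler that follows the logout link ends its own Drupal session, and Drupal then treats every later request as anonymous. The sketch below shows the kind of filtering an exclusion list performs. It assumes prefix matching on the URL path, which may not be exactly how Thunderstone matches exclusions, and `is_excluded` is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Exclusions from the settings above. Prefix matching on the path is an
# assumption made for this sketch, not Thunderstone's documented rule.
EXCLUSIONS = ["/drupal/logout", "/drupal/user"]


def is_excluded(url: str, exclusions=EXCLUSIONS) -> bool:
    """Return True if the crawler should skip this URL (hypothetical helper)."""
    path = urlsplit(url).path
    return any(path.startswith(prefix) for prefix in exclusions)


print(is_excluded("http://www.mysite.com/drupal/logout"))  # True
print(is_excluded("http://www.mysite.com/drupal/node/5"))  # False
```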
Set "Strip Queries" to N.
Set "Primer Type" to Custom.
Set "Custom Primer URL" to http://www.mysite.com/drupal/
Set "Custom Primer Variables" to
name=MYLOGIN&pass=MYPASSWORD
where "MYLOGIN" and "MYPASSWORD" are your login and password, respectively. Be sure to URL-encode those values, e.g. use %20 for a space: ...pass=MY%20PASSWORD
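If your login or password contains spaces or other reserved characters, a few lines of Python can produce the encoded "Custom Primer Variables" string for you to paste in. This is a minimal sketch: the field names name and pass come from the settings above, while the helper name `primer_variables` is hypothetical.

```python
from urllib.parse import quote, urlencode


def primer_variables(login: str, password: str) -> str:
    """Build the form-encoded name/pass string (hypothetical helper)."""
    # quote_via=quote encodes a space as %20; the default, quote_plus, uses '+'.
    # urlencode passes safe='' to quote, so '/', '@', '&' etc. are encoded too.
    return urlencode({"name": login, "pass": password}, quote_via=quote)


print(primer_variables("MYLOGIN", "MY PASSWORD"))  # name=MYLOGIN&pass=MY%20PASSWORD
```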
Mark, thanks for your reply. Unfortunately, this is exactly how we have our walk set up, and it still doesn't work. I assume from your post that you have gotten this to work on your end, correct?
From what I can tell, we get logged in during the Primer URL call, but once the walk starts, Drupal no longer thinks we're logged in.
The whole thing is very strange because the site works fine in a browser, and we're walking several other sites that require authentication without any issues; this seems to be the only one giving us trouble. The only time I was able to walk the site was when we had authentication turned off. If you have any other suggestions on what I might look at, I would certainly appreciate it!
Make sure you're using "www.mysite.com" (or whatever your host is called), not "mysite.com", in both your Base URL and your primer.
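The reason the host name matters isn't spelled out here; one plausible mechanism, assuming Drupal's session cookie from the primer login is set without a Domain attribute, is cookie scoping: a cookie issued by www.mysite.com is not sent with requests to mysite.com, so those pages are served as anonymous. The sketch below uses Python's standard `http.cookiejar`, not Thunderstone's crawler, to illustrate that behavior; the cookie name `SESSexample` and its value are hypothetical.

```python
import http.cookiejar
import urllib.request
from typing import Optional


def make_session_cookie(host: str) -> http.cookiejar.Cookie:
    """A cookie as a server would set it with no Domain= attribute."""
    return http.cookiejar.Cookie(
        version=0,
        name="SESSexample",       # hypothetical Drupal-style session cookie name
        value="abc123",           # hypothetical session id
        port=None,
        port_specified=False,
        domain=host,
        domain_specified=False,   # no Domain= sent: scoped to this host name
        domain_initial_dot=False,
        path="/",
        path_specified=True,
        secure=False,
        expires=None,
        discard=True,
        comment=None,
        comment_url=None,
        rest={},
    )


def cookie_sent_to(jar: http.cookiejar.CookieJar, url: str) -> Optional[str]:
    """Return the Cookie header the jar would attach to a request for url."""
    req = urllib.request.Request(url)
    jar.add_cookie_header(req)
    return req.get_header("Cookie")


jar = http.cookiejar.CookieJar()
jar.set_cookie(make_session_cookie("www.mysite.com"))  # e.g. set by the primer login

print(cookie_sent_to(jar, "http://www.mysite.com/drupal/node/1"))  # SESSexample=abc123
print(cookie_sent_to(jar, "http://mysite.com/drupal/node/1"))      # None
```

If the Base URL and primer disagree on the host name in this way, the walk would look logged out even though the primer login succeeded, which matches the symptom described above.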
If that doesn't do it, you'll probably have to supply a lot more detail about your crawl settings and site. If you don't want to do that here, you can open a support ticket.