handshake error when attempting to index https content

Posted: Tue May 21, 2013 9:03 am
by henry.legedza
Hi there,

Currently, whenever we try to index content on an https site, we continually get the following error:

Server error: https://siteurl returned code 500 (handshakefailed)

I have Webinator set so that it recognises the https protocol.

Any suggestions as to what to check for? I've never come across this error.

Thanks
Henry

Posted: Tue May 21, 2013 9:55 am
by mark
Code 500 is the server saying it had an unknown error attempting to deliver the requested content. Often caused by a server app breaking. If possible check the web server's error logs to see if it provides any more detail. Otherwise try setting Webinator's User Agent to match what a browser would send. Sometimes cgi/asp/etc. apps crash when given unexpected inputs.

Posted: Tue May 21, 2013 7:03 pm
by henry.legedza
I checked the IIS error logs through Event Viewer and couldn't find anything obvious.

Currently our User Agent is Mozilla/5.0 (compatible; T-H-U-N-D-E-R-S-T-O-N-E)

The same web server quite happily delivers http content for indexing - it's just https.

Posted: Tue May 21, 2013 7:25 pm
by henry.legedza
I looked through the vortex log and noticed that this message was logged before each attempt at reindexing the https content:
"User Public has been added without a password"

Does this mean anything?

Posted: Wed May 22, 2013 10:04 am
by jason112
That's normal, it's logged every time a new database is created. That happens whenever a profile is created, or a "New" walk is started.

Are you able to crawl any other https sites, or do all of them exhibit the same behavior?

Posted: Wed May 22, 2013 8:27 pm
by henry.legedza
I tried an external https site with no authentication required and got the same handshake error.

The Webinator box does need to go out through a proxy server, and the handshake issue seems to have started around the time it was moved to this new proxy server.

Is there anything there we might need to look at?

Posted: Thu May 23, 2013 10:40 am
by jason112
Webinator currently cannot properly connect to https sites through a proxy. The new proxy server is likely recognizing this and returning the 500 error.

It's theoretically possible for the proxy server to allow this by acting as a "man in the middle", but this proxy server either can't or is choosing not to.

This functionality will be added in a future release of Webinator.

Posted: Sat May 25, 2013 7:53 am
by henry.legedza
You suggested: "It's theoretically possible for the proxy server to allow this by acting as a 'man in the middle', but this proxy server either can't or is choosing not to."

Is there anything we can check on the proxy to determine whether it can't do this or simply isn't set up to?

Posted: Mon May 27, 2013 1:02 am
by henry.legedza
I have spoken to our IT people and this was their response:

The proxy is able to handle both HTTP and HTTPS proxy requests. It is already configured to allow all traffic from our search server to be proxied without needing authentication. It is also configured to bypass the SSL scanners etc for the search server.

The central proxy is already configured to bypass the majority of the auth/filtering for the search server.

Posted: Tue May 28, 2013 10:27 am
by Kai
To clarify, when attempting to make an https/SSL connection through a proxy, Webinator will issue a `GET https://somesite/path' request to the proxy -- the same as it would with an http URL. It does not yet support the `CONNECT' method to tell the proxy to make a pass-through TCP connection (allowing a seamless SSL connection from Webinator to the origin server). A future release of Webinator will support `CONNECT'.

Perhaps your proxy is rejecting `GET https://...', expecting a `CONNECT' instead for https content? When you say http (unsecure) content works, I assume it also works through the same proxy? That would indicate to me that the proxy might be expecting `CONNECT'.
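
One way to check this from the search server is to send both request forms to the proxy by hand and compare the status lines that come back. The following is only a rough Python sketch, not anything Webinator ships: the function names are made up for illustration, the request headers are a minimal guess at what a client would send, and the proxy host and port in the usage note are placeholders for your own.

```python
import socket


def absolute_get_request(url: str, host: str) -> bytes:
    """Build the form described above: a plain GET with the full https URL."""
    return (f"GET {url} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n\r\n").encode("ascii")


def connect_request(host: str, port: int = 443) -> bytes:
    """Build a CONNECT request asking the proxy for a pass-through TCP tunnel."""
    return (f"CONNECT {host}:{port} HTTP/1.1\r\n"
            f"Host: {host}:{port}\r\n\r\n").encode("ascii")


def proxy_status_line(proxy_host: str, proxy_port: int, request: bytes,
                      timeout: float = 10.0) -> str:
    """Send one request to the proxy and return the first line of its reply."""
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(request)
        reply = sock.recv(4096)
    # Only the status line matters here, e.g. "HTTP/1.1 200 ..." or "HTTP/1.1 500 ...".
    return reply.split(b"\r\n", 1)[0].decode("latin-1")
```

For example, compare `proxy_status_line("proxy.example", 3128, connect_request("somesite"))` with `proxy_status_line("proxy.example", 3128, absolute_get_request("https://somesite/path", "somesite"))`, substituting your real proxy host and port. If the `CONNECT' attempt gets a 2xx status line while the `GET https://...' attempt gets the 500, that would support the idea that the proxy only accepts `CONNECT' for https.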