Because the site is dynamic, the HTTP and HTTPS pages have the same content, but the Search Appliance tends to index both... and that is a problem for the site search.
Unless you're putting https in the base URL or have the HTTPS option checked, it shouldn't walk https. Double-check all of your walk settings and post all of the non-default ones here so someone can maybe spot the problem.
I see... I do see that the base URL is the correct site: docmagic.com
I just saw this option to check "title" for duplicates.
This may work as our solution, because all I wanted to do was remove duplicate content. (The Search Appliance may be able to filter better with "title" checked.) If this doesn't work, I will come back to figure out what this HTTP/HTTPS issue is...
It looks like there's a <script> on docmagic.com that references an https: script, and that is causing it to fail with "disallowed protocol", even though the disallowed protocol is on the <script> rather than on the page itself.
It looks like if you check the "HTTPS" box in the protocols section, and add "https://" to "Exclusion Prefix", then it will do what you want - HTTPS pages will not be included in the walk, and pages referencing https scripts will be ok.
You probably also want to add .jsp to your extensions, as that's what all the pages are.
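To make the intent of those settings concrete, here is a rough Python sketch of the filtering they describe. It is only an illustration and not the appliance's actual code: the setting names, the should_walk function, and the example paths are all made up for this post, and it does not model how the appliance treats <script> references internally. It only shows why allowing the https protocol while excluding the "https://" prefix keeps HTTPS pages out of the walk, and where a .jsp extension would fit in.

```python
from urllib.parse import urlsplit

# Hypothetical settings mirroring the options discussed above. These names
# are illustrative only; they are not the appliance's real configuration keys.
ALLOWED_PROTOCOLS = {"http", "https"}      # "HTTPS" box checked
EXCLUSION_PREFIXES = ("https://",)         # "Exclusion Prefix" entry
ALLOWED_EXTENSIONS = (".html", ".htm", ".jsp")


def should_walk(url: str) -> bool:
    """Return True if a page URL would be walked under the sketch settings."""
    parts = urlsplit(url)

    # A protocol that is not allowed at all is rejected outright.
    if parts.scheme.lower() not in ALLOWED_PROTOCOLS:
        return False

    # The protocol is allowed, but any URL starting with an excluded
    # prefix is still skipped, so HTTPS pages never enter the walk.
    if url.lower().startswith(EXCLUSION_PREFIXES):
        return False

    # Paths with no file extension (such as "/") are accepted; otherwise
    # the extension must be one of the allowed ones.
    last_segment = parts.path.lower().rsplit("/", 1)[-1]
    if "." in last_segment and not last_segment.endswith(ALLOWED_EXTENSIONS):
        return False

    return True


if __name__ == "__main__":
    # The paths below are made-up examples, not real pages on the site.
    for candidate in (
        "http://docmagic.com/index.jsp",
        "https://docmagic.com/index.jsp",
        "http://docmagic.com/",
    ):
        print(candidate, should_walk(candidate))
```

With only the HTTP copies walked, each page should end up in the index once, which is the duplicate problem from the first post.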