Not able to log in to Webinator

Posted: Fri Mar 20, 2009 5:00 am
by erling.ervik
I have made sure that I have a scripts directory and that it has execute permissions for scripts and executables.

I ran the installer; it suggested the correct directories, and everything worked as expected. But I cannot start the admin console.

Error message: The page cannot be found
The URL that Webinator tried after install is:
http://kmyhre-inett.inett.utvikling.no/ ... tor/dowalk

I think this URL is wrong so I have tried:
http://localhost/texis/webinator/dowalk

http://localhost/
will bring up the normal website

We are just testing the free version so far and have not purchased anything yet.

We are using Windows 2003 Server on a VMware virtual machine, running on 64-bit Vista.

The install was set up to use ISAPI. I was not able to find any errors in the event log mentioning texis or Webinator.

Posted: Fri Mar 20, 2009 10:06 am
by jason112
The ISAPI Proxy Module should make a single entry in the event log on successful startup, so I'm thinking it might not be present.

In the IIS Manager, if you look at the root directory of the website, is there a virtual directory "texis" beneath it?

If not, please follow the "manual setup" steps listed in the manual for adding the ISAPI Proxy Module to IIS:
http://www.thunderstone.com/site/webina ... later.html

Posted: Mon Mar 23, 2009 2:51 am
by erling.ervik
Hi, and thanks for your answer.

I followed your instructions for manual setup and now have a virtual directory called siteSearch under the main application.

But any attempt to browse http://localhost/siteSearch/Webinator/dowalk
just returns a 404 error message.

Any other ideas?

Posted: Mon Mar 23, 2009 10:21 am
by jason112
Webinator currently requires the virtual directory to be named texis. (I just saw that the documentation wasn't clear on this; it will be changed.)

Unfortunately virtual directories can't be renamed, so you'll need to delete "siteSearch" and create a "texis" virtual directory with the same wildcard application map as done before.
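
To confirm the rename worked, the admin URL can be probed from a script instead of a browser. The following is a minimal sketch, not part of Webinator: the helper names `admin_url` and `probe` are made up for illustration, and the path layout is assumed from the URLs quoted earlier in this thread.

```python
import urllib.error
import urllib.request


def admin_url(host, vdir="texis"):
    """Build the admin URL under the given virtual directory name (layout assumed from this thread)."""
    return f"http://{host}/{vdir}/webinator/dowalk"


def probe(url, timeout=5):
    """Return the HTTP status code for url, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered with an error status, e.g. 404 for a missing or misnamed virtual directory.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar transport problems.
        return None
```

Calling `probe(admin_url("localhost"))` and `probe(admin_url("localhost", "siteSearch"))` side by side shows whether only the "texis" virtual directory answers with something other than 404.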

Posted: Tue Mar 24, 2009 2:21 am
by erling.ervik
Thanks!
That did the trick. Now on to testing...

Posted: Tue Mar 24, 2009 2:56 am
by erling.ervik
We have now done a little testing, and it didn't go well. Of the 25 000+ pages on our site, Webinator found only 22. Of those, 1 was indexed and 21 were rejected with errors.

I specified that the page extensions should be .html .htm .txt .aspx .ascx, as this website is built with Microsoft .NET. There are not many .htm or .html pages; almost all pages are of the .aspx and .ascx (web control) types.

There is some JavaScript in most pages, and since this is not supported in this demo version, that may be the reason why 21 pages were rejected. Why it didn't find more than 22 pages is beyond me.

It takes a day to get answers, due to the time difference between the US and Norway. My understanding is that Webinator is not well suited to websites built with MS .NET using EPIServer as a framework.

I don't think we can afford to spend much more time testing this further. Sorry about that.

Posted: Tue Mar 24, 2009 9:41 am
by jason112
Webinator should have no problem crawling aspx and ascx files, just as a browser has no problem with them - Webinator interacts with the site as if it were a browser.

What were the errors listed for the other 21 URLs?

Posted: Tue Mar 24, 2009 9:43 am
by jason112
Also, lack of a Javascript module will not cause pages to be rejected; the Javascript simply won't be executed. In the worst case, some links may not be followed if they're _generated_ by Javascript.

But lack of Javascript shouldn't be the cause of any errors seen.

Posted: Tue Mar 24, 2009 10:23 am
by mark
You'll probably need to remove ? from the exclusions and turn off "strip queries".
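
To illustrate why those two settings matter on a query-driven site, here is a rough sketch of the effect. It is not Webinator's actual implementation; the function names and the sample links are hypothetical, and exclusions are modeled as plain substring matches.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Drop the query string and fragment from a URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def crawlable(urls, exclusions=(), strip_queries=False):
    """Return the distinct URLs left after exclusions and optional query stripping."""
    kept = set()
    for url in urls:
        if any(token in url for token in exclusions):
            continue  # URL contains an excluded substring, so it is skipped
        kept.add(strip_query(url) if strip_queries else url)
    return kept


# Hypothetical links of the kind a query-driven .aspx site might expose.
links = [
    "http://localhost/default.aspx?id=1",
    "http://localhost/default.aspx?id=2",
    "http://localhost/default.aspx?id=3",
    "http://localhost/about.aspx",
]

print(len(crawlable(links, exclusions=("?",))))   # 1: every query URL is excluded
print(len(crawlable(links, strip_queries=True)))  # 2: the three ids collapse into one URL
print(len(crawlable(links)))                      # 4: all pages remain distinct
```

If most pages on the site differ only by query string, either setting alone could shrink a large site to a handful of URLs.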

During testing you should set your rewalk type to new instead of refresh. You can also set verbosity to 4 to better see why it skips things. Put it back to 2 once it's running normally.

To see if the site can be indexed without JavaScript, turn off JavaScript in your browser and see if you can navigate to the places you want to index.
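
The same check can be approximated in a script by listing only the links that exist in the raw HTML, without running any scripts. This is a minimal sketch using Python's standard library; the class and function names and the sample markup are hypothetical, and it does not reproduce how Webinator parses pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href targets from <a> tags present in the raw HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # javascript: pseudo-links need a script engine, so a plain fetch cannot follow them.
        if href and not href.lower().startswith("javascript:"):
            self.links.append(urljoin(self.base_url, href))


def static_links(html, base_url):
    """Return absolute URLs of links reachable without executing JavaScript."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page fragment: one plain link, one postback link, one script-generated link.
sample = """
<a href="/news.aspx">News</a>
<a href="javascript:__doPostBack('menu','1')">Products</a>
<script>document.write('<a href="/hidden.aspx">Hidden</a>');</script>
"""
print(static_links(sample, "http://localhost/"))
```

Only the plain link is reported for the sample above. If the home page of the real site yields few or no links this way, a crawler without JavaScript support would have little to follow.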

Posted: Wed Mar 25, 2009 6:17 am
by erling.ervik
I tried once more. Turned off "strip queries", and removed ? from exclusions.

Here is the log from the search:
Walk Status
Current User: webinator
Current Profile: NyTest Webinator 5.1.78-Windows-wo/plugin

Latest run:
0 pages in todo
1 pages scheduled to be refreshed in the next hour
1 pages visited in the last hour (1 success/0 failed)
1 pages in index


Pages recently walked
1 pages (63,270 bytes).
0 errors.
0 duplicate pages.

Page Visited Modified Url
-------+-------------------+-------------------+-------------------------------------------------------
1 Less than 1 min ago 1479 d, 1 hr+ ago http://localhost/ (63,270 bytes)

Recent errors
Visited Reason Url
--------------------+--------------------+-------------------------------------------------------

Next Pages to be walked
Next Check Modified Url
--------------------+------------------+-------------------------------------------------------
In 59 mins 1479 d, 1 hr+ ago http://localhost/ (63,270 bytes)

Webinator Walk Report for NyTest

Creating database C:\Program Files\Thunderstone Software\Webinator/db1...Done.
Walk started at 2009-03-25 11:31:37 (by user)
JavaScript walking not enabled by current license
HTTPS walking disabled
Start fetching at http://localhost/
Ignore urls containing any of the following:
/cgi-bin/



2009-03-25 11:31:37 started 1 new (5672) on http://localhost/
Using primer: http://localhost/
1 pages fetched (63,270 bytes) from http://localhost/ took 1 seconds
0 errors
0 duplicate pages

Creating search index on fetched pages...Done.
Creating spell-checker dictionaries...Done.
2009-03-25 11:31:40 0 Extra Indexes done
Done.
Verifying usability of new walk.

Walk finished at 2009-03-25 11:31:40 (took 2 seconds)
Please contact sales at Thunderstone Software to upgrade your license to include Best Bets.

Making new database live: C:\Program Files\Thunderstone Software\Webinator/db1

--------------------------------------------------------------------------------
Checking for broken hyperlinks...
No broken hyperlinks found. Nice Job!
Checking for duplicate pages...
No duplicate pages found.
--------------------------------------------------------------------------------
End of report.

I am probably doing something so stupid that you have not thought of it yet.
Anyway, doing the same with Xtreeme search Studio indexed 17318 pages, and Microsoft Search Server 2008 Express gave 21510.
I have a few more engines to try, so if you have other tips, you're welcome to share them.