webinator - dowalk not indexing - japanese site

scott.shaver
Posts: 45
Joined: Tue May 31, 2005 12:13 pm

Post by scott.shaver »

Okay, I've read through the other posts here about problems indexing Japanese sites and I can't make heads or tails of the answers. I created a new profile for a Japanese site that uses the shift_jis charset, changing only the base URL in the profile.

When I start the walk it simply sits there and spins. No pages are indexed. The stop walk buttons don't work; I have to delete the profile to stop it.

The same thing happens when I try to index Yahoo's Japanese site, and when I change the Source Default Charset in the profile to shift_jis.

Some specific information on how to set up a profile for a foreign-language site would be great.
Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

Which version of Webinator are you using (complete output of texis -version, run from the command line)?
scott.shaver
Posts: 45
Joined: Tue May 31, 2005 12:13 pm

Post by scott.shaver »

Texis Web Script (Vortex) Copyright (c) 1996-2005 Thunderstone - EPI, Inc
Free Webinator Version 5.01.1116433182 20050518 (i686-intel-winnt-32-32)

Just a note: I'm trying to do this to prove to my company that it is worth my time to do a full eval of the appliance.
Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

There was a fix just after that release for an issue where JavaScript setInterval() or setTimeout() calls on a page could cause an abend or loop. Is there JavaScript on the Base URL that you are using? Also, check Texis\vortex.log in the Webinator install dir for any errors.
scott.shaver
Posts: 45
Joined: Tue May 31, 2005 12:13 pm

Post by scott.shaver »

http://mcdatajp.hkit4u.com/ - Japanese
http://mcdatakr.hkit4u.com/ - Korean

There is JavaScript code on those pages; however, there are no references to the functions you mentioned.

Here are the log entries related to my attempts to index a Japanese and a Korean site.

100 2005-05-31 09:00:27 f:\Webinator_running\texis\scripts/webinator/dowalk:3885: Document not found: http://mcdatajp.hkit4u.com/robots.txt returned code 404 (Object Not Found)
100 2005-05-31 09:00:28 f:\Webinator_running\texis\scripts/webinator/dowalk:3885: Document not found: http://mcdatajp.hkit4u.com/robots.txt returned code 404 (Object Not Found)
000 2005-05-31 09:16:53 f:\Webinator_running\texis\scripts/webinator/dowalk:5375: Wrong server id
000 2005-05-31 09:16:53 f:\Webinator_running\texis\scripts/webinator/dowalk:5375: Wrong server id
000 2005-05-31 09:16:53 f:\Webinator_running\texis\scripts/webinator/dowalk:5375: Wrong server id
000 2005-05-31 09:16:53 f:\Webinator_running\texis\scripts/webinator/dowalk:5375: Wrong server id
000 2005-05-31 09:16:53 f:\Webinator_running\texis\scripts/webinator/dowalk:5375: Wrong server id
000 2005-05-31 09:16:53 f:\Webinator_running\texis\scripts/webinator/dowalk:4992: Wrong server id
002 2005-05-31 09:16:53 f:\Webinator_running\texis\scripts/webinator/dowalk:4992: Unable to open table f:\Webinator_running\texis\japan\db2\counts in the function opendbtbl
115 2005-05-31 09:16:53 f:\Webinator_running\texis\scripts/webinator/dowalk:4992: No such table: counts in the database: f:\Webinator_running\texis\japan\db2\
000 2005-05-31 09:16:53 f:\Webinator_running\texis\scripts/webinator/dowalk:4992: SQLPrepare() failed with -1 in the function prepntexis
100 2005-05-31 09:17:52 /webinator/dowalk:9525: User PUBLIC has been added without a password.
100 2005-05-31 09:18:41 f:\Webinator_running\texis\scripts/webinator/dowalk:2071: User PUBLIC has been added without a password.
100 2005-05-31 09:18:42 f:\Webinator_running\texis\scripts/webinator/dowalk:3885: Document not found: http://mcdatakr.hkit4u.com/robots.txt returned code 404 (Object Not Found)
100 2005-05-31 09:18:43 f:\Webinator_running\texis\scripts/webinator/dowalk:3885: Document not found: http://mcdatakr.hkit4u.com/robots.txt returned code 404 (Object Not Found)
000 2005-05-31 09:38:55 f:\Webinator_running\texis\scripts/webinator/dowalk:5375: Wrong server id
000 2005-05-31 09:38:55 f:\Webinator_running\texis\scripts/webinator/dowalk:5375: Wrong server id
000 2005-05-31 09:38:55 f:\Webinator_running\texis\scripts/webinator/dowalk:5375: Wrong server id
000 2005-05-31 09:38:55 f:\Webinator_running\texis\scripts/webinator/dowalk:5375: Wrong server id
000 2005-05-31 09:38:55 f:\Webinator_running\texis\scripts/webinator/dowalk:5375: Wrong server id
000 2005-05-31 09:38:55 f:\Webinator_running\texis\scripts/webinator/dowalk:4992: Wrong server id
002 2005-05-31 09:38:55 f:\Webinator_running\texis\scripts/webinator/dowalk:4992: Unable to open table f:\Webinator_running\texis\korea\db2\counts in the function opendbtbl
115 2005-05-31 09:38:55 f:\Webinator_running\texis\scripts/webinator/dowalk:4992: No such table: counts in the database: f:\Webinator_running\texis\korea\db2\
000 2005-05-31 09:38:55 f:\Webinator_running\texis\scripts/webinator/dowalk:4992: SQLPrepare() failed with -1 in the function prepntexis
000 2005-05-31 09:38:55 f:\Webinator_running\texis\scripts/webinator/dowalk:5779: Wrong server id
002 2005-05-31 09:38:55 f:\Webinator_running\texis\scripts/webinator/dowalk:5779: Unable to open table f:\Webinator_running\texis\korea\db2\error in the function opendbtbl
115 2005-05-31 09:38:55 f:\Webinator_running\texis\scripts/webinator/dowalk:5779: No such table: error in the database: f:\Webinator_running\texis\korea\db2\
000 2005-05-31 09:38:55 f:\Webinator_running\texis\scripts/webinator/dowalk:5779: SQLExecute() failed with -1 in the function execntexis
000 2005-05-31 09:38:55 f:\Webinator_running\texis\scripts/webinator/dowalk:5780: Wrong server id
002 2005-05-31 09:38:55 f:\Webinator_running\texis\scripts/webinator/dowalk:5780: Unable to open table f:\Webinator_running\texis\korea\db2\error in the function opendbtbl
115 2005-05-31 09:38:55 f:\Webinator_running\texis\scripts/webinator/dowalk:5780: No such table: error in the database: f:\Webinator_running\texis\korea\db2\
000 2005-05-31 09:38:55 f:\Webinator_running\texis\scripts/webinator/dowalk:5780: SQLExecute() failed with -1 in the function execntexis
100 2005-05-31 09:39:30 /webinator/dowalk:9525: User PUBLIC has been added without a password.
100 2005-05-31 09:41:44 f:\Webinator_running\texis\scripts/webinator/dowalk:2071: User PUBLIC has been added without a password.
100 2005-05-31 09:41:46 f:\Webinator_running\texis\scripts/webinator/dowalk:3885: Document not found: http://mcdatajp.hkit4u.com/robots.txt returned code 404 (Object Not Found)
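
With this many repeated entries, it can help to collapse the log into counts per message before reading it. The following is only a minimal sketch: the line layout (three-digit code, date, time, script:line, message) is inferred from the entries pasted above rather than from any Thunderstone documentation, and the names `parse_line` and `summarize` are made up for illustration. Lines that do not fit that layout are skipped.

```python
import re
from collections import Counter

# Assumed layout, inferred from the pasted entries:
#   "<code> <date> <time> <script>:<line>: <message>"
LINE_RE = re.compile(
    r"^(?P<code>\d{3}) (?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<script>.+?):(?P<line>\d+): (?P<message>.*)$"
)


def parse_line(text):
    """Split one log line into its fields, or return None if it does not fit."""
    match = LINE_RE.match(text.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count how often each (code, message) pair occurs."""
    counts = Counter()
    for raw in lines:
        entry = parse_line(raw)
        if entry is not None:
            counts[(entry["code"], entry["message"])] += 1
    return counts


# Hypothetical usage, with the log path taken from the earlier reply:
# with open(r"Texis\vortex.log", encoding="utf-8", errors="replace") as fh:
#     for (code, message), n in summarize(fh).most_common():
#         print(n, code, message)
```

Grouping on the message text keeps the "Wrong server id" repeats together while leaving the table-specific errors (counts, error) as separate rows.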
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

That looks like a more general error. Does the user that the scripts run as (typically the IUSR_ account) have full control of the dataspace directory? It looks as if it did not create the database correctly. If you set the permissions and try a new walk, it should work.
John Turnbull
Thunderstone Software
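
One rough way to test the permissions question is to try creating and deleting a file in the dataspace directory. This is a sketch only, not a Thunderstone tool: the function name `can_create_files` is made up, and the result is only meaningful if the script runs under the same account the web server uses for the walk scripts, which an interactive login usually is not.

```python
import os
import tempfile


def can_create_files(directory):
    """Return (True, None) if a file can be created and removed in directory,
    otherwise (False, reason)."""
    if not os.path.isdir(directory):
        return False, "not a directory"
    try:
        # The temporary file is deleted again when the with-block exits.
        with tempfile.NamedTemporaryFile(dir=directory, prefix="walkcheck_"):
            pass
    except OSError as exc:
        return False, str(exc)
    return True, None


# Hypothetical usage; the path is the install directory seen in the log above:
# print(can_create_files(r"f:\Webinator_running\texis"))
```

A passing result shows only that this one account can write there; it says nothing about whether an earlier, half-deleted profile left a broken database behind.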
scott.shaver
Posts: 45
Joined: Tue May 31, 2005 12:13 pm

Post by scott.shaver »

Well, I'm assuming it has full access, since it has no problem creating the databases for the three English sites I've indexed. Those errors may just be because I deleted the profiles and it didn't really delete them, so when I tried to create them again with the same names it puked.

I created a new profile again today using the defaults and changing the base URL to http://mcdatajp.hkit4u.com, and it still just sits and spins. Here is the log output:

100 2005-06-01 10:13:35 f:\Webinator_running\texis\scripts/webinator/dowalk:2071: User PUBLIC has been added without a password.
100 2005-06-01 10:13:36 f:\Webinator_running\texis\scripts/webinator/dowalk:3885: Document not found: http://mcdatajp.hkit4u.com/robots.txt returned code 404 (Object Not Found)
100 2005-06-01 10:13:37 f:\Webinator_running\texis\scripts/webinator/dowalk:3885: Document not found: http://mcdatajp.hkit4u.com/robots.txt returned code 404 (Object Not Found)
scott.shaver
Posts: 45
Joined: Tue May 31, 2005 12:13 pm

Post by scott.shaver »

And from the monitor.log file:

200 2005-06-01 10:12:18 (9272) Database Monitor on f:\Webinator_running\texis\jp2\db1 starting
200 2005-06-01 10:13:18 (9272) Database Monitor on f:\Webinator_running\texis\jp2\db1 exiting
200 2005-06-01 10:13:37 (9308) Database Monitor on f:\Webinator_running\texis\jp2\db2 starting
200 2005-06-01 10:13:39 (8948) Database Monitor on f:\Webinator_running\texis\jp2\db1 starting
200 2005-06-01 10:14:39 (8948) Database Monitor on f:\Webinator_running\texis\jp2\db1 exiting
scott.shaver
Posts: 45
Joined: Tue May 31, 2005 12:13 pm

Post by scott.shaver »

Looking a bit closer, those sites appear to be some sort of ASP.NET app. It seems to be doing some sort of funky redirect from the base URL of http://mcdatajp.hkit4u.com. I used telnet to hit that URL and got back an odd page, not the same one that shows up in the browser.

I tried the following in a browser:

http://mcdatajp.hkit4u.com/index.html
http://mcdatajp.hkit4u.com/index.htm
http://mcdatajp.hkit4u.com/default.html
http://mcdatajp.hkit4u.com/default.htm
http://mcdatajp.hkit4u.com/default.asp
http://mcdatajp.hkit4u.com/index.asp

and get a 404 back on each one. I think the ASP thing is causing the problem. Unfortunately our group doesn't control these sites right now, as they are temporary outsourced solutions for Asia Pacific.
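
The telnet check and the URL guesses above can be repeated with a short script that makes one request per URL without following redirects, then reports the status, any Location header, and the charset the server declares. This is a minimal sketch using only the Python standard library; `probe`, `classify`, and `charset_of` are illustrative names, and it sees only HTTP-level redirects, so a redirect done through a meta refresh or through script in the page would still show up as a plain 200.

```python
import http.client
from urllib.parse import urlsplit


def charset_of(content_type):
    """Pull the charset parameter out of a Content-Type header value, if any."""
    if not content_type:
        return None
    for param in content_type.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name.lower() == "charset":
            return value.strip("\"'").lower() or None
    return None


def classify(status, location=None, content_type=None):
    """Label a response as redirect, not_found, ok, or other."""
    if status in (301, 302, 303, 307, 308):
        kind = "redirect"
    elif status == 404:
        kind = "not_found"
    elif 200 <= status < 300:
        kind = "ok"
    else:
        kind = "other"
    return {"status": status, "kind": kind, "location": location,
            "charset": charset_of(content_type)}


def probe(url, timeout=10):
    """Issue one GET without following redirects and classify the response."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("GET", target, headers={"User-Agent": "probe-sketch"})
        resp = conn.getresponse()
        return classify(resp.status, resp.getheader("Location"),
                        resp.getheader("Content-Type"))
    finally:
        conn.close()


# Hypothetical usage with the URLs tried above:
# for url in ("http://mcdatajp.hkit4u.com/", "http://mcdatajp.hkit4u.com/index.html"):
#     print(url, probe(url))
```

If the base URL answers with a redirect, the Location value is the address a browser ends up on, and trying that as the base URL of the profile would be the obvious next experiment.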
scott.shaver
Posts: 45
Joined: Tue May 31, 2005 12:13 pm

Post by scott.shaver »

Attempted to index http://cnn.co.jp/ and got the same spinning problem. Attempted to index a French site, which kind of worked (no spinning), but the live search never returned any results.

Has anyone actually used the free Webinator to index a non-English site?