hardware or software error?

resume.robot
Posts: 68
Joined: Sat Jan 13, 2001 1:23 am

Post by resume.robot »

I am running Texis/Webinator on a 64-bit Sun E-250 server with 2 GB of RAM, running Solaris 2.7.

Commercial Version 3.01.962147411 of Jun 27, 2000 (sparc-sun-solaris2.6)

Running Webinator searches, this machine has produced errors from the beginning. Kai provided some phone support several months ago, and part of the problem was identified as a DNS error. The DNS was corrected, but the errors persist.

Here is the command being executed:

nohup gw -d/database -noindex -a -R -r -O -fshtml -fasp -fcfm -fjsp -fxml -t7 -z5000 -v4 "&list" > nohup.a

After the todo list grows, additional gw commands are started:

nohup gw -d/database -noindex -a -R -r -O -fshtml -fasp -fcfm -fjsp -fxml -t7 -z5000 -v4 > nohup.b

When I run multiple gw processes, almost all of them die within a few hours with the message

000 Got signal 11 - quitting now
or
000 Got signal 10 - quitting now

According to the manual "UNIX Unleashed" description of kill signals:

10 Bus Error. Usually caused by a programming error, a bus error can be caused only by a hardware fault or a binary program file.

11 Segment violation. Caused by a program reference to an invalid memory location; can only be caused by a binary program file.
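
Signal numbers are not the same on every system, so it may be worth confirming what 10 and 11 mean on the machine where gw dies. A minimal sketch, assuming a POSIX shell whose kill supports -l with a number; signal_name is a hypothetical helper name, not part of Texis:

```shell
#!/bin/sh
# Print the name of a signal number as this system defines it.
# signal_name is a hypothetical helper for illustration only.
signal_name() {
    kill -l "$1"
}

# Look up the two signals reported by gw.
for n in 10 11; do
    echo "signal $n = $(signal_name "$n")"
done
```

On Solaris these should come back as BUS and SEGV, matching the book. On most Linux systems 10 is USR1 and the bus error is signal 7, so the numbers in a log are only comparable between machines after this check.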

I suspect bad RAM.

Before I approach Sun tech support again, I would like to know if you have seen this before, and if there is any chance it could be a software error.

Texis has been re-installed on this machine several times and it has not solved the problem.

I am running commercial Webinator on a Linux machine, using the exact same gw commands, and never have any problems.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Signals 10 and 11 may be caused by software or hardware. Your specific problem doesn't sound familiar. I assume you're running the multiple gw's at the same time against the same database? Does it also happen if you use -dns=sys?

How do the webinator versions/releases compare on your sun vs. your linux?
resume.robot
Posts: 68
Joined: Sat Jan 13, 2001 1:23 am

Post by resume.robot »

Yes, the multiple simultaneous gw's are identical and run against the same db. The only difference is that the first execution reads a list, while the following executions merely spider from todo.

webinator version on linux is older:

Webinator WWW Site Indexer Version 2.52 (Commercial)
Copyright(c) 1995,1996,1997,1998 Thunderstone EPI Inc.
Release: 19990218

sun:

Webinator WWW Site Indexer Version 2.56 (Commercial)
Copyright(c) 1995,1996,1997,1998,1999,2000 Thunderstone EPI Inc.
Release: 20000627


I ran -dns=sys previously but it has been a while and I don't have records. My recollection is that it did not solve the problem. Should I run a major test using this option and observe the results? If a test of -dns=sys failed, would this positively identify a hardware error?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

It's hard to completely eliminate the possibility of a software glitch, but gw is pretty stable, and I would expect the newer version to be generally more stable. Probably the single largest operational change between your versions that would affect walk behavior is the internal DNS routines. Using -dns=sys will eliminate them as a potential source of errors.

Do they all tend to die at about the same time as each other? Do you get any other messages just before the signals?
resume.robot
Posts: 68
Joined: Sat Jan 13, 2001 1:23 am

Post by resume.robot »

No, they die individually at different times. There are no other messages I see.

When todo is emptied, all processes die normally and the reminder to index is printed.

Currently I am running a test with 8 gw processes, started as shown below. They haven't died after an hour. I will let this run and see what happens. The original gw is still adding to todo, so there are 9 executions running.

nohup /gw -d/export/usr/data/user.new -noindex -dns=sys -a -R -r -O -fshtml -fasp -fcfm -fjsp -fxml -t7 -z5000 -v9 > nohup.11115p
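
To answer the question of whether the processes die together or separately, it could help to record when each one starts and stops. A rough sketch, assuming a POSIX shell; run_logged and its log file argument are hypothetical names for illustration, not anything supplied with Webinator:

```shell
#!/bin/sh
# Run a command, recording its start time, end time, and exit status
# in a log file. run_logged is a hypothetical helper for illustration.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    log=$1
    shift
    echo "start $(date '+%Y-%m-%d %H:%M:%S'): $*" >> "$log"
    "$@"
    status=$?
    echo "end   $(date '+%Y-%m-%d %H:%M:%S') status $status: $*" >> "$log"
    return "$status"
}
```

Placed in a small script that is itself started under nohup, once per gw, this would leave one start line and one end line per process, so the end times can be compared afterwards.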
resume.robot
Posts: 68
Joined: Sat Jan 13, 2001 1:23 am

Post by resume.robot »

Here is a typical kill message:

http://www.nscl.msu.edu/~anthony/dwaresume.html
1458: TotLinks: 10835, Links: 6/ 0, Good: 0, New: 0 Retrieving
1458: TotLinks: 10835, Links: 6/ 0, Good: 0, New: 0
100 Document not found: http://www.nscl.msu.edu/~anthony/dwaresume.html returned code 404 (Not Found)
1458: TotLinks: 10835, Links: 6/ 0, Good: 0, New: 0
http://lexav.nettalk.free.fr/Contact___ ... vitae.html
1458: TotLinks: 10835, Links: 6/ 0, Good: 0, New: 0 Retrieving
1458: TotLinks: 10835, Links: 6/ 0, Good: 0, New: 0
000 Got signal 11 - quitting now
1458: TotLinks: 10835, Links: 6/ 0, Good: 0, New: 0
000 Got signal 10 - quitting now
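
With several nohup logs to check, the signal lines can be pulled out in one pass. A minimal sketch, assuming the logs contain lines in the "Got signal N - quitting now" form shown above and that an awk with POSIX regular expressions is available; signal_deaths is a hypothetical helper name:

```shell
#!/bin/sh
# For each log file given, print the file name, line number, and signal
# number of every "Got signal N - quitting now" line.
# signal_deaths is a hypothetical helper for illustration only.
signal_deaths() {
    awk '/Got signal [0-9]+ - quitting now/ {
        # The signal number is the field right after the word "signal".
        for (i = 1; i <= NF; i++)
            if ($i == "signal")
                print FILENAME ": line " FNR ": signal " $(i + 1)
    }' "$@"
}
```

Run as signal_deaths nohup.* in the directory holding the logs. The line numbers make it easy to open a log at that point and look at the URLs retrieved just before the signal.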
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

P.S. Please also indicate how big your database got so we know how much space we'll need.
resume.robot
Posts: 68
Joined: Sat Jan 13, 2001 1:23 am

Post by resume.robot »

Still running with -dns=sys. A total of 11 gw processes are running and none has died yet; let me see if I can kill them first. If not, then -dns=sys may be the answer.

The list has 386,000 URLs and is 18 MB.

Basically identical database on the Linux machine: du shows 1.2 GB, and ls -al shows html.tbl at 550 MB.

Partially populated database on the Sun machine: du -k shows 400 MB.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Ok. We'll wait for your report. Let us know.