Searching multiple sites

jai.thomas
Posts: 54
Joined: Tue Dec 11, 2001 6:20 pm

Post by jai.thomas »

We are using commercial Texis/Webinator 4.0.
We have a list of business-related websites that need to be crawled and made searchable. How do I specify multiple sites in a Webinator profile?

Also, is it possible to create the Webinator database on a different server (i.e., not the one where the web server is running)?

Thanks in advance for your help.

Jai
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Place all desired starting URLs into "Base URL". There are other ways, but that is the most direct.
http://www.thunderstone.com/texis/site/ ... y=Base+URL

The database must be on the same machine as that running the index (where Webinator is installed). Also, it should not be on a network drive.
jai.thomas
Posts: 54
Joined: Tue Dec 11, 2001 6:20 pm

Post by jai.thomas »

1. What is the best way to specify the site list? We need something that is easily maintainable and can be updated programmatically without going through Webinator administration.

2. If the database must be on the same machine, how do I separate the search repository from our app server? I was thinking of having a dedicated server for Texis.
Is there any way we can accomplish this? The Texis FAQ says it is possible but doesn't go into detail.

Also, I was experimenting with Webinator by indexing a small set of documents. It looks like the walk went fine, but when I try 'Live Search' it returns 'No documents match the query' for all valid queries. I found the following comment at the end of the HTML returned.
<!-- 115 /texis/webi/search:521: Query 'sorghum' would require linear search -->
What am I doing wrong here?

Thanks
Jai
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

1. Some manual reading is in order. Use "URL File" ( http://www.thunderstone.com/texis/site/ ... y=URL+File ). And run the walk by hand once it's set up ( http://www.thunderstone.com/texis/site/ ... ing+dowalk ). I would suggest skimming the entire manual once so you know what's possible and where to look.

2. You can dedicate a machine to Webinator/Texis. Run a web server on the dedicated machine and install Webinator there. There are at least a couple of ways to use that setup:
1) Simply switch to that server to run Webinator searches. E.g., your web server is http://www.yoursite.com; when a user needs to search, you send them to http://search.yoursite.com/cgi-bin/texis/webi/search .
2) Your front-end scripting language of choice can POST an HTTP request to the back-end machine where Webinator is and display the results back to the user.

3. It appears that your walk is not going to completion. If it does, the "Walk Status" should say "Making new database live:". Using "List/Edit URLs" you can surf through your database of walked pages to see what it got.
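
For points 1 and 2, here is a rough Python sketch of what "maintained programmatically" and "front end forwards the search" could look like. It is only an illustration under assumptions: the helper names are made up, the one-URL-per-line layout for the "URL File" should be checked against the manual, and the `query` parameter name is a guess (only `pr` is visible in the URLs in this thread). It builds a plain GET-style URL; a front end could equally send the same fields as a POST.

```python
from pathlib import Path
from urllib.parse import urlencode, urlsplit


def write_url_file(urls, path):
    """Write de-duplicated http(s) URLs to `path`, one per line; return the count.

    Assumption: the walker accepts a plain text file with one URL per line.
    """
    kept = []
    for raw in urls:
        url = raw.strip()
        parts = urlsplit(url)
        # Skip blanks, non-web schemes, and anything without a host name.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        if url not in kept:
            kept.append(url)
    Path(path).write_text("".join(u + "\n" for u in kept), encoding="utf-8")
    return len(kept)


def backend_search_url(base, profile, query):
    """Build the URL a front end would request from the dedicated search host.

    "pr" is the profile parameter seen in this thread; "query" is an assumed name.
    """
    return base + "?" + urlencode({"pr": profile, "query": query})
```

A scheduled job could regenerate the file from wherever the site list lives (a database table, for instance) and then start the walk by hand as described in the manual. The front end would fetch the URL returned by `backend_search_url` and relay the response to the user.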
jai.thomas
Posts: 54
Joined: Tue Dec 11, 2001 6:20 pm

Post by jai.thomas »

Alright, I will defer asking more questions on 1 and 2 until I have fully worked through the manuals.

Now, I am still having trouble with my test index. Walk status says "Making new database live:..." and the URL list returns all the links indexed. Still, my search returns no results and the HTML has the same message.
Database location is d:\inet\lyport\web\texis\webi/db2.

I have made the following changes in the search script.
<DB = d:\inet\lyport\web\texis\webi/db2>
<$defaultdir = "d:/inet/lyport/web/texis/webi/">
.............
<IF $db eq "">
<$db = "db2"><!-- default database in $defaultdir -->
</IF>

Given below is the walk status report for your reference. Please let me know what I am doing wrong.

Thanks for your help.
Jai

Commercial Texis

--------------------------------------------------------------------------------
Walk Status
Current Profile: labels Webinator 4.0.1
Finished run: Go to Walk Settings to configure and/or start a walk.

Webinator Walk Report for labels

Creating database d:\inet\lyport\web\texis\webi/db2...Done.
Walk started at 2002-01-17 10:04:33 (by user)
Start fetching at http://lyport/datastore/labels/
Ignore urls containing any of the following:
/cgi-bin/
~
?

started 1 (2204) on http://lyport/datastore/labels/
384 pages fetched (147,055,352 bytes) from http://lyport/datastore/labels/
1 errors
4 duplicate pages

Creating search index on fetched pages...Done.

Walk finished at 2002-01-17 10:10:23 (took 5 minutes 11 seconds)
Making new database live: d:\inet\lyport\web\texis\webi/db2

--------------------------------------------------------------------------------
Checking for broken hyperlinks...

The link : http://lyport/datastore/labels/labelslibrary.html
Referenced by : http://lyport/datastore/labels/
Had this error: Document not found: http://lyport/datastore/labels/sheets_accord.html returned code 404 (Not Found)
--------------------------------------------------------------------------------
Checking for duplicate pages...

The link : http://lyport/datastore/labels/test_custosps02.txt
Referenced by : http://lyport/datastore/labels/
Is a duplicate of: http://lyport/datastore/labels/10055943 ... osps02.txt

The link : http://lyport/datastore/labels/Suppleme ... andard.txt
Referenced by : http://lyport/datastore/labels/Supplementals/
Is a duplicate of: http://lyport/datastore/labels/Standard.txt

The link : http://lyport/datastore/labels/Suppleme ... rialca.doc
Referenced by : http://lyport/datastore/labels/Supplementals/
Is a duplicate of: http://lyport/datastore/labels/Suppleme ... alca99.doc

The link : http://lyport/datastore/labels/Suppleme ... rialca.doc
Referenced by : http://lyport/datastore/labels/Supplementals/
Is a duplicate of: http://lyport/datastore/labels/Suppleme ... alca99.doc
--------------------------------------------------------------------------------
End of report.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

You shouldn't have to change anything in the search script unless you've moved files around after installation. How are you invoking the search (what URL are you using)? Do you get any other messages in comments on the results page besides "would require linear search"?
Have you changed the "Word Definitions"? What does a directory listing of d:\inet\lyport\web\texis\webi\db2 look like?
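
If digging through the page source by hand gets tedious, a small script can pull out every HTML comment from a saved results page. This is a minimal Python sketch, not anything shipped with Webinator; the function name is made up and it assumes the diagnostics appear as ordinary `<!-- ... -->` comments like the one quoted earlier.

```python
import re

# Non-greedy match so each comment is captured separately; DOTALL allows
# comments that span several lines.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def html_comments(page):
    """Return the text of every HTML comment in `page`, whitespace-trimmed."""
    return [text.strip() for text in COMMENT_RE.findall(page)]
```

Save the results page to a file, read it into a string, and pass it to `html_comments` to list all the messages at once.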
jai.thomas
Posts: 54
Joined: Tue Dec 11, 2001 6:20 pm

Post by jai.thomas »

Well, I first tried without changing anything and was getting the same result. Now I have reset all changes and there is still no difference.

I am using the 'Live Search' menu on Webinator admin to invoke the search.
Here's the URL it shows: http://lyport/cgi-bin/texis/texis/webi/ ... ?pr=labels

The URL after submitting a search is: http://lyport/cgi-bin/texis/texis/webi/ ... mit=Submit

I haven't changed 'Word Definitions' or anything else, for that matter.

Here's the directory listing of d:\inet\lyport\web\texis\webi\db2

Directory of D:\Inet\Lyport\Web\texis\webi\db2

01/17/02 10:09a <DIR> .
01/17/02 10:09a <DIR> ..
01/17/02 10:04a 3,212 categories.tbl
01/17/02 10:09a 3,234 counts.tbl
01/17/02 10:08a 3,795 error.tbl
01/17/02 10:09a 14,364,483 html.tbl
01/17/02 10:04a 11,915 options.tbl
01/17/02 10:04a 3,212 querylog.tbl
01/17/02 10:09a 52,024 refs.tbl
01/17/02 10:04a 26,007 SYSCOLUM.tbl
01/17/02 10:10a 6,244 SYSINDEX.tbl
01/17/02 01:15p 9 SYSLOCKS.SEQ
01/17/02 10:04a 3,212 SYSMETAI.tbl
01/17/02 10:04a 4,552 SYSPERMS.tbl
01/17/02 01:03p 1,513 SYSSTATS.tbl
01/17/02 10:04a 10,335 SYSTABLE.tbl
01/17/02 10:04a 3,212 SYSTRIG.tbl
01/17/02 10:04a 3,324 SYSUSERS.tbl
01/17/02 10:10a 4,296 todo.tbl
01/17/02 10:04a 3,212 vortex.tbl
01/17/02 10:04a 148 xcatno.btr
01/17/02 10:09a 8,346 xerrorurl.btr
01/17/02 10:10a 2,560,524 xhtmlbodv.btr
01/17/02 10:10a 5,482,839 xhtmlbodv.dat
01/17/02 10:10a 3,040 xhtmlbodv.tok
01/17/02 10:09a 44 xhtmlbodv_D.btr
01/17/02 10:10a 682 xhtmlbodv_P.tbl
01/17/02 10:09a 148 xhtmlbodv_T.btr
01/17/02 10:09a 148 xhtmlcat.btr
01/17/02 10:09a 73,992 xhtmldepth.btr
01/17/02 10:09a 8,346 xhtmlhash.btr
01/17/02 10:09a 8,346 xhtmlid.btr
01/17/02 10:09a 65,732 xhtmlurl.btr
01/17/02 10:04a 8,346 xoptname.btr
01/17/02 10:04a 8,346 xoptstr.btr
01/17/02 10:04a 148 xqueryid.btr
01/17/02 10:09a 73,930 xrefsref.btr
01/17/02 10:09a 57,534 xrefsurl.btr
01/17/02 10:10a 9,380 xtodourl.btr
01/17/02 10:04a 148 xvid.btr
40 File(s) 22,877,958 bytes
8,513,978,368 bytes free


Thanks
Jai
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

It looks like you may be using a Webinator 2 search script instead of the Webinator 4 search script that comes in the package. Make sure the version 4 script is in place in your webi directory. You can download it from the website examples page if you've lost it.
http://www.thunderstone.com/texis/site/ ... ample.html
jai.thomas
Posts: 54
Joined: Tue Dec 11, 2001 6:20 pm

Post by jai.thomas »

Apparently that was the problem; not sure how I got the 2.0 scripts, though. Search works fine now!
Thanks a bunch for your help.

Jai