Various webinator/texis capability questions

Post by **Thunderstone** » Tue Apr 21, 1998 10:23 am

I'm interested in buying Webinator, and possibly full Texis. I have some
questions:

- How does Webinator's efficiency (CPU, disk access, index size, etc.)
compare to other full-text search engines such as Verity, SWISH-E, AltaVista
Search, Infoseek, and Excite?
- How would I have Webinator preferentially return the most recently added
pages first (i.e., the results of the most recent walks)?
- Can I set Webinator up so that the URLs returned are modified by some
statement so that they go through a redirector of ours instead of being the
actual underlying URL of the page that was found? (Example: The URL that was
found was http://www.yahoo.com, and I'd like it instead to make the hotlink
http://www.mysite.com/cgi-bin/redirect. ... .yahoo.com)
- Can I have Webinator index pages on my site without going through the
webserver (index just using the filesystem)?
- Can Webinator index data fields in a(nother) SQL database such as an
Oracle database?
- Where do I get full documentation for how to use SQL statements to
modify/remove, etc. data from the webinator database?
- When would I want to use Webinator, and when would I want to use Texis?

For Texis:

- Are there JDBC and Perl DBI interfaces to Texis?
- Is Texis suitable as a general purpose SQL database? How do its features
and speed compare to Oracle and Sybase? Do you have any comparison literature?
- Does Texis have any data replication features? Ability to run on multiple
back-end servers?
- What is the Automatic Document Categorization engine, and where can one
see a demo of it?
- What is the pricing for Texis?

Thanks.

Post by **Thunderstone** » Tue Apr 21, 1998 12:43 pm

Webinator/Texis indexes are generally smaller than those of other products.
Webinator can certainly best these products in a feature-by-feature
comparison.

Texis itself is not in the same class as the other products you mention.
They are only text retrieval products; Texis is an RDBMS. This allows
it to have far more functionality than the others.

BTW: Index overhead/speed in Texis is a tunable parameter.

To return the most recently walked pages first, change the SQL query to do a

"where textfields like $query order by Visited desc"

You'll also need to create an inverted index (DATE data type)
against the Visited field.

See: http://www.thunderstone.com/texisman/node140.html
for a discussion.
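
As a rough illustration of the ordering idea only, here is a sketch that uses Python's built-in sqlite3 module as a stand-in for Texis. The table name `html`, the `Url` and `Body` columns, and the sample rows are assumptions made up for this example; only the `Visited` field and the `order by Visited desc` clause come from the query above. SQLite's `like` is a plain pattern match rather than Texis's full-text `like`, and the index created here is an ordinary SQLite index, not a Texis inverted index.

```python
import sqlite3

def newest_first(conn, term):
    """Return matching URLs, most recently visited first."""
    rows = conn.execute(
        "SELECT Url FROM html WHERE Body LIKE ? ORDER BY Visited DESC",
        (f"%{term}%",),
    )
    return [url for (url,) in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE html (Url TEXT, Body TEXT, Visited TEXT)")
# Ordinary SQLite index on the date column (stand-in for a Texis inverted index).
conn.execute("CREATE INDEX ix_visited ON html (Visited)")
conn.executemany(
    "INSERT INTO html VALUES (?, ?, ?)",
    [
        ("example.com/old", "texis search tips", "1998-03-01 09:00:00"),
        ("example.com/new", "texis search news", "1998-04-20 17:30:00"),
        ("example.com/other", "unrelated page", "1998-04-21 08:00:00"),
    ],
)
print(newest_first(conn, "texis"))  # ['example.com/new', 'example.com/old']
```

The dates are stored as ISO-style strings so that plain string ordering matches chronological ordering; a real DATE column would not need that trick.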

Yes, result URLs can go through a redirector. Look at
http://search.thunderstone.com/, which does exactly this.

Here's the Texis Web Script for doing a redirector/logger:

<SCRIPT LANGUAGE=vortex>

<DB = /mydatabase/directory>

<A NAME=main>
<EXEC /usr/bin/tee -a /tmp/query.log>
<fmt "%at\t%s\thttp://%s\t%s\n" "%Y-%m-%d-%H:%M:%S" "now" $REMOTE_HOST $u $q>
</EXEC>
Content-Type: text/html
Location: http://$u

The document is <A HREF="http://$u">here</A>.
</A>

</SCRIPT>
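
The script above only handles the receiving side: it logs `$u` and `$q` and redirects to `http://$u`. The links in the result page also have to point at the redirector. A minimal sketch of that rewriting step, in Python rather than Vortex, might look like the following. The parameter names `u` and `q` are taken from the script; the redirector address `http://www.example.com/cgi-bin/redirect` and the function name `redirect_link` are hypothetical, and the sketch assumes the redirector URL-decodes its parameters.

```python
from urllib.parse import urlencode

# Hypothetical redirector location; substitute the URL of your own script.
REDIRECTOR = "http://www.example.com/cgi-bin/redirect"

def redirect_link(found_url, query):
    """Build a redirector link carrying the found URL (scheme removed) as u
    and the user's search as q, matching the variables the script reads."""
    prefix = "http://"
    # The script prepends "http://" itself, so strip it here if present.
    target = found_url[len(prefix):] if found_url.startswith(prefix) else found_url
    return REDIRECTOR + "?" + urlencode({"u": target, "q": query})

print(redirect_link("http://www.yahoo.com", "search engines"))
# http://www.example.com/cgi-bin/redirect?u=www.yahoo.com&q=search+engines
```

Encoding both values keeps characters such as `&` or spaces in the found URL or the query from breaking the link.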

Indexing straight from the filesystem is not a good idea. While Texis is
fully capable of indexing local files, it would violate a lot of rules for
Webinator or any site indexer to index anything that was not first
processed through the site's Web server.

See the tech support archive for the reasons.

Anything that produces a URL can be indexed by Webinator. But indexing
fields from another database is a violation of the Webinator License
Agreement, and requires a full Texis license.

Oracle and Texis are competing products. Thunderstone frequently replaces
Oracle with Texis at customer sites, but sometimes we have to supplement
the functionality of an existing Oracle DB. This is done by sharing
keys between the two RDBMSs. Since both of them talk SQL, it's a pretty
easy job to get them to work together.

See the Webinator Admin manual and Texis manuals on our Web site.

Webinator is an application program based on Texis. Texis and Texis Web Script
can be used to make almost any Web application. Webinator is just for
creating an index into relatively static HTML and text documents on a site.

We used to support JDBC, but we dropped the product due to lack of demand.
The best way to talk to Texis is through Texis Web Script.

You can also write interfaces via the Texis API, but there's really
not much need to do this unless you're doing something like
a wire-feed handler or a non-HTTP-based application.

The relationship between Texis and Oracle is kind of like that
between a ball-peen hammer and a claw hammer. When all you want to
do is whack at something, either will do, but properly used, each tool has
a distinctive specialty.

Oracle is very good at transaction processing of the Accounting System kind.

Texis is very good at media object management and retrieval.

Everything from storage management to query optimization in Texis
is specifically targeted towards the kind of things people need when
they are developing a system that will manipulate language and media-objects.
Oracle sucks at everything we are good at. Texis sucks in some of the
areas Oracle is really good at.

It's really a matter of picking the right tool for the job.
Since most Web applications involve text and document manipulation
as a core requirement, IMHO Texis is a better choice as a
general database for a Web site because it won't box you in with
limitations like the others will, and Texis Web Script is a
far, far better environment than anything the other vendors have for
developing applications.

ZDNET.COM, EBAY.COM, DOGPILE.COM and VJF.COM (all Top 100 sites)
have all replaced other SQL RDBMSs and/or text retrieval and scripting
products with Texis as their tool of choice.

Replication is handled on a case-by-case basis. There are assistive tools
in the Texis package to enable replication and duplication, but each
application case is so unique that we've been unable to model a
general-purpose replication agent.

It's a pretty easy thing to do though. We usually use trigger-based jobs
to initiate transactions against the remote DBs.
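
To make the trigger-based pattern concrete, here is a generic sketch using Python's sqlite3 module, not Texis. Everything in it (the `docs` and `outbox` tables, the `docs_ai` trigger, the `replay` function) is a hypothetical stand-in: a trigger queues each insert, and a separate job replays the queue against a second database. Texis triggers and the assistive tools mentioned above will differ in syntax and capability.

```python
import sqlite3

def make_primary():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                             id INTEGER, title TEXT);
        -- The trigger queues every insert so a job can replay it elsewhere.
        CREATE TRIGGER docs_ai AFTER INSERT ON docs
        BEGIN
            INSERT INTO outbox (id, title) VALUES (NEW.id, NEW.title);
        END;
    """)
    return db

def make_replica():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT)")
    return db

def replay(primary, replica):
    """Apply queued inserts to the replica in order, then clear what was applied."""
    pending = primary.execute(
        "SELECT seq, id, title FROM outbox ORDER BY seq").fetchall()
    for _seq, doc_id, title in pending:
        replica.execute("INSERT OR REPLACE INTO docs VALUES (?, ?)", (doc_id, title))
    replica.commit()
    if pending:
        # Delete only the rows that were actually replayed.
        primary.execute("DELETE FROM outbox WHERE seq <= ?", (pending[-1][0],))
        primary.commit()
    return len(pending)

primary, replica = make_primary(), make_replica()
primary.execute("INSERT INTO docs (title) VALUES ('first')")
primary.execute("INSERT INTO docs (title) VALUES ('second')")
print(replay(primary, replica))  # 2
print(replica.execute("SELECT id, title FROM docs ORDER BY id").fetchall())
```

Updates and deletes would need their own triggers, which is part of why each application ends up with its own replication design.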

The Categorization Engine's purpose is to place uncategorized documents
into one or more predefined categories based on their content with
respect to a training corpus of already categorized documents.
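
For readers unfamiliar with the idea, the toy sketch below shows what "categorizing against a training corpus" means in general. It is not Thunderstone's algorithm, which is not described here: the word-overlap scoring, the `threshold` parameter, the function names, and the sample corpus are all assumptions chosen to keep the example short.

```python
from collections import Counter

def train(corpus):
    """Build one word-count profile per category from (category, text) pairs."""
    profiles = {}
    for category, text in corpus:
        profiles.setdefault(category, Counter()).update(text.lower().split())
    return profiles

def categorize(profiles, text, threshold=0.5):
    """Return every category whose profile contains at least `threshold`
    of the document's words (so a document may land in several, or none)."""
    words = text.lower().split()
    matches = []
    for category, profile in sorted(profiles.items()):
        covered = sum(1 for w in words if w in profile)
        if words and covered / len(words) >= threshold:
            matches.append(category)
    return matches

corpus = [
    ("databases", "sql index table query join"),
    ("databases", "relational database transaction index"),
    ("cooking", "recipe oven flour butter sugar"),
]
profiles = train(corpus)
print(categorize(profiles, "sql query index tuning"))  # ['databases']
```

A production engine would weight terms and handle vocabulary far more carefully; the point is only that new documents are scored against profiles learned from documents that were already categorized.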

You'll have to call us for a demo of this.

The starting price of Texis is around $10K for a single-server,
5-concurrent-user installation.

Thanks,

Bart Richards,
Thunderstone