Use of the -k"expr" option.

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Post by Thunderstone »




I am working with a set of HTML documents that quote email messages. A
typical document is shown below. How would one tell the indexer to
treat strings of the form "user@somedomain.com" as a single word? The
trouble is that I am unsure how to classify the "@" when using the -k"expr"
option. Is it alphanumeric or something else? The same goes for strings
containing a period, as in "domain.com".

/*----------------------example-------------------------------------
<html>

<head><title>
Lorem ipsum dolor sit amet.
</title>
</head>

<h1><center>
Lorem ipsum dolor sit amet
</center></h1>

<a name ="headers">Msg Hdrs</a><hr>
<p>
<hr>
<a name = headers></a>Date: Thu, 16 Jan 1066
<br>

From: John Doe &lt doughboy@acme.com &gt
<br>

Subject: Lorem ipsum dolor sit amet.
<br>

To: user@mynode.mydomain.com
<br>

Cc:friend@other.domain.org,person@school.edu<br>

<pre>

Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy
nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Duis
autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie
consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et
accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit
augue duis dolore te feugait nulla facilisi.

</pre>
</html>
----------------------example----------------------------------------*/

Thanks in advance.

Regards
Anthony,
:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~:
|J. Anthony Waldron | Anthony.Waldron@innosoft.com |
|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
|Innosoft International, Inc | Telephone: (818)919-3600 |
|1050 East Garvey Avenue South | FAX: (818)919-3614 |
|West Covina, California 91790 | URL: http://www.innosoft.com|
:t~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~:





J. Anthony Waldron said:

The first point to note is that the first -k option overrides the default
expression, and each subsequent -k"expr" is added to it. What you probably
want to specify is

-k"\alnum{2,99}" - to match ordinary words, and user, somedomain as
individual entities.
-k"[\alnum@\.]{2,99}" - to match "user@somedomain.com"
-k"[\alnum\.]{2,99}" - to match "somedomain.com" by itself.

You should then be able to find a message by searching for user,
somedomain.com or user@somedomain.com.
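
To see roughly which terms the three expressions pick out, here is a small Python sketch. It is only an approximation: it uses Python's re module rather than the indexer's own expression syntax, it assumes \alnum means ASCII letters and digits, and the name index_terms is made up for illustration.

```python
import re

# Approximate Python equivalents of the three -k expressions.
# Assumption: \alnum is treated as ASCII letters and digits.
PATTERNS = {
    "words": r"[A-Za-z0-9]{2,99}",
    "addresses": r"[A-Za-z0-9@.]{2,99}",
    "domains": r"[A-Za-z0-9.]{2,99}",
}


def index_terms(text):
    """Return every term matched by at least one of the patterns."""
    terms = set()
    for pattern in PATTERNS.values():
        terms.update(re.findall(pattern, text))
    return terms


terms = index_terms("To: user@somedomain.com")
for term in sorted(terms):
    print(term)
```

In this sketch the set of terms contains user, somedomain.com and user@somedomain.com, which is why all three searches should be able to find the message.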

You may want to check the command with "echo" first to make sure that
the backslash is not being interpreted by the shell.
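
If you want to see why the quotes matter, the following Python sketch uses the standard shlex module to mimic POSIX-style shell quoting. It is an illustration only: your shell (csh, for instance) may behave differently, and shlex does not model globbing or brace expansion, so the "echo" test is still the real check.

```python
import shlex

# The same argument as it might be typed, with and without the quotes.
with_quotes = r'-k"[\alnum@\.]{2,99}"'
without_quotes = r'-k[\alnum@\.]{2,99}'

# shlex.split applies POSIX-style quoting rules and returns the
# argument list a program would receive.
quoted_arg = shlex.split(with_quotes)[0]
unquoted_arg = shlex.split(without_quotes)[0]

print(quoted_arg)    # backslashes survive inside double quotes
print(unquoted_arg)  # backslashes are consumed when unquoted
```

Inside double quotes the backslashes reach the program intact; without the quotes they are eaten before the program ever sees them.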

John Turnbull



On Thu, 16 May 1996, John Turnbull wrote:


As it turns out, I have never had this solution work for me. Perhaps it
has something to do with your command line parser for queries. I have
come to this conclusion after reading through your search-help web page
under the section "Applying Search Logic". The "@" symbol has special
meaning in the context of the search action.

I have the same problem with underscores, but there is no mention of
the underscore as a reserved character.

Here is a copy of the option file that I used.

% cat option.txt
w0
z1000000000
t1800000
v1
k"\alnum{2,99}"
k"[\alnum@\.]{2,99}"
k"[\alnum\.]{2,99}"

When I tried to search the database that I had indexed with these options,
I got "Nothing was matched by your query" for the string "user@acme.com".
I got the same result when I searched for the string "user". I then
reindexed and tried various forms of quoting or escaping the characters
in my search strings, but to no avail.

Until I can find a workaround, this is a serious limitation for what I
want to use the Webinator for.

As part of the support group for a software company that receives a high
volume of technical support queries via email, my aim is to provide a
web-searchable index to all members of the tech support group. Our
products run on VMS and UNIX platforms, and as such our email
exchanges focus on operating system terminology such as:

CONFIG_IMAGE_LIBRARY
CHARSET_OPTION_FILE
USER_PROFILE_DATABASE

We would like to have a search engine that will enable us to search
on strings like the ones above, as well as to locate mail messages
that were sent from user@acme.com to barney@slate.bedrock.com
with respect to "New parts for the quartzite_pulverizers".

I am most interested in purchasing the Webinator. It meets my requirements
with respect to user interface, search flexibility, ease of implementation
and database creation. However, the exceptions noted above prevent me
from doing so right now. I would consider purchasing Webinator if an
enhancement or simple workaround were provided.

Regards
Anthony,






The @ sign is only special as the first character of a term.


Add underscore to the above expressions as desired.
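
As a rough illustration of what the underscore changes, here is a Python sketch. It uses Python's re module as a stand-in for the indexer's expression syntax and assumes \alnum means ASCII letters and digits, so treat it as an approximation only.

```python
import re

text = "Set CONFIG_IMAGE_LIBRARY for the quartzite_pulverizers"

# Without the underscore in the character class, identifiers are split.
without_underscore = re.findall(r"[A-Za-z0-9]{2,99}", text)

# With the underscore added, each identifier stays in one piece.
with_underscore = re.findall(r"[A-Za-z0-9_]{2,99}", text)

print(without_underscore)
print(with_underscore)
```

With the underscore in the class, CONFIG_IMAGE_LIBRARY and quartzite_pulverizers each come out as a single term instead of being broken at every underscore.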


The quotes should not be used in an option file. They are to prevent
the shell from processing the special characters. Also make sure there
are no extraneous spaces or tabs at the end of the lines.
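
A quick way to catch both problems before indexing is a small check like the Python sketch below. The helper name check_option_lines is hypothetical, and the "corrected" lines are simply the posted option file with the quotes removed, which is my reading of the advice above rather than a verified file.

```python
def check_option_lines(lines):
    """Return (line_number, message) pairs for suspicious option-file lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        # Trailing spaces or tabs would become part of the option value.
        if text != text.rstrip(" \t"):
            problems.append((number, "trailing space or tab"))
        # Quotes are only needed on the command line, not in the file.
        if text.startswith('k"'):
            problems.append((number, "expression is wrapped in quotes"))
    return problems


original = ["w0", "z1000000000", "t1800000", "v1",
            r'k"\alnum{2,99}"', r'k"[\alnum@\.]{2,99}"', r'k"[\alnum\.]{2,99}"']
# Assumed fix: the same options with the quotes dropped.
corrected = ["w0", "z1000000000", "t1800000", "v1",
             r"k\alnum{2,99}", r"k[\alnum@\.]{2,99}", r"k[\alnum\.]{2,99}"]

for number, message in check_option_lines(original):
    print(number, message)
```

Run against the option file posted earlier, this flags the three k lines; the version without quotes passes cleanly.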



On Fri, 14 Jun 1996, Mark Willson wrote:


You are correct; I can now search for strings with @ signs and
underscores. Thank you for your help.

Please send me information on how to purchase the commercial version.


Regards
Anthony,

