I am working with a set of HTML documents that quote email messages. A
typical document is shown below. How would one tell the indexer to
treat strings of the form "user@somedomain.com" as a single word? The
trouble is that I am unsure how to classify the "@" when using the -k"expr"
option. Is it alphanumeric or something else? The same goes for strings
containing a period, as in "domain.com".
/*----------------------example-------------------------------------
<html>
<head><title>
Lorem ipsum dolor sit amet.
</title>
</head>
<h1><center>
Lorem ipsum dolor sit amet
</center></h1>
<a name ="headers">Msg Hdrs</a><hr>
<p>
<hr>
<a name = headers></a>Date: Thu, 16 Jan 1066
<br>
From: John Doe < doughboy@acme.com >
<br>
Subject: Lorem ipsum dolor sit amet.
<br>
To: user@mynode.mydomain.com
<br>
Cc:friend@other.domain.org,person@school.edu<br>
<pre>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy
nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Duis
autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie
consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et
accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit
augue duis dolore te feugait nulla facilisi.
</pre>
</html>
----------------------example----------------------------------------*/
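To make the desired behaviour concrete, here is a minimal Python sketch of
the word splitting I am after. It is only an illustration: it is not the
indexer's code and says nothing about how the -k"expr" option really works.
The names WORD_RE and tokenize, and the choice of "@", ".", "_" and "-" as
joining characters, are my own assumptions.

```python
import re

# Hypothetical word pattern (an assumption, not the indexer's rule):
# runs of letters/digits, optionally joined by a single ".", "@", "_"
# or "-" that is followed by more letters/digits. A trailing "." or ","
# is therefore not part of the word.
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:[.@_-][A-Za-z0-9]+)*")


def tokenize(text):
    """Return the words in text, keeping addresses and host names whole."""
    return WORD_RE.findall(text)


if __name__ == "__main__":
    for line in (
        "To: user@mynode.mydomain.com",
        "Cc:friend@other.domain.org,person@school.edu",
        "Lorem ipsum dolor sit amet.",
    ):
        print(tokenize(line))
```

With this pattern "user@mynode.mydomain.com" comes out as one word, while
the period ending a sentence is dropped. HTML tags are not stripped here;
that would be a separate step.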
Thanks in advance.
Regards,
Anthony
:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~:
|J. Anthony Waldron | Anthony.Waldron@innosoft.com |
|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
|Innosoft International, Inc | Telephone: (818)919-3600 |
|1050 East Garvey Avenue South | FAX: (818)919-3614 |
|West Covina, California 91790 | URL: http://www.innosoft.com|
:t~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~: