literal searches including punctuation and noise words

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm


Post by Thunderstone »



We have indexed the local history archives for our public library. In
testing, library staff have gone looking for various records they know are
in the database but haven't been able to pull them up, for example:

"303 S. WRIGHT STREET CORPORATION (OWNER)"
"4-H Club"
"Interstate Highway I-57"
"A and W Restaurant"

I think I understand how to reindex with the -k parameter in order to
include the various punctuation marks, but don't know how to do this most
efficiently without having to specify each character individually. Would
this do it?

-k"\alnum{1,30}" -k"[\alnum\punct]{2,30}"

I am at a loss on how to allow literal searches that include single letters
and noise words like "and". This is not a high priority, so if it's too
complicated, they'll just have to manage without.

Thanks!

-- Karen




Post by Thunderstone »



That could work (\punct covers all of the punctuation characters), but
all you really need is
-k"\alnum{1,30}"
when you index, plus
<apicp qminwordlen 1>
in your search script to set the minimum query word length to 1.
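
To see why the single expression is enough, here is a rough sketch of which terms a 1-to-30-character alphanumeric word expression would pull out of the sample records. It is written in Python, whose `re` syntax is not REX, so the pattern `[A-Za-z0-9]{1,30}` is only an ASCII approximation of `\alnum{1,30}`, and the helper name `index_terms` is made up for this illustration.

```python
import re

# ASCII approximation of the REX expression \alnum{1,30} (an assumption:
# Python's re module is not REX, and real indexing is done by Texis).
WORD_EXPR = re.compile(r"[A-Za-z0-9]{1,30}")

def index_terms(text: str) -> list[str]:
    """Return the alphanumeric runs of 1 to 30 characters found in text."""
    return WORD_EXPR.findall(text)

if __name__ == "__main__":
    for record in (
        "303 S. WRIGHT STREET CORPORATION (OWNER)",
        "4-H Club",
        "Interstate Highway I-57",
        "A and W Restaurant",
    ):
        print(index_terms(record))
```

With a minimum length of 1, the single letters ("S", "H", "I", "A", "W") come out as terms of their own, which is why the query side also has to accept one-character words.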

It would be better if users simply didn't enter extraneous periods and commas,
for example:
"303 S WRIGHT STREET CORPORATION (OWNER)"

If you wanted to, you could filter them out before passing the query to SQL:
<sandr "[.,!]" " " $q><$q=$ret>

You should leave the dashes alone as they provide phrase binding.
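
For anyone doing the same clean-up outside Vortex, the filter can be sketched in Python as below. It uses the same character class as the <sandr> call (period, comma, exclamation mark) and leaves dashes untouched. The function names are hypothetical, and the whitespace-collapsing step is an extra that is not part of the <sandr> line.

```python
import re

# Same character class as the <sandr> call: period, comma, exclamation mark.
# The dash is deliberately not listed, so "4-H" and "I-57" keep their binding.
NOISE_PUNCT = re.compile(r"[.,!]")

def strip_query_punctuation(query: str) -> str:
    """Replace each period, comma, or exclamation mark with one space."""
    return NOISE_PUNCT.sub(" ", query)

def normalize_query(query: str) -> str:
    """Strip the punctuation, then collapse runs of whitespace.

    The collapsing step is an addition for tidiness; the <sandr> line
    in the thread only does the replacement.
    """
    return " ".join(strip_query_punctuation(query).split())

if __name__ == "__main__":
    print(normalize_query("303 S. WRIGHT STREET CORPORATION (OWNER)"))
    print(normalize_query("Interstate Highway I-57"))
```

Parentheses are not in the character class, so "(OWNER)" passes through unchanged, just as it would with the <sandr> version.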



