Searching for punctuation

jswartz
Posts: 4
Joined: Mon Oct 25, 2004 2:27 pm

Post by jswartz »

We have recently added punctuation to our list of indexed characters. My engineering department tells me that this is the indexing call that we are using:

tsql -q "set keepnoise='on';set delexp=0;set addexp='\alnum{1,99}';set addexp='>>\alpha{1,50},=\alpha{1,50}';set addexp='[\alnum\punct]{1,30}';set addexp='\punct{1,5}';create metamorph inverted index idxmtblnew_CLEANUP on tblnew(CLEANUP);"

What is the syntax necessary to search for punctuation characters that may have other meaning in Texis? For instance, how do you search for the following: *, ", ', -, (, ), etc.? (For instance, it appears that you must enter an asterisk [*] twice in order to literally search for the asterisk character.)

Here is the complete list of those characters we want to be able to search for:

! Exclamation mark
¡ Inverted exclamation mark
? Question mark
# Pound Sign
- Dash
> Greater Than
< Less Than
; Semicolon
: Colon
( Left Parenthesis
) Right Parenthesis
[ Left Bracket
] Right Bracket
{ Left Curly Bracket
} Right Curly Bracket
' Apostrophe
. Period
$ Dollar Sign
@ At sign
& Ampersand
* Asterisk
" Double Quote

By the way, I am a technical writer here at CaseCentral, and I have been asked to research this issue while one of our key programmers is out of the office. Let me know if there is a better way for me to interact with Thunderstone to get this information. Thanks.

- jswartz@casecentral.com
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

The Metamorph language used in the like/likep query has several reserved prefixes:
/ # % @ w/ ( ~
They only have meaning at the beginning of a term. There are different ways to search for those prefixes literally. The simplest is to put a backslash in front of the prefix:
\@home
\w/sheet
\(abc)
\~mine
\#24
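
For anyone building queries programmatically, the backslash rule above can be sketched as a small helper. This is only an illustration in Python, not part of Texis; the name `escape_prefix` is hypothetical, and it covers only the prefixes listed in this post.

```python
# Reserved Metamorph prefixes as listed in the post above.
# They are only special at the very start of a term.
RESERVED_PREFIXES = ("w/", "/", "#", "%", "@", "(", "~")


def escape_prefix(term: str) -> str:
    """Put a backslash in front of a term that starts with a reserved prefix."""
    if term.startswith(RESERVED_PREFIXES):
        return "\\" + term
    return term


for term in ["@home", "w/sheet", "(abc)", "~mine", "#24", "a@b"]:
    print(escape_prefix(term))
```

A term such as a@b is left alone because the @ is not at the beginning of the term.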

The sentence enders period, comma, and question mark are always stripped from the end of the query. To keep one, either put the term in parens or add a noise term as the last term, which will itself get stripped (assuming you haven't turned on keepnoise):
(termination.)
termination. a

* is a wildcard anywhere in a term. To remove the special meaning of *, put the term in a paren list with another non-matching term, like
(abc*,qjxz)
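
The two workarounds above (parens for a trailing sentence ender, a two-item paren list for a literal asterisk) could be wrapped up roughly like this. It is a sketch under assumptions: `protect_term` and `FILLER` are hypothetical names, and `qjxz` only works as the filler if that string never occurs in your indexed data.

```python
SENTENCE_ENDERS = ".,?"   # stripped from the end of a query, per the post above
FILLER = "qjxz"           # assumed never to match anything in the index


def protect_term(term: str) -> str:
    """Wrap a term so a literal '*' or a trailing sentence ender survives."""
    if "*" in term:
        # A one-item list like (abc*) is still treated as a wildcard,
        # so a second, non-matching item is added to the list.
        return "(" + term + "," + FILLER + ")"
    if term and term[-1] in SENTENCE_ENDERS:
        # Parens keep the trailing punctuation from being stripped.
        return "(" + term + ")"
    return term


for term in ["abc*", "termination.", "plain"]:
    print(protect_term(term))
```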

Comma and hyphen separate terms just like a space:
abc,def
is the same as
abc-def
is the same as
abc def
To force it to be treated as a single term, put quotes around the whole term:
"abc,def"
"abc-def"

Hyphen continues to have meaning at search time unless you first run
<sql "set hyphenphrase=0"></sql>
and then include the hyphenated term in a paren list with another non-matching term, like
(abc-def,qjxz)
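
As a rough mental model of the separator and quoting rules described in this post, the following Python sketch splits a query the way the examples suggest: space, comma, and hyphen all separate terms, while a double-quoted string stays whole. The function `split_terms` is hypothetical and is not how Metamorph is implemented; it ignores paren lists, wildcards, and the hyphenphrase setting.

```python
import re

# Either a double-quoted phrase, or a run of characters that are not
# whitespace, comma, hyphen, or a double quote.
TERM_PATTERN = re.compile(r'"([^"]*)"|([^\s,\-"]+)')


def split_terms(query: str) -> list:
    """Split a query into terms; quoted phrases are kept as one term."""
    terms = []
    for quoted, bare in TERM_PATTERN.findall(query):
        if quoted or bare:
            terms.append(quoted or bare)
    return terms


for query in ["abc,def", "abc-def", "abc def", '"abc,def"', '"abc-def"']:
    print(query, "->", split_terms(query))
```

In this model the first three queries produce the same two terms, and the quoted forms each produce a single term.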
jswartz
Posts: 4
Joined: Mon Oct 25, 2004 2:27 pm

Post by jswartz »

Yes, I believe we are setting hyphenphrase=0.

So, for hyphens and asterisks, you have to include them in a paren list with *another non-matching term*? Can you not simply search for the following:

(abc*)

Or this:

(abc-def)

Also, are there special syntax rules in searching for any of the following:

! Exclamation mark
¡ Inverted exclamation mark
? Question mark
> Greater Than
< Less Than
; Semicolon
: Colon
( Left Parenthesis
) Right Parenthesis
[ Left Bracket
] Right Bracket
{ Left Curly Bracket
} Right Curly Bracket
. Period
$ Dollar Sign
& Ampersand
" Double Quote
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

(abc*) won't work. Since it's a one-item list, it will be treated as abc*.

. , ? ( were addressed above.

For " use \" .

The rest should work as-is.
jswartz
Posts: 4
Joined: Mon Oct 25, 2004 2:27 pm

Post by jswartz »

Are you certain that the following should not work "as is":

( Left Parenthesis
) Right Parenthesis
" Double Quote

I would think that you would need to use some special syntax for these?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Please reread the previous message.
jswartz
Posts: 4
Joined: Mon Oct 25, 2004 2:27 pm

Post by jswartz »

Can you tell me what specific characters are considered "punct"? Are they the following:

! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ]
^ { | } ~ $

What about right and left apostrophes and double quotes?

Thanks.
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

The definition of punct varies with the operating system, but will generally include everything that displays and is not a space, letter, or digit. It is generally limited to the ASCII range, which does include the back-quote `, but does not include the left and right single and double quotes ‘ ’ “ ”.
John Turnbull
Thunderstone Software
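
To get a feel for what an ASCII-only punctuation class looks like, Python's standard `string.punctuation` constant can serve as a comparison. This is an analogy only: it lists the ASCII characters classed as punctuation in the C locale, and what Texis actually treats as punct on a given operating system may differ.

```python
import string

# ASCII punctuation as defined by Python's standard library (C locale).
ascii_punct = string.punctuation
print(ascii_punct)

# Check the back-quote, the typographic quotes, and the inverted
# exclamation mark from the list earlier in the thread.
for ch in "`‘’“”¡":
    print(repr(ch), ch in ascii_punct)
```

The back-quote is in the ASCII set, while the typographic quotes and the inverted exclamation mark are not, so they would need separate handling if they must be searchable.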