
Searching for punctuation

Posted: Mon Oct 25, 2004 2:39 pm
by jswartz
We have recently added punctuation to our list of indexed characters. My engineering department tells me that this is the indexing call that we are using:

tsql -q "set keepnoise='on';set delexp=0;set addexp='\alnum{1,99}';set addexp='>>\alpha{1,50},=\alpha{1,50}';set addexp='[\alnum\punct]{1,30}';set addexp='\punct{1,5}';create metamorph inverted index idxmtblnew_CLEANUP on tblnew(CLEANUP);"

What is the syntax necessary to search for punctuation characters that may have other meaning in Texis? For instance, how do you search for the following: *, ", ', -, (, ), etc.? (For instance, it appears that you must enter an asterisk [*] twice in order to literally search for the asterisk character.)

Here is the complete list of those characters we want to be able to search for:

! Exclamation mark
¡ Inverted exclamation mark
? Question mark
# Pound Sign
- Dash
> Greater Than
< Less Than
; Semicolon
: Colon
( Left Parenthesis
) Right Parenthesis
[ Left Bracket
] Right Bracket
{ Left Curly Bracket
} Right Curly Bracket
' Apostrophe
. Period
$ Dollar Sign
@ At sign
& Ampersand
* Asterisk
" Double Quote

By the way, I am a technical writer here at CaseCentral, and I have been asked to research this issue while one of our key programmers is out of the office. Let me know if there is a better way for me to interact with Thunderstone to get this information. Thanks.

- jswartz@casecentral.com

Searching for punctuation

Posted: Mon Oct 25, 2004 3:19 pm
by mark
The Metamorph language used in the like/likep query has several reserved prefixes:
/ # % @ w/ ( ~
They only have meaning at the beginning of a term. There are several ways to search for those prefixes literally; the simplest is to put a backslash in front of the character:
\@home
\w/sheet
\(abc)
\~mine
\#24
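
For anyone building queries in code, here is a minimal Python sketch of the backslash rule above. The function name and the idea of preprocessing terms on the client side are my own assumptions, not part of Texis; the prefix list is copied from this post.

```python
# Reserved Metamorph prefixes, as listed in the post above.
RESERVED_PREFIXES = ("w/", "/", "#", "%", "@", "(", "~")

def escape_leading_prefix(term: str) -> str:
    """Hypothetical helper: add a backslash if the term starts with a reserved prefix."""
    # str.startswith accepts a tuple and returns True if any entry matches.
    if term.startswith(RESERVED_PREFIXES):
        return "\\" + term
    return term

for t in ["@home", "w/sheet", "(abc)", "~mine", "#24", "plain"]:
    print(escape_leading_prefix(t))
```

Only the start of the term is checked, since the prefixes have no special meaning elsewhere in a term.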

Sentence-ending punctuation (period, comma, and question mark) is always stripped from the end of the query. To keep it, you can put the term in parens, or place a noise term that will get stripped as the last term (assuming you haven't turned on keepnoise):
(termination.)
termination. a
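
A small Python sketch of the paren workaround, in case the final term is assembled programmatically. The helper name is hypothetical, and it assumes you pass it only the last term of the query.

```python
# Characters stripped from the end of a query, per the post above.
SENTENCE_ENDERS = (".", ",", "?")

def protect_trailing_punct(last_term: str) -> str:
    """Hypothetical helper: wrap the final query term in parens if it ends in . , or ?"""
    if last_term.endswith(SENTENCE_ENDERS):
        return "(" + last_term + ")"
    return last_term

print(protect_trailing_punct("termination."))
print(protect_trailing_punct("termination"))
```

This uses the paren form rather than the trailing noise word, because the noise-word trick stops working once keepnoise is on.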

* is a wildcard anywhere in a term. To remove the special meaning of *, put the term in a paren list with another, non-matching term, such as:
(abc*,qjxz)
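
As a rough sketch of that workaround in Python: the helper name is made up, and the dummy term `qjxz` is simply the one from the example above. It is an assumption that the dummy never occurs in your data, so pick one that fits your own collection.

```python
def literal_asterisk_term(term: str, dummy: str = "qjxz") -> str:
    """Hypothetical helper: pair a term containing * with a non-matching dummy term."""
    if "*" in term:
        # A one-item list such as (abc*) is treated as plain abc*,
        # so the list needs a second, non-matching entry.
        return "(" + term + "," + dummy + ")"
    return term

print(literal_asterisk_term("abc*"))
print(literal_asterisk_term("abc"))
```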

Comma and hyphen separate terms just like a space, so
abc,def
is the same as
abc-def
is the same as
abc def
To force it to be treated as a single term, put quotes around the whole term:
"abc,def"
"abc-def"

The hyphen continues to have meaning at search time unless you first run
<sql "set hyphenphrase=0"></sql>
and then include the hyphenated term in a paren list with another, non-matching term, such as:
(abc-def,qjxz)
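
Here is a minimal Python sketch of the two options for comma and hyphen described above: quoting the whole term, or (with hyphenphrase=0 already set) using a paren list. Both helper names are hypothetical, and the sketch does not handle terms that themselves contain a double quote.

```python
def quote_separated_term(term: str) -> str:
    """Hypothetical helper: quote a term so comma/hyphen do not split it."""
    if "," in term or "-" in term:
        return '"' + term + '"'
    return term

def hyphen_paren_list(term: str, dummy: str = "qjxz") -> str:
    """Hypothetical helper: paren-list form, assuming hyphenphrase=0 was set first."""
    if "-" in term:
        return "(" + term + "," + dummy + ")"
    return term

print(quote_separated_term("abc,def"))
print(quote_separated_term("abc-def"))
print(hyphen_paren_list("abc-def"))
```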

Searching for punctuation

Posted: Mon Oct 25, 2004 4:36 pm
by jswartz
Yes, I believe we are setting hyphenphrase=0.

So, for hyphens and asterisks, you have to include them in a paren list with *another non-matching term*? Can you not simply search for the following:

(abc*)

Or this:

(abc-def)

Also, are there special syntax rules in searching for any of the following:

! Exclamation mark
¡ Inverted exclamation mark
? Question mark
> Greater Than
< Less Than
; Semicolon
: Colon
( Left Parenthesis
) Right Parenthesis
[ Left Bracket
] Right Bracket
{ Left Curly Bracket
} Right Curly Bracket
. Period
$ Dollar Sign
& Ampersand
" Double Quote

Searching for punctuation

Posted: Mon Oct 25, 2004 5:30 pm
by mark
(abc*) won't work. Since it's a one-item list, it will be treated as abc*.

. , ? ( were addressed above.

For " use \" .

The rest should work as-is.

Searching for punctuation

Posted: Mon Oct 25, 2004 6:04 pm
by jswartz
Are you certain that the following should not work "as is":

( Left Parenthesis
) Right Parenthesis
" Double Quote

I would think that you would need to use some special syntax for these?

Searching for punctuation

Posted: Mon Oct 25, 2004 6:46 pm
by mark
Please reread the previous message.

Searching for punctuation

Posted: Wed Oct 27, 2004 3:26 pm
by jswartz
Can you tell me what specific characters are considered "punct"? Are they the following:

! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ]
^ { | } ~ $

What about right and left apostrophes and double quotes?

Thanks.

Searching for punctuation

Posted: Wed Oct 27, 2004 5:39 pm
by John
The definition of punct varies with the operating system, but it will generally include every character that displays and is not a space, letter, or digit. It is generally limited to the ASCII range, which does include the back-quote `, but does not include the left and right single and double quotes ‘ ’ “ ”.
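
As a quick cross-check, Python's `string.punctuation` lists the ASCII characters treated as punctuation in the C locale. This is only an illustration of the usual ASCII set, not the Texis definition of \punct, which may differ on your platform. The `listed` string is the character list from the question above.

```python
import string

# The characters listed in the question above.
listed = "!\"#%&'()*+,-./:;<=>?[\\]^{|}~$"

ascii_punct = string.punctuation
print(ascii_punct)

# ASCII punctuation characters that the list in the question leaves out.
missing = sorted(set(ascii_punct) - set(listed))
print(missing)

# Curly quotes are outside ASCII, so none of them are in the set.
curly = "\u2018\u2019\u201c\u201d"
print(any(ch in ascii_punct for ch in curly))
```

On that basis the list in the question is missing `@`, `_`, and the back-quote.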