Strip Small Words

Post by wcpriest »

John,

"Best" -- interestingly vague -- but however it does it, we look forward to
"mentor-matcher", where we use textomm to compare the web page the learner is viewing
against the database of mentor subject talent. Based on what you said, we will also
use textomm to pull "key" words out of Encarta for each "professed" area of
mentor expertise.

Bart suggests we can use <sandr> to apply just the "noise list" -- and if you could
give us the syntax of that, we would be interested in trying that too.

Thanks again.

You guys provide great support!

Regards,

Dr. Priest

Post by John »

You would build up a list of expressions that match the words. You could either use Kai's expressions as a template (they require spaces on both sides of the word) and replace the \alpha{1,2} with the actual word, or you could create an expression that treats any non-letter as the end of a word, e.g.

<$srch="[^\alpha]\P=a=[^\alpha]\F="
"[^\alpha]\P=about=[^\alpha]\F="
...>

<sandr $srch "" $text>

which will replace all the words defined in $srch with an empty string.
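
For anyone who wants to try the idea outside Vortex, here is a rough Python sketch of the same technique: each noise word is removed only when the characters on either side of it are not letters, and those neighbouring characters are left in place. The function name `strip_noise_words` and the sample word list are made up for illustration, and case-insensitive matching is my assumption, not something taken from the expressions above. Note also that Python lookarounds match at the very start and end of the text, which may differ from how the REX expressions behave there.

```python
import re

def strip_noise_words(text, noise_words):
    """Remove each noise word that is not directly adjacent to another letter.

    Hypothetical helper for illustration only; it is not part of Texis/Vortex.
    """
    # Escape every word so characters like "." are matched literally.
    alternation = "|".join(re.escape(word) for word in noise_words)
    # The lookbehind/lookahead check the neighbours without consuming them,
    # so surrounding spaces and punctuation are kept in the output.
    pattern = re.compile(
        r"(?<![A-Za-z])(?:" + alternation + r")(?![A-Za-z])",
        re.IGNORECASE,  # assumption: noise words are matched in any case
    )
    return pattern.sub("", text)

if __name__ == "__main__":
    print(strip_noise_words("go to a bar", ["a", "to"]))
```

The removed words leave their surrounding spaces behind, so you may want to collapse runs of whitespace afterwards.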
John Turnbull
Thunderstone Software