Strip Small Words

Post by **wcpriest** » Thu Jul 05, 2001 5:31 pm

John,

"Best" is interestingly vague, but however it does it, we look forward to building
"mentor-matcher", where we use textomm to compare the learner-viewed web page
against the database of mentor subject talent. Based on what you said, we will also
use textomm to pull "key" words out of Encarta for each "professed" area of
mentor expertise.

Bart suggests we can use <sandr> to apply just the "noise list" -- and if you could
give us the syntax of that, we would be interested in trying that too.

Thanks again.

You guys provide great support!

Regards,

Dr. Priest

Post by **John** » Thu Jul 05, 2001 5:52 pm

You would build up a list of expressions that match the words. You could either use Kai's expressions as a template (they require spaces surrounding the word) and replace the \alpha{1,2} with the actual word, or create an expression that treats any non-letter as the end of a word, e.g.

<$srch="[^\alpha]\P=a=[^\alpha]\F="
"[^\alpha]\P=about=[^\alpha]\F="
...>

<sandr $srch "" $text>

which will replace all the words defined in $srch with an empty string.
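For anyone who wants to try the same idea outside Vortex, here is a rough Python sketch of the second approach, where any non-letter counts as a word boundary. It is only an illustration, not Texis REX syntax: the function name `strip_noise_words` and the three-word sample list are made up for the example, and it uses Python lookarounds so the neighbouring non-letter characters are left in place. One difference to keep in mind is that the lookarounds also match at the very start or end of the text, where there is no neighbouring character at all.

```python
import re

# Hypothetical sample noise list; a real one would be much longer.
NOISE_WORDS = ["a", "about", "the"]


def strip_noise_words(text, noise_words=NOISE_WORDS):
    """Remove each noise word that is not touching a letter on either side."""
    # Escape every word so it is matched literally, then join into one alternation.
    alternation = "|".join(re.escape(word) for word in noise_words)
    # (?<![A-Za-z]) : no letter immediately before the word
    # (?![A-Za-z])  : no letter immediately after the word
    pattern = re.compile(
        r"(?<![A-Za-z])(?:" + alternation + r")(?![A-Za-z])",
        re.IGNORECASE,
    )
    # Replace every match with an empty string; surrounding spaces are kept.
    return pattern.sub("", text)


if __name__ == "__main__":
    cleaned = strip_noise_words("a book about the sea")
    # Splitting on whitespace hides the leftover spaces between words.
    print(cleaned.split())
```

Words that merely contain a noise word, such as "abacus", are left alone because the letters on either side block the match.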