Understanding query

dao
Posts: 31
Joined: Fri Apr 12, 2002 2:26 pm

Post by dao »

Hi,

The online manual for Webinator gives the following information, which I am confused about. Can you help clarify?

The manual states that for the following queries, Webinator will locate the following results:

query1: john will locate john, John
query2: "john public" will locate John Public
query3: web-browser will locate Web browser, web-browser
query4: John*Public will locate John Q. Public, John Public
query5: 456*a*def will locate 1-23456-789-ABCDEF
query6: activate will locate activate, activation, activated, ...

What I don't understand is the following:

1) Why does query 1 not locate things like "johnson", while query 5 does locate the query string in the middle of a string even though the query string is not followed by a wildcard?

2) Similarly, why does query 6 locate things like "activation"?

3) Why does query 5 not locate only results that begin with "456"?

Thanks
dao@mit.edu
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

It's not about substrings. You're forgetting the word form processing also mentioned in the manual. With word forms on, "activation" is seen as a valid English derivation of "activate". "Johnson" is not a valid variation of "John" (though you could customize the suffix list and minwordlen to make it so).
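
To make the suffix-stripping idea concrete, here is a minimal Python sketch. It is not Webinator's actual algorithm: the suffix list, the `MIN_WORD_LEN` value, and the function names `stem` and `same_word_form` are all assumptions chosen for illustration. It only shows why "activation" can reduce to the same root as "activate" while "johnson" does not reduce to "john" unless the suffix list and minimum word length are changed.

```python
# Illustrative values only; these are NOT Webinator's real defaults.
SUFFIXES = ["ion", "ing", "ed", "es", "e", "s"]
MIN_WORD_LEN = 5

def stem(word, suffixes=SUFFIXES, min_word_len=MIN_WORD_LEN):
    """Strip suffixes repeatedly while the remainder keeps min_word_len letters."""
    word = word.lower()
    stripped = True
    while stripped:
        stripped = False
        for suffix in suffixes:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_word_len:
                word = word[: -len(suffix)]
                stripped = True
                break
    return word

def same_word_form(query, candidate):
    """Two words count as the same form if they strip down to the same root."""
    return stem(query) == stem(candidate)

print(same_word_form("activate", "activation"))  # True
print(same_word_form("john", "johnson"))         # False
# A customized suffix list and a lower minimum length change the outcome:
print(stem("johnson", SUFFIXES + ["son"], 4))    # john
```

In this model, "johnson" is left alone because no listed suffix applies; adding a hypothetical "son" suffix and lowering the minimum length to 4 is what makes it reduce to "john".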

The query 5 example is an anachronism from earlier versions that needs to be corrected in the manual. Wildcards at the end or middle of a term will anchor to the beginning of words.

456*a*def will locate 1-456-789-ABCDEF
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

p.s. That last example with wildcards requires "post processing" to fully handle. Post processing is off in the search script by default. In that case "456*a*def" will cause a warning and be treated as "456*".
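
The fallback described here can be modeled in a few lines of Python. This is only a sketch of the behavior as stated in this post, not the search script's code; `effective_query` and its `post_processing` parameter are hypothetical names, and the warning text is invented.

```python
import warnings

def effective_query(term, post_processing=False):
    """Return the term a search would effectively use under this model."""
    first = term.find("*")
    # Leave the term alone when post processing is on, when there is no
    # wildcard, or when the only wildcard is the trailing one.
    if post_processing or first == -1 or first == len(term) - 1:
        return term
    # Leading wildcards are not covered in the thread; return them unchanged.
    if first == 0:
        return term
    warnings.warn(f"post processing is off; treating {term!r} as {term[:first + 1]!r}")
    return term[: first + 1]

print(effective_query("456*a*def"))                        # 456*
print(effective_query("456*a*def", post_processing=True))  # 456*a*def
```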
dao
Posts: 31
Joined: Fri Apr 12, 2002 2:26 pm

Post by dao »

Hi,
Thanks for the reply. When you say "word form", you mean the 250,000 word thesaurus, right?

Finally, I am not sure what you mean by "Wildcards at the end or middle of a term will anchor to the beginning of words".

By anchor, do you mean that

"456*a*def" is equivalent to
"*456*a*def*"?

Thanks

dao@mit.edu
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

No, word form means morphemic derivations using suffix stripping. Things from the thesaurus are called synonyms.

No, I'm saying that "456*" is NOT equivalent to "*456*".
dao
Posts: 31
Joined: Fri Apr 12, 2002 2:26 pm

Post by dao »

Great,

I get word forms now. Regarding the anchoring stuff, if "Wildcards at the end or middle of a term will anchor to the beginning of words",

then

456*a*def will NOT locate 1-456-789-ABCDEF

Correct? In your previous response, you indicated that it WILL locate 1-456-789-ABCDEF, which seems to me to suggest no anchoring to the beginning of words.

Thanks for the great patience,

dao@mit.edu
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

The string
1-456-789-ABCDEF
contains 4 "words": 1, 456, 789, and ABCDEF.
456*, 456*a, or 456*a*def will anchor to the beginning of the word 456 and then apply the wildcards to locate the rest. Similarly, 456*a*DEF will match 456789ABCDEF.
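
The anchoring rule can be approximated with a regular expression. The Python sketch below is an illustration, not how Webinator implements matching: it assumes that a "word" is a run of letters and digits, that matching is case-insensitive, and that `*` may span word breaks; `wildcard_matches` is a hypothetical helper name. It models only wildcard terms, not whole-word matching of plain terms.

```python
import re

def wildcard_matches(term, text):
    """True if the wildcard term matches text, starting at a word beginning."""
    # Escape the literal pieces and let each * match any run of characters.
    parts = [re.escape(piece) for piece in term.split("*")]
    # The lookbehind rejects a start position preceded by a letter or digit,
    # which is what "anchor to the beginning of words" means in this model.
    pattern = r"(?<![A-Za-z0-9])" + ".*".join(parts)
    return re.search(pattern, text, re.IGNORECASE) is not None

print(wildcard_matches("456*a*def", "1-456-789-ABCDEF"))    # True
print(wildcard_matches("456*a*DEF", "456789ABCDEF"))        # True
print(wildcard_matches("456*a*def", "1-23456-789-ABCDEF"))  # False
```

The last call fails because 456 sits inside the word 23456 rather than at its start, which is why "456*" is not equivalent to "*456*".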