hit markup is fussier than database

rjshelq
Posts: 75
Joined: Thu Nov 17, 2005 3:25 pm

Post by rjshelq »

Hi,

I'm using Webinator 6.1, and I've noticed that many searches for phrases return hits which are not highlighted. It appears that the hit markup is fussier about punctuation and noise words than the database was in finding the hits.

I would like to find a way to highlight the match which the database found, even though the punctuation and noise words may be different than the original query.

That is, I'd like the hit markup to be "relaxed" to ignore punctuation and ignore noise words (or ignore short words of perhaps up to three characters occurring where the original query has a space).

Is there a way to have the hit markup be a bit less rigid?

If you don't already have a scheme to do that, could you suggest a way to allow each punctuation mark in the original query phrase to match 0 or more punctuation marks, and allow each space in the original query phrase to match any 5 characters (punctuation, spaces or letters) in the phrase returned from the database?
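To make the request concrete, here is a rough sketch of the kind of relaxed matching I have in mind. It is written in Python rather than Vortex purely for illustration, and it is not how Webinator's hit markup works. The names `relaxed_pattern`, `highlight`, and `MAX_NOISE_LEN` are made up for the example, and treating every word of three characters or fewer as noise is an assumption rather than the real noise list.

```python
import re

MAX_NOISE_LEN = 3  # assumption: words this short are treated as noise

# A gap between query words: one or more punctuation/space characters,
# optionally followed by a single short word and more punctuation/space.
GAP = r"[\W_]+(?:\w{1,%d}[\W_]+)?" % MAX_NOISE_LEN


def relaxed_pattern(query):
    """Build a regex that matches the query words in order,
    tolerating different punctuation and one short word per gap."""
    words = re.findall(r"\w+", query)
    # Drop short (noise) words from the query; keep all if nothing is left.
    keep = [w for w in words if len(w) > MAX_NOISE_LEN] or words
    body = GAP.join(re.escape(w) for w in keep)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def highlight(query, text, open_tag="<b>", close_tag="</b>"):
    """Wrap every relaxed match of the query phrase in the given tags."""
    pattern = relaxed_pattern(query)
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, text)


if __name__ == "__main__":
    q = '"love, harmony and beauty"'
    for sample in ("love, harmony and beauty",
                   "love, harmony, and beauty",
                   "love, harmony, beauty"):
        print(highlight(q, sample))
```

With this approach the quotes stay in the query for the database search, and only the highlighting step is loosened.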
mark
Site Admin
Posts: 5514
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

I assume you're talking about the search results list where it shows a short abstract for each result.

Are you using the default Abstract Style of Query?
Are the phrases you're having trouble with quoted in the query? To highlight all words in the phrase, you may want to strip the quotes for the hit markup.

Try this. In the search script, where it displays the abstract, change
<strfmt $qsrchfmt $txtquery>
<abstract $Body $SSc_abstractlen smart $ret><$dabstract=$ret>
to
<sandr '"' ' ' $txtquery>
<strfmt $qsrchfmt $ret>
<abstract $Body $SSc_abstractlen smart $ret><$dabstract=$ret>

Post by rjshelq »

Yes, I'm referring to the search results list. Unfortunately, the proposed solution of removing the quote marks highlights far too many words.

Here's an actual example of results to a quoted phrase:

The original query was "love, harmony and beauty", and a number of articles which have that exact phrase were properly highlighted in the standard search results.

However, there were also a number of very slight variations which were apparently found in the database, so these additional articles also appear in the standard results, but they do not have any hit highlighting.

The additional articles which were included in the search results, but which did not include any highlighting, were "love, harmony, and beauty" (which has an additional comma) and "love, harmony, beauty" (which has an additional comma and is missing a noise term).

It appears that the underlying database search does not care about punctuation or noise words, but the hit markup requires an exact match including punctuation and noise words.

Therefore, I'd like to highlight the phrase which was found in the database, by simply ignoring punctuation and/or noise words which occur between the words of the original query.

Can you help me find a way to highlight all of the phrases which the database search correctly identified, despite minor differences in punctuation and/or noise words which occur between the words of the original query?

Post by rjshelq »

Any thoughts yet on highlighting a phrase which the database search correctly found, even though it has minor variations in punctuation and/or noise words that the current smart abstract does not highlight? (An example is given above, in my previous posting.)

Post by mark »

My only thought so far is to do some ugly sandblasting to remove the punctuation and noise for highlighting then put it back.
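As a rough illustration of that strip-and-restore idea, here is a minimal sketch in Python, not Vortex, and not something Webinator does itself. It drops punctuation and noise words from both the query and the text, matches on what remains, and uses the saved character offsets to mark up the original text. The `NOISE` set and the function names are hypothetical stand-ins; the real noise list would come from the search settings.

```python
import re

# Hypothetical noise list for illustration only.
NOISE = {"and", "or", "the", "of", "a", "an"}


def tokens(text):
    """Return (lowercased word, start, end) for each non-noise word,
    remembering where each word sits in the original text."""
    return [(m.group(0).lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)
            if m.group(0).lower() not in NOISE]


def find_spans(query, text):
    """Find character spans in text whose stripped words equal the stripped query."""
    q = [w for w, _, _ in tokens(query)]
    t = tokens(text)
    spans = []
    i = 0
    while q and i + len(q) <= len(t):
        if [w for w, _, _ in t[i:i + len(q)]] == q:
            # Span runs from the first matched word to the end of the last one.
            spans.append((t[i][1], t[i + len(q) - 1][2]))
            i += len(q)
        else:
            i += 1
    return spans


def highlight(query, text, open_tag="<b>", close_tag="</b>"):
    """Put the markup back around the original, unstripped text."""
    out, last = [], 0
    for start, end in find_spans(query, text):
        out += [text[last:start], open_tag, text[start:end], close_tag]
        last = end
    out.append(text[last:])
    return "".join(out)


if __name__ == "__main__":
    print(highlight('"love, harmony and beauty"', "Love, harmony, and beauty."))
```

Because the matching is done on the stripped word list, any amount of punctuation or any number of noise words between the query words is tolerated, while the highlighted output keeps the original punctuation and wording.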