html in search results
Posted: Fri May 04, 2007 2:02 pm
by rgwin0
I am getting unwanted html tags displayed in my search results. For example, note the div tags in the first couple hits here:
http://search.dvcotechnology.com/cgi-bi ... h_spherion
You can see I've modified the overall search display, so perhaps this is something I screwed up. But all I really did was remove some of the extra formatting and links from the results page. I didn't knowingly alter the actual result strings, and I'm pretty lost figuring out where to track this down. Any suggestions?
thanks,
rob
Posted: Fri May 04, 2007 2:46 pm
by mark
Looks like you're somehow displaying the original HTML in the abstract instead of the extracted text. Did you change what the walk stores in the Body field?
Try an unmodified search script (call it something else so it doesn't collide with your live one). How does it look there?
Posted: Fri May 04, 2007 5:02 pm
by rgwin0
I don't recall altering the Body field, where would I check that?
Testing with the original unmodified search script yields the same thing:
http://search.dvcotechnology.com/cgi-bi ... h_spherion
Are the html tags supposed to be stripped out on data storage or display?
Posted: Fri May 04, 2007 5:31 pm
by John
They are removed on storage, only the text from the page should be stored.
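A minimal Python sketch of what "only the text is stored" means, assuming a simple tag-stripping pass at walk time. `TextExtractor` and `extract_body_text` are hypothetical names, not the search engine's actual code. The standard library `html.parser` only hands text to `handle_data`, so tags and comments (including the keep-tag comments) never reach the stored body. This sketch does not special-case `<script>` or `<style>` contents.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text content of an HTML document (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        # Called only for text between tags; tags and comments go to other handlers.
        self.parts.append(data)


def extract_body_text(html: str) -> str:
    """Return the page text with runs of whitespace collapsed to single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join("".join(parser.parts).split())


page = """<!--search-start-->
<div id="wd_printable_content">
<div class="detail_header">Spherion Survey: Industrial, Manufacturing</div>
</div>
<!--search-end-->"""

print(extract_body_text(page))
```

If the stored Body looks like this output, no HTML can show up in an abstract built from it.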
Posted: Fri May 04, 2007 5:51 pm
by rgwin0
Okay, so I'm fairly certain I haven't modified the crawling scripts; I wouldn't even know how to do that. Any idea how I can track down the problem?
Posted: Fri May 04, 2007 6:07 pm
by mark
It appears somehow related to using "Keep tags". Not sure exactly what's happening yet though...
Posted: Fri May 04, 2007 7:01 pm
by rgwin0
My Keep Tags are "search-start" and "search-end", each wrapped in HTML comment delimiters. Viewing the source of the first hit from my example, I find the search-start tag, two div tags that don't appear in the result, two that do, and then my search string. This makes me wonder whether the need to show n characters of text before the search string is somehow overriding the HTML stripping. Possible??
(opening tag brackets removed so as to not confuse the message board)
search-start-->
div id="wd_printable_content">
div class="wd_newsfeed_releases-detail">
div class="wd_news_releases-detail">
div class="detail_header">Spherion Survey
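To test the hypothesis above, here is a hypothetical abstract builder that takes a fixed number of characters of context before the first query match in the stored Body. The name `make_abstract` and its `before`/`length` parameters are invented for illustration and are not the real abstract logic. It shows that such a window only produces tags if the stored text still contains them; run against stripped text it cannot reintroduce HTML, which fits mark's diagnosis that the original HTML is what's being stored or displayed.

```python
import re


def make_abstract(body: str, query: str, before: int = 40, length: int = 120) -> str:
    """Hypothetical abstract: `before` chars ahead of the first query-word match,
    `length` chars in total, whitespace collapsed."""
    words = query.split()
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    match = pattern.search(body)
    start = max(0, match.start() - before) if match else 0
    return " ".join(body[start:start + length].split())


# Body stored as raw HTML versus body stored as extracted text.
raw_body = ('<div class="wd_news_releases-detail"> <div class="detail_header">'
            "Spherion Survey: Industrial, Manufacturing")
text_body = "Spherion Survey: Industrial, Manufacturing"

print(make_abstract(raw_body, "survey industrial"))   # window reaches back into the tags
print(make_abstract(text_body, "survey industrial"))  # clean text only
```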
Posted: Mon May 07, 2007 1:06 pm
by John
You should make sure the tag includes the full text of the tag, e.g. <!--search-start-->, otherwise it will start in the middle of a tag, which can throw things off.
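A rough sketch of the failure mode John describes, assuming keep tags are matched as plain substrings before a naive tag stripper runs. `keep_region` and `strip_tags` are hypothetical helpers, not the real walker. With the full `<!--search-start-->` marker the kept region is clean. With the bare `search-start` marker, the region starts at `-->` and ends at `<!--`, and those fragments leak into the text.

```python
import re


def keep_region(html: str, start_tag: str, end_tag: str) -> str:
    """Hypothetical keep-tags extractor: text strictly between the first start_tag
    and the following end_tag (whole page if the start tag is missing)."""
    i = html.find(start_tag)
    if i == -1:
        return html
    i += len(start_tag)
    j = html.find(end_tag, i)
    return html[i:] if j == -1 else html[i:j]


def strip_tags(fragment: str) -> str:
    """Naive stripper: removes only complete <...> tags, then collapses whitespace."""
    return " ".join(re.sub(r"<[^>]*>", " ", fragment).split())


page = ('<p>nav</p><!--search-start-->\n<div class="detail_header">Spherion Survey</div>\n'
        "<!--search-end--><p>footer</p>")

full = strip_tags(keep_region(page, "<!--search-start-->", "<!--search-end-->"))
partial = strip_tags(keep_region(page, "search-start", "search-end"))
print(full)
print(partial)
```

The partial markers cut the comments in half, so the leftover `-->` and `<!--` survive stripping as ordinary text.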
Posted: Mon May 07, 2007 1:20 pm
by rgwin0
Yeah. Okay, looks like my worries about posting HTML tags here were unfounded. So, my keep tags are:
<!--search-start-->
<!--search-end-->
The html of the original page looks like:
<!--search-start-->
<div id="wd_printable_content">
<div class="wd_newsfeed_releases-detail">
<div class="wd_news_releases-detail">
<div class="detail_header">Spherion Survey: Industrial, Manufacturing
...and a search for "survey industrial" returns:
<div class="wd_news_releases-detail"> <div class="detail_header">Spherion Survey: Industrial, Manufacturing...
Bizarre, huh?
Posted: Mon May 07, 2007 1:22 pm
by rgwin0
I just realized I still wasn't entirely clear: the keep tags are indeed entered in the walk settings with the surrounding comment delimiters, i.e. as <!--search-start--> and <!--search-end-->.