html in search results

rgwin0
Posts: 15
Joined: Fri May 04, 2007 1:45 pm

Post by rgwin0 »

I am getting unwanted HTML tags displayed in my search results. For example, note the div tags in the first couple of hits here:

http://search.dvcotechnology.com/cgi-bi ... h_spherion

You can see I've modified the overall search display, so perhaps this is something I screwed up. But all I really did was remove some of the extra formatting and links from the results page. I didn't knowingly alter the actual result strings, and I'm pretty lost figuring out where to track this down. Any suggestions?

thanks,
rob
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Looks like you're somehow displaying the original HTML in the abstract instead of the extracted text. Did you change what the walk stores in the Body field?

Try an unmodified search script (call it something else so it doesn't collide with your live one). How does it look there?
rgwin0

Post by rgwin0 »

I don't recall altering the Body field; where would I check that?

Testing with the original, unmodified search script gives the same result:

http://search.dvcotechnology.com/cgi-bi ... h_spherion

Are the HTML tags supposed to be stripped out when the data is stored, or when it is displayed?
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

They are removed at storage time; only the text from the page should be stored.
John Turnbull
Thunderstone Software
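To make "removed at storage time" concrete, here is a minimal sketch using Python's standard `html.parser`. It is only an illustration under assumptions: it is not Thunderstone's code, and the `record` dict with a `Body` key is a hypothetical stand-in for what a walk stores. The idea it shows is that tags and comments are discarded before anything is stored, so an abstract built later should contain only page text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects character data only; tags and comments are dropped."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        # Called for text between tags. Start/end tags and comments go to
        # HTMLParser's default handlers, which do nothing.
        self.parts.append(data)


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    # Collapse whitespace runs left behind by the removed markup.
    return " ".join(" ".join(parser.parts).split())


# Hypothetical page fragment and stored record (names are illustrative only).
page = (
    '<!--search-start--><div id="wd_printable_content">'
    '<div class="detail_header">Spherion Survey: Industrial, Manufacturing</div>'
    '</div><!--search-end-->'
)
record = {"Body": extract_text(page)}
print(record["Body"])  # Spherion Survey: Industrial, Manufacturing
```

If the stored text looks like this, any markup that later appears in an abstract must have come from somewhere other than the stripped Body text.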
rgwin0

Post by rgwin0 »

Okay, so I'm fairly certain I haven't modified the crawling scripts; I wouldn't even know how to do that. Any idea how I can track down the problem?
mark

Post by mark »

It appears to be related somehow to using "Keep tags". Not sure exactly what's happening yet, though...
rgwin0

Post by rgwin0 »

My Keep Tags are "search-start" and "search-end", each wrapped in HTML comment delimiters. Viewing the source of the first hit from my example, I find the search-start tag, then two div tags that aren't included in the results, two that are, and then my search string. This makes me wonder whether the need to show n characters of text before the search string is somehow overriding the HTML stripping. Possible??

(opening tag brackets removed so as to not confuse the message board)

search-start-->
div id="wd_printable_content">
div class="wd_newsfeed_releases-detail">
div class="wd_news_releases-detail">
div class="detail_header">Spherion Survey
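One way to reason about the "n characters of context" hypothesis is a simplified model of abstract generation. The Python sketch below assumes (an assumption, not documented Webinator behavior) that the abstract is a window cut from the already-stored Body text around the first query hit; `make_abstract` and its `before`/`after` parameters are hypothetical. Under that model, widening the window can only pull in more of the stored text, so tags can show up in an abstract only if they were stored in the first place.

```python
def make_abstract(body: str, query: str, before: int = 40, after: int = 80) -> str:
    """Hypothetical abstract: a window of stored text around the first hit."""
    hit = body.lower().find(query.lower())
    if hit < 0:
        # No hit: fall back to the start of the stored text.
        return body[: before + after]
    start = max(0, hit - before)
    end = min(len(body), hit + len(query) + after)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(body) else ""
    return prefix + body[start:end] + suffix


# A tag-free stored Body versus one that (wrongly) kept some markup.
clean = "Spherion Survey: Industrial, Manufacturing"
leaky = '<div class="detail_header">Spherion Survey: Industrial, Manufacturing'

# However wide the window, a tag-free Body yields a tag-free abstract...
print(make_abstract(clean, "survey", before=500, after=12))
# ...while a Body that still contains markup leaks it into the abstract.
print(make_abstract(leaky, "survey", before=500, after=12))
```

If this model is roughly right, the context window is a symptom rather than the cause, and the question becomes why markup reached the stored text at all.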
John

Post by John »

You should make sure the tag includes the full text of the tag, e.g. <!--search-start-->; otherwise the kept region will start in the middle of a tag, which can throw things off.
John Turnbull
Thunderstone Software
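John's point about matching the full marker text can be illustrated with a toy model. In the Python sketch below, `keep_region` and `strip_tags` are hypothetical helpers, not Webinator's implementation, and the regex stripper is deliberately naive: it removes only complete `<...>` tags. With the full comment markers, the kept region begins and ends on tag boundaries and strips cleanly; with the bare names, the region begins and ends inside the comments, and the leftover fragments survive stripping.

```python
import re

# Naive, hypothetical stripper: removes only complete <...> tags.
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(html: str) -> str:
    text = TAG_RE.sub(" ", html)
    return " ".join(text.split())


def keep_region(html: str, start: str, end: str) -> str:
    """Return the text between the start and end markers (markers excluded)."""
    i = html.index(start) + len(start)
    j = html.index(end, i)
    return html[i:j]


page = (
    "<p>nav</p><!--search-start-->"
    '<div class="detail_header">Spherion Survey: Industrial</div>'
    "<!--search-end--><p>footer</p>"
)

# Full comment markers: the region starts and ends on tag boundaries.
good = strip_tags(keep_region(page, "<!--search-start-->", "<!--search-end-->"))

# Bare names: the region starts inside one comment and ends inside the next.
bad = strip_tags(keep_region(page, "search-start", "search-end"))

print(good)  # Spherion Survey: Industrial
print(bad)   # --> Spherion Survey: Industrial <!--
```

This toy model does not reproduce rob's exact output (he already uses the full markers), so it only illustrates why partial markers are risky, not what caused the div tags in his results.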
rgwin0

Post by rgwin0 »

Yeah. Okay, it looks like my worries about posting HTML tags here were unfounded. So, my keep tags are:


<!--search-start-->
<!--search-end-->


The html of the original page looks like:


<!--search-start-->
<div id="wd_printable_content">
<div class="wd_newsfeed_releases-detail">
<div class="wd_news_releases-detail">
<div class="detail_header">Spherion Survey: Industrial, Manufacturing


...and a search for "survey industrial" returns:


<div class="wd_news_releases-detail"> <div class="detail_header">Spherion
Survey: Industrial, Manufacturing...


Bizarre, huh?
rgwin0

Post by rgwin0 »

I just realized I still wasn't very clear: the keep tags are indeed entered in the walk settings with the surrounding comment delimiters.