search engine works fine but...

Post by **Thunderstone** » Wed Mar 18, 1998 1:28 pm

I have a question from a user, which I have pasted here.

---

Most of our documents are in PDF format. Webinator finds
everything in them, but in the result list it shows only
"PDF document" as the header for each PDF found (not the name of the
document), and the abstract is not meaningful (it seems to me to be just
random text taken from the PDF document).

Is it possible in Webinator to show the name, subject, and
keywords of the PDF documents found in the result list? I looked on
the Thunderstone site (and at reference customers of Webinator such as
Ernst&Young) for this, but I didn't find anything.

---

Thanks
Mussa Khiar
mkhiar@raychem.com

Post by **Thunderstone** » Thu Mar 19, 1998 11:30 am

If you switch to the new interface that came with Webinator 2, you will get
the document name along with "PDF document" for the title.

Change the action of your search form from:
/cgi-bin/webinator
to:
/cgi-bin/texis/webinator/search/
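
As a rough illustration of that change, a search form might look like the sketch below. Only the two action paths come from this post; the METHOD, the field name "query", and the button label are assumptions for illustration, so keep whatever your existing form already uses.

```html
<!-- Old interface (before): -->
<!-- <FORM METHOD="GET" ACTION="/cgi-bin/webinator"> -->

<!-- New Webinator 2 interface (after): only the ACTION changes -->
<FORM METHOD="GET" ACTION="/cgi-bin/texis/webinator/search/">
  <!-- "query" is a hypothetical field name; keep the name your form already uses -->
  <INPUT TYPE="text" NAME="query" SIZE="30">
  <INPUT TYPE="submit" VALUE="Search">
</FORM>
```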

The abstract is taken from the first significant paragraph of the document,
so what you get depends on the document content. If you want the very first
text of the document instead, you can modify the search script slightly.
Change the line that reads:
<abstract $Body $anum smart>$'ret'...
to:
<abstract $Body $anum dumb>$'ret'...

As an aside, your top.html file should not contain </BODY> or </HTML> tags.
And your bottom.html file should not contain <HTML>, <HEAD>, </HEAD>, or
<BODY> tags.
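
A minimal sketch of how the two files might be split, assuming they are emitted before and after the search results so that together they form one complete page. The page title and the comments are placeholders, not taken from any actual site.

```html
<!-- top.html: opens the page; no </BODY> or </HTML> here -->
<HTML>
<HEAD>
<TITLE>Search Results</TITLE>
</HEAD>
<BODY>

<!-- (search results are assumed to be output here, between the two files) -->

<!-- bottom.html: closes the page; no <HTML>, <HEAD>, </HEAD>, or <BODY> here -->
</BODY>
</HTML>
```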

Post by **Thunderstone** » Thu Mar 19, 1998 3:56 pm

Mark Willson wrote:

We have the same problem. The only occurrence of the string "<abstract"
in the scripts we've been given is at:

<DD><abstract $Body 180>$'ret'...<BR>
<CITE><A HREF="$Url">$Url</A><FONT SIZE=-1>

It looks like this is the same script that's executed regardless of
whether or not the subject file is a PDF. Should there be an IF test and
a modified script for PDFs?

Also at this point in our script, the title has already been sent - it
was the line before those two lines:
<DT><A HREF="$Url"><STRONG>$Title</STRONG></A>

so abstracting the $Body differently wouldn't affect the $Title. It
appears that the PDF plug-in is generating the phony title (with
file size included).

What's going on?

Dave

--
--David E. Scott Ohio Administrative Services
DaveScott@1000islands.com acq_scott@ohio.gov

Post by **Thunderstone** » Thu Mar 19, 1998 9:46 pm

You are using the "AltaVista" style interface script. I was referring to
the standard interface script. See the Vortex documentation at
http://www.thunderstone.com/vortexman/ for details of how
to use the abstract function.

As for the titles: the PDF plugin does generate a title of
"PDF document (##k)" because PDF documents don't have titles.
The standard search script displays the filename along with the
title for pdf documents. See the standard search script at
http://www.thunderstone.com/texis/demos ... tor/search
for how this is done. See the function "matchline" where it displays
the title.