Meta Search Script

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Post by Thunderstone »




Hello,
I have been trying to figure out how to modify the meta search script so
that it returns the results from my engine first every time.
I cannot for the life of me figure it out. I am hoping that someone can
give me an example.

Here is my current script.

NetWORLD is us.

Thanks for having a look.

Steve

<script language=vortex>
<timeout=50></timeout>

<!--------------------------------------------------------------------------
--
Install Notes:

You must either have Texis 2.5+ or Webinator 2.5+ installed to run this.
Webinator may be downloaded from http://www.thunderstone.com/webinator/

1: Save this file under the filename "meta" in your ~htdocs/webinator
directory and point your browser at
http://www.myserver.com/cgi-bin/texis/webinator/meta/

2: In the <main> function there's a tag that says <REMOVE_ME>. Do it.

3: If a Search engine changes its display format or you want to add
a new engine, you'll need to edit the variables in the <init> function.

4: There's a helper app at
http://www.thunderstone.com/texis/demos/metaparse/
that will aid in writing new parse expressions.


----------------------------------------------------------------------------
-->

<!----------- The main entry point of this script --------------------------->

<a name=main>
<html>
<head><title>Meta Search<if $q neq "">: $q</if></title></head>
<body bgcolor=#ffffff link=#0000A0 vlink=#0000A0 alink=#ff8080>
<init>
<searchform>
<if $q neq "">
<netsearch>
</if>
<hr>
<caveat>
</body>
</html>
</a>

<!----------- Do the Metasearch and show results ---------------------------->

<a name=netsearch>
<strfmt "%U" $q> <!-- URL-escape query -->
<sandr "[\?\#\{\}\+\\]" "\\\1" $ret> <!-- escape sandr replace chars -->
<sandr "xyzzy" $ret $searchurl> <!-- put the query in the URLs -->
<$fetchthis=$ret>
<$liveurls = $ret>
<$acturls = >
<$actbases = >
<$actimports = >
<loop $searchhost> <!-- build URL search list -->
<loop $searchname $liveurls $bases $imports>
<if $searchname eq $searchhost>
<$acturls = $acturls $liveurls>
<$actbases = $actbases $bases>
<$actimports = $actimports $imports>
</if>
</loop>
</loop>
<urlcp timeout 30> <!-- don't wait longer than N seconds for results -->
<flush>
<fmtcp query "%mbH" $q>

<fetch PARALLEL $acturls $searchhost $actimports>
<sandr $removeme "" $ret>
<$html=$ret>
<dl></mm></sb>
<timport max=100 ROW $actimports $html> <!-- Parse and print results -->
<p>
<mm>
<dt>
<send $Link><send $Title><fmt "</a>">
<dd><lower $Abstract><send $ret>
<dd><tt><rex '>>http://=[^>"\space]+' $Link>$ret</tt>
</mm>
</timport>
</dl>
<flush>

</sep>
<flush>
</fetch>
<CENTER><A
HREF="http://networld-images.adbureau.net/cgi ... k.exe/AREA
=NETWORLD"><IMG
SRC="http://networld-images.adbureau.net/cgi ... r.exe/AREA
=NETWORLD" border=0></A></CENTER>

</a>

<!-------------------- Warn about Copyright violations ----------------------->

<a name=caveat>
<BR>
<p align="center" style="margin-top: 15; margin-bottom: 0"><font
face="Arial"><strong><small><small><a
href="http://www.networld.com/channels/about/">Advertising Information</a> |
<a
href="http://www.networld.com/channels/about/">About NetWORLD</a> | <a
href="http://www.networld.com/channels/legal/">Legal Information</a> | <a
href="https://shop.networld.com/cdorder.htm"><font color="#FF0000">Order A
Free NetWORLD
CD!</font></a></small></small></strong></font></p>

<p align="center" style="margin-top: 0; margin-bottom: 0"><font
face="Arial"><small><small>Copyright
© 1996, 1997, 1998 <a
href="http://www.networld.com/channels/about/">NetWORLD
Connections, Inc.</a> All Rights Reserved. (888)
627-9753</small></small></font></p><BR>
<CENTER>
<font size=-5 face=helvetica><i>Notice:</i>
Some search results are provided by the following search engines.
Inktomi.
</font>
</CENTER>
</a>

<!--------------------- Display the search form ------------------------------>

<a name=searchform>
<if $searchhost eq "">
<$searchhost=$searchname>
</if>

<CENTER><!-- Beginning of CSIM -->
<IMG SRC="http://www2.networld.com/images/navbarfinal2.gif"
USEMAP="#navbarfinal2" BORDER=0>
<MAP NAME="navbarfinal2">
<AREA SHAPE=RECT COORDS="26,18,75,74"
HREF="http://www.networld.com/channels/about/" ALT="Corporate Information"
TARGET="_top" OnMouseOut="window.status=''; return true"
OnMouseOver="window.status='NetWORLD Info'; return true">
<AREA SHAPE=RECT COORDS="75,17,126,74"
HREF="http://mail.networld.com/index.cgi?lang=eng&tnum=2" ALT="NetWORLD
E-Mail" TARGET="_top" OnMouseOut="window.status=''; return true"
OnMouseOver="window.status='E-mail'; return true">
<AREA SHAPE=RECT COORDS="125,16,173,75"
HREF="http://chat.networld.com:4080/chat/worl ... login.html"
ALT="NetWORLD Chat" TARGET="_top" OnMouseOut="window.status=''; return
true" OnMouseOver="window.status='Chat'; return true">
<AREA SHAPE=RECT COORDS="173,16,221,74"
HREF="http://www.networld.com/channels/support/" ALT="NetWORLD Support"
TARGET="_top" OnMouseOut="window.status=''; return true"
OnMouseOver="window.status='Support'; return true">
<AREA SHAPE=RECT COORDS="221,0,363,76" HREF="http://www.networld.com"
ALT="NetWORLD Main Page" TARGET="_top" OnMouseOut="window.status='';
return true" OnMouseOver="window.status='NetWORLD'; return true">
<AREA SHAPE=RECT COORDS="362,20,409,74" HREF="http://home.networld.com"
ALT="Personalize" TARGET="_top" OnMouseOut="window.status=''; return true"
OnMouseOver="window.status='Personalize'; return true">
<AREA SHAPE=RECT COORDS="408,19,458,74"
HREF="http://www2.networld.com/news/" ALT="Online News" TARGET="_top"
OnMouseOut="window.status=''; return true"
OnMouseOver="window.status='News'; return true">
<AREA SHAPE=RECT COORDS="458,19,505,73"
HREF="http://www.networld.com/channels/shopping/" ALT="NetWORLD Shopping"
TARGET="_top" OnMouseOut="window.status=''; return true"
OnMouseOver="window.status='Shopping'; return true">
<AREA SHAPE=RECT COORDS="504,17,558,74" HREF="http://auction.networld.com"
ALT="NetWORLD Auctions" TARGET="_top" OnMouseOut="window.status=''; return
true" OnMouseOver="window.status='Auctions'; return true">
<AREA SHAPE=default TARGET="_top" HREF="http://www.networld.com">
</MAP>
<!-- End of CSIM -->
</CENTER>
<table width="100%" border="0" cellpadding="0" cellspacing="0"
bgcolor="#DEDECA"><tr><td bgcolor="#000000" colspan="3" height="1"><spacer
type="block" width="1" height="1"></td></tr>
<TR>
<TD vAlign=top bgcolor="#ffffff" colspan=3><FONT
face=Tahoma,Verdana,Arial,Helvetica size=2><A
href="http://www.networld.com/">Main</A> <B>/ </B><A
href="javascript:history.go(-1);">Back</A><B> / </B>Search Results<if $q neq
"">: [$q]</if></FONT></TD></TR>
<tr><td bgcolor="#999999" colspan="3" height="1"><spacer type="block"
width="1" height="1"></td></tr><tr><td bgcolor="#DEDECA"
colspan="3">&nbsp;</td></tr><tr><td width="40%" rowspan="3">&nbsp;</td><td
nowrap><font face="Helvetica,Arial" size="3">

<form method=post action=$url/main.html>
<b>Search</b></font>
<input name=q value="$q" size=20>&nbsp;<input type=submit value="go">
</td><td width="40%" rowspan="3">&nbsp;</td></tr><tr><td valign="top">
</td></tr><tr><td nowrap><font face="Verdana,Helvetica,Arial" size="1">
</font><br><img src="/images/clear.gif" height="4"
width="1"></td></tr></form>
</table>
<img src="/images/clear.gif" border="0" width="1" height="6"><br>
<center><table width="100%" cellpadding="0" cellspacing="0"
border="0"><tr><td align="center" valign="top"><table align="left"
width="121" height="60" cellpadding="1" cellspacing="0" border="0">
<tr><td bgcolor="#000000" valign="top">
<table width="200" cellpadding="0" cellspacing="0" border="0">
<tr><td bgcolor="#FFFFFF" align="center" valign="center" width="200"
NOWRAP><a
href="http://barnesandnoble.bfast.com/booklin ... 0&category
id=searchby&choice=keywordSearch&userInput=$q"><img
src="http://www.networld.com/images/barnessm.gif" border="0"></a>
<font face="Verdana,Arial,Helvetica" size="-1"><b><a
href="http://barnesandnoble.bfast.com/booklin ... 0&category
id=searchby&choice=keywordSearch&userInput=$q">Books Related to
$q</a></b></font>
</td></tr>
</table>
</td></tr>
</table>
</td><td>&nbsp;&nbsp;</td><td valign="top" align="center" colspan="3"><A
HREF="http://networld-images.adbureau.net/cgi ... k.exe/AREA
=NETWORLD"><IMG
SRC="http://networld-images.adbureau.net/cgi ... r.exe/AREA
=NETWORLD" border=0><br><FONT FACE="verdana,arial,helvetica"
SIZE="1"><b>Click here</b></FONT></a></td></tr></table></center><br>
<table width="100%" border="0" cellpadding="0" cellspacing="0"
bgcolor="#003366">
<tr><td valign="middle" align="left" nowrap colspan="3"><font
face="Helvetica,Arial" size="3" color="#FFFFFF"><a
name="search">&nbsp;<b>Web search results</b>&nbsp;&nbsp;&nbsp;&nbsp;<font
face="Helvetica,Arial" size="2"> results most relevant to <b>$q</b>
</font>&nbsp;</a></font></td>
<td height="18" valign="middle" align="right" nowrap colspan="1">&nbsp;
</td></tr>
<tr><td valign="middle" align="left" nowrap colspan="4"
bgcolor="#000000"><img src="/images/clear.gif" border="0" width="100%"
height="1"></td>
</table>
</a>

<!--------------------------------------------------------------------------
---
<Init> sets up the list of available engines and their affiliated parsers.
If the format of the result set changes for an engine, the TIMPORT
specification for that engine will need to be replaced.
----------------------------------------------------------------------------
-->


<a name=init> <!-- This sets up all of the lists we need later on -->

<$removeme= "\x0d" "<b>" "</b>"> <!-- List of things to remove before parsing -->

<$searchurl =
"http://www2.networld.com/cgi-bin/search ... erms=xyzzy"
"http://search.thunderstone.com/texis/we ... 0&w3meta=1"
"http://ink.yahoo.com/bin/query?p=xyzzy"
>
<$searchname =
"NetWORLD"
"Thunderstone"
"Inktomi"
>
<$bases =
"http://www2.networld.com"
"http://search.thunderstone.com/"
"http://ink.yahoo.com/bin/query/"
>


<!-- See TIMPORT DOCUMENTATION for more details on how the imports work -->
<!-- See REX DOCUMENTATION for more on our regular expression syntax -->
<$imports=
'#NetWORLD
multiple
recexpr >><LI>=\x0a=<a=!href\=+href\=="?[^"
a=!href\=+href\=="?[^" >]+[^>]*>=>><img=!src\=+src\=="?[^"
# 1 2 3 4 5 6 7 8 9 0 1 2
3 4 5 6 7 8 9 0 1 2 3 4 5 6 7
8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
3 4 5 6 7 8 9 0 1 2 3 4 5
6 7 8 9
field Link varchar(40) 3-9
field Title varchar(80) 10
field Abstract varchar(180) 13
'

'#Thunderstone
multiple
recexpr >><dt>=[^<\x0a]+<a
=[^>]+>=[^<\x0a]+</a><dd>=[^<\x0a]+<tt>=[^<\x0a]+</tt><br><i>=[^<\x0a]+</i><
br><small><a =[^>]+>=[^<\x0a]+</a>=[^<\x0a]+</small><p>\x0a
# 1 2 3 4 5 6 7 8 9 0
1 2 3 4 5 6 7 8 9
# Name Type Tag
field Link varchar(40) 3-5
field Title varchar(80) 6
field Abstract varchar(180) 8
'

'#Inktomi
multiple
recexpr >><LI>=<a href=[^>]+>=[^<]+</a>=[^\-]+\-=[^<]+
# 1 2 3 4 5 6 7 8 9
# Name Type Tag
field Link varchar(40) 2-4
field Title varchar(80) 5
field Abstract varchar(180) 9
'



</a>
<!-- End of the <init> function -->

</script>





Post by Thunderstone »



Quoting from the tech support archive at
http://www.thunderstone.com/texis/webinator/listproc/

"
You can do a separate fetch of just your URL before the parallel fetch
of the others. You would probably want to move the code inside the
<fetch></fetch> to a function of its own so you can call it from both
fetches instead of replicating it all.

Or, if your output is already properly formatted for inclusion with the rest,
you could just fetch it and display it directly with <send>.
"

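To make the ordering described in the quote concrete, here is a minimal sketch of the control flow only. It is written in Python rather than Vortex so that it can be run and checked on its own, and it is not a drop-in replacement for any part of the script. The names `meta_search`, `render`, and the `fetch` callable are hypothetical and are not part of Texis or the meta script; `render` stands in for whatever display code currently sits inside <fetch></fetch>.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def render(name, page):
    """Shared display step, standing in for the body of the <fetch> loop."""
    return f"[{name}] {page}"


def meta_search(own, others, fetch, timeout=30):
    """Return rendered result blocks with the `own` engine's block first.

    own    -- (name, url) pair for the engine that must lead the page
    others -- list of (name, url) pairs fetched in parallel afterwards
    fetch  -- callable mapping a url to page text (an assumption; supply your own)
    """
    own_name, own_url = own

    # Step 1: a separate, blocking fetch of our own URL, rendered immediately.
    output = [render(own_name, fetch(own_url))]

    # Step 2: the remaining engines in parallel, rendered as each one completes.
    with ThreadPoolExecutor(max_workers=max(1, len(others))) as pool:
        futures = {pool.submit(fetch, url): name for name, url in others}
        for done in as_completed(futures, timeout=timeout):
            output.append(render(futures[done], done.result()))

    return output
```

Both steps go through the same `render` function, which mirrors the advice to move the display code into one function and call it from both fetches. The order of the other engines still depends on which one answers first; only the first block is fixed.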

