questions

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm


Post by Thunderstone »



I've been trying out Webinator for the first time. A few
questions:

I tried to index consumerworld.org, but it was very slow. Why
is this? Do either of the following options slow things down?

-domain=consumerworld.org
-jhttp://www.consumerworld.org/

I get a lot of errors like this: "Can't get address for
'(url)': Unknown error". Do either of these options cause this
error?

"gw" keeps writing out two numbers. The first one is the number
of pages it has walked, but what does the second number mean?

How do I rewalk a single site? When I try to do this, it just
tells me it already walked it, then stops.

When I try to use the "-rewalk" option, it wipes the database.
How do I prevent that? Am I doing something wrong?

Is there a way to configure an HTML search form in such a way
that you can search separate databases with one form submission?

Best Regards,

******************************************************************
David Snell, Webmaster, Ark Royal Software
iMarvel Supersite, www.imarvel.com
14332 NE 126th Ave., Suite #B-203, Kirkland, WA 98034-1542
davesnell@imarvel.com, 425-825-0906
******************************************************************








Post by Thunderstone »




Slow walking means something along the way is slow: your nameserver,
your connection, www.consumerworld.org itself, or your machine.
You can use -L to cut down on name lookups. See the manual.


No, but -jhttp://www.consumerworld.org/ is redundant. gw inherently
stays on the specified site unless you tell it to go off-site with an
option such as -domain or -o.
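To make the scoping rule concrete, here is a minimal Python sketch of the decision a walker like gw makes for each link: stay on the start URL's host by default, or accept any host inside a given domain when a domain option is supplied. The function name in_scope and the exact matching rule (exact host match, or the domain itself plus its subdomains) are assumptions for illustration, not gw's actual implementation, and the -o option is not modeled.

```python
from urllib.parse import urlsplit


def in_scope(url, start_url, domain=None):
    """Return True if a hypothetical walker should follow url.

    Assumed rule: with no domain, only the start URL's host is allowed;
    with a domain, that domain and any of its subdomains are allowed.
    """
    host = (urlsplit(url).hostname or "").lower()
    start_host = (urlsplit(start_url).hostname or "").lower()
    if domain is None:
        # Default: never leave the host the walk started on.
        return host == start_host
    domain = domain.lower().lstrip(".")
    # Match the domain itself or a true subdomain (not "notexample.org").
    return host == domain or host.endswith("." + domain)
```

For example, in_scope("http://shop.consumerworld.org/", "http://www.consumerworld.org/") is False, but it becomes True when domain="consumerworld.org" is passed.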


Those are caused by bad or dead URLs on the pages you are walking.
Using -L, mentioned above, will generally eliminate them. See the manual.


The second number is the total number of hyperlinks, of any type, that gw has seen.
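A toy walk shows how two such counters differ: one goes up once per page fetched, the other once per hyperlink found on those pages, whether or not the link is followed. This is a hypothetical in-memory model (the walk function, the links dictionary, and the in_scope callback are illustrative names), not gw's code.

```python
from collections import deque


def walk(start, links, in_scope):
    """Toy breadth-first walk over an in-memory link graph.

    links maps each page URL to the hyperlinks found on it; in_scope(url)
    decides whether a link is followed. Returns (pages_walked, links_seen).
    """
    pages_walked = 0
    links_seen = 0
    queued = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        pages_walked += 1
        for href in links.get(page, []):
            links_seen += 1  # every hyperlink counts, followed or not
            if href not in queued and in_scope(href):
                queued.add(href)
                queue.append(href)
    return pages_walked, links_seen
```

With links = {"/": ["/a", "/b", "http://other/"], "/a": ["/", "/b"], "/b": []} and only relative links in scope, the result is (3, 5): the off-site link and the links back to pages already queued are counted but not walked again.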


There's no direct option for rewalking a single site in a database that
contains multiple sites. You would need to delete that site from
the database and walk it again:
gw -s "delete from html where Url matches 'www.mysite.com%'"
gw -s "delete from refs where Url matches 'www.mysite.com%'"
gw http://www.mysite.com
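If you do this often, a small wrapper keeps the three steps together. The sketch below builds the same three gw commands shown above for a given host and can run them in order with Python's subprocess module, stopping at the first failure. The helper names and the dry_run switch are hypothetical; the SQL and the 'host%' match pattern are copied unchanged from the commands above. Passing each command as an argument list avoids shell quoting problems with the embedded single quotes.

```python
import subprocess


def rewalk_site_commands(host):
    """Return the three gw commands that delete one site and walk it again.

    host is a bare host name such as "www.mysite.com" (hypothetical);
    the SQL and match pattern mirror the commands in the reply.
    """
    pattern = host + "%"
    return [
        ["gw", "-s", f"delete from html where Url matches '{pattern}'"],
        ["gw", "-s", f"delete from refs where Url matches '{pattern}'"],
        ["gw", f"http://{host}"],
    ]


def rewalk_site(host, dry_run=True):
    """Print each argument list (dry_run) or run the commands in order."""
    for cmd in rewalk_site_commands(host):
        if dry_run:
            print(cmd)
        else:
            # check=True raises CalledProcessError if gw exits non-zero,
            # so the new walk is not started after a failed delete.
            subprocess.run(cmd, check=True)
```

Call rewalk_site("www.mysite.com") first to review the commands, then rewalk_site("www.mysite.com", dry_run=False) to run them.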


See also the manual (http://www.thunderstone.com/gw25man/gw2.html).

-rewalk does fairly literally what its name says. It takes the options
and starting URLs you previously gave gw, creates a new empty database,
and does a complete walk using those options and starting URLs. You may
not specify any other options or URLs when using -rewalk. gw should not
be run from within the database directory when using this option.

When the new walk completes successfully the old database is replaced
with the new one. The options and critical files such as top.html and
bottom.html are copied into the new database from the old. Otherwise,
the new database is a fresh directory. The existing gw.log file, query
log, and any non-database files will be lost.


You may edit the search script to run the query, switch databases, and
run the query again against the second database.
Example flow:

<sql "select ...">
display the hit
</sql>
<db=/usr/local/etc/httpd/htdocs/webinator/db2>
<sql "select ...">
display the hit
</sql>
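The same query-then-switch flow can be sketched outside Vortex. In this Python version, sqlite3 stands in for Texis purely for illustration, and the html table with Url and Title columns used in the usage note is hypothetical. Each hit is tagged with the database it came from so the results page can label or merge them.

```python
import sqlite3


def search_all(db_paths, sql, params=()):
    """Run one query against each database in turn and collect the hits.

    Each hit is returned as (db_path, *row) so results from different
    databases can be told apart on the results page.
    """
    hits = []
    for path in db_paths:
        conn = sqlite3.connect(path)
        try:
            for row in conn.execute(sql, params):
                hits.append((path, *row))
        finally:
            conn.close()  # release each database before moving to the next
    return hits
```

For example, search_all([db1, db2], "select Url from html where Title like ? order by Url", ("%news%",)) returns the matching URLs from the first database followed by those from the second.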

