Webinator Quick Reference Guide by Slugwater

Post by **slugwater** » Mon Nov 26, 2001 10:49 am

I have compiled this Webinator document over the past year and find it quite handy for many general Webinator commands. Hope it can help you!!

-Slugwater

##########################################################################################################
# #
# This is a document with all the commands to build a database with Thunderstone's Webinator #
# By Ian K. Millard -- March 1 2001 -- Ovid Technologies/SilverPlatter Information (IKM Design) #
# -- Revised November 26 2001 #
##########################################################################################################

THE BASICS
----------
./bin/gw
This runs gw, which is Webinator's walking tool. Look for the "bin" directory in your Webinator installation. Within the bin directory is the gw program, which does most of Webinator's work.

-d/home/httpd/htdocs/search/db
This option always goes right after gw. The -d flag tells gw which database you are working with.

All together now!!
./bin/gw -d/home/httpd/htdocs/yourPath/dbName
This is the command line that will start most of your webinator commands.
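
A small sketch of how the two pieces above fit together, assuming a POSIX shell. The names GW_BIN, GW_DB and gw_cmd are made up for this example and the paths are the placeholders used above. The helper only prints the command line so you can check it before running anything.

```shell
# Hypothetical helper: prints a gw command line instead of running it.
GW_BIN="./bin/gw"                             # path to the gw program
GW_DB="/home/httpd/htdocs/yourPath/dbName"    # placeholder database path

gw_cmd() {
    # -d is glued directly to the database path, as in the examples above
    printf '%s\n' "$GW_BIN -d$GW_DB $*"
}

gw_cmd -create
```

Change GW_DB once and every command built through gw_cmd points at the same database.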

CREATE A DATABASE
-----------------
gw -d/httpd/htdocs/search/BrandSpankinNewDB -create

WALKING A DATABASE (3 steps)
------------------
1. gw -d/httpd/webinator/olddb -wipe

2. gw -d/httpd/webinator/olddb -jhttp://www.mysite.com/subdir http://www.mysite.com/subdir
## Note: -j --> include everything in this path ##

3. gw -d/httpd/webinator/olddb -index
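
If you walk the same site often, the three steps can be generated from one place. This is only a sketch, assuming a POSIX shell; walk_plan is a made-up name, and it prints the commands rather than running them.

```shell
# Hypothetical helper: prints the three walk commands for one database and site.
walk_plan() {
    db=$1     # database path, e.g. /httpd/webinator/olddb
    url=$2    # start URL, also used as the -j include path
    printf '%s\n' "gw -d$db -wipe"
    printf '%s\n' "gw -d$db -j$url $url"
    printf '%s\n' "gw -d$db -index"
}

walk_plan /httpd/webinator/olddb http://www.mysite.com/subdir
```

Once the printed commands look right, the output can be piped to sh to actually run them.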

CHECK TO SEE WHAT'S IN A DATABASE (This will display all Urls in the database)
------------------------
./bin/gw -d/home/httpd/htdocs/search/cat2 -st "select Url from html"

CHECK FOR SPECIFIC PAGE in a DB
-------------------------------
/home/httpd/htdocs/search/bin/gw -d/home/httpd/htdocs/search/cat2 -s "select * from html where Url='www.yourSite.com/subdir/thisPage.htm'"

REMOVE SPECIFIC PAGE FROM DB (2 steps: must execute both delete statements)
---------------------------------------------------------------------------
./bin/gw -d/home/httpd/htdocs/dbPath/dbName -s "delete from html where Url='www.mysite.com/junk.html'"
./bin/gw -d/home/httpd/htdocs/dbPath/dbName -s "delete from refs where Url='www.mysite.com/junk.html'"
./bin/gw -d/home/httpd/htdocs/dbPath/dbName -index
## Note: must remove from html and refs ##
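
Since it is easy to forget the refs table, here is a sketch that prints both delete statements plus the re-index for one Url. It assumes a POSIX shell; delete_page_plan is a made-up name, and nothing is executed.

```shell
# Hypothetical helper: prints the delete commands for both tables, then the index.
# The Url must not contain a single quote, or the SQL string will break.
delete_page_plan() {
    db=$1     # database path
    url=$2    # Url exactly as stored in the database
    for table in html refs; do
        printf '%s\n' "./bin/gw -d$db -s \"delete from $table where Url='$url'\""
    done
    printf '%s\n' "./bin/gw -d$db -index"
}

delete_page_plan /home/httpd/htdocs/dbPath/dbName www.mysite.com/junk.html
```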

REMOVING MULTIPLE PAGES IN THE SAME TREE
----------------------------------------
./bin/gw -s "delete from html where Url like '/www.mysite.com/testdir'"
./bin/gw -s "delete from refs where Url like '/www.mysite.com/testdir'"
./bin/gw -index
## Note: must remove from html and refs ##

REMOVING PAGES WITH REGULAR EXPRESSIONS
---------------------------------------
To remove all pages that have ?TEMPLATE=blank in them you would use:

./bin/gw -s "delete from html where Url like '/\?TEMPLATE\=blank'"
or
./bin/gw -s "delete from html where Url matches '%?TEMPLATE=blank'"

## Note: you must delete from html and refs. ##
## Ex --> gw -s "delete from html where Url matches '%?TEMPLATE=blank'" ##
## Ex --> gw -s "delete from refs where Url matches '%?TEMPLATE=blank'" ##

ADDING ONE PAGE TO A DATABASE
-----------------------------
Get a single page (-g)
Syntax: -g URL

Get just the single page specified by URL and quit. Nothing in the todo list will be processed. This is a quick way to get a single page into the database without having to process the potentially large todo list.

/home/httpd/htdocs/yourPath/bin/gw -d/home/httpd/htdocs/yourPath/dbName -g http://xxx.com/yyy.htm
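
When there are a handful of pages to add, the same -g command can be repeated per URL. A sketch, assuming a POSIX shell: single_page_cmds is a made-up name, the second URL is invented for the example, and the commands are only printed.

```shell
# Hypothetical helper: reads URLs on stdin, prints one "gw -g" command per URL.
single_page_cmds() {
    db=$1     # database path
    while IFS= read -r url; do
        [ -n "$url" ] || continue    # skip blank lines
        printf '%s\n' "gw -d$db -g $url"
    done
}

printf '%s\n' http://xxx.com/yyy.htm http://xxx.com/zzz.htm |
    single_page_cmds /home/httpd/htdocs/yourPath/dbName
```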

CHECKING FOR WEB SERVER ERRORS
------------------------------

Simple Check
gw -st "select Url,Reason from error" |more

Advanced Check
gw -s "select refs.Url Page,error.Url Error,error.Reason
from error,refs
where refs.Ref=error.Url"

IMPORTANT FLAGS for walking a db (basically your options)
---------------------------------------------------------
-d --> set database name
-j --> include everything in this path
-x --> exclude everything in this path
-h --> show help
-O --> don't save settings
-v --> set verbosity (-v0 -v1 -v2 -v3)
-o --> grab off site pages
-a --> don't add to todo list
-f --> allow file extension (-fshtml -fasp)
-F --> don't allow file extension
-y --> allow ALL file types
-n --> define plug-in
-T --> add allowed MIME type
-S --> delete allowed MIME type
-N --> don't include alt text from images
-R --> don't store refs
-t --> set page timeout
-D --> limit depth of walk
-p --> limit retrieved pages
-z --> limit retrieved page size
-g --> get a single page
-e --> reload pages
-X --> delete missing pages
-V --> only download modified pages
-c --> copy web site
-r --> ignore robots.txt
-C --> allow cgi-bin paths

-create --> create a new database
-wipe --> wipe database
-wipetodo --> wipe todo list
-index or -i --> index your results
-rewalk --> rewalk using last walk parameters
-meta --> index meta data
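
Several of these flags are usually combined on one walk line. The sketch below, assuming a POSIX shell, builds the -f flags from a list of extensions and prints a walk command using only flag forms shown in this document (-v2, -fshtml, -fasp, -j). The name ext_flags is made up, and whether this particular mix suits your site is an assumption.

```shell
# Hypothetical helper: turns "shtml asp" into "-fshtml -fasp".
ext_flags() {
    flags=""
    for ext in "$@"; do
        flags="$flags -f$ext"
    done
    printf '%s\n' "${flags# }"    # drop the leading space
}

# Print (not run) a walk command with verbosity 2 and two extra extensions.
printf '%s\n' "gw -d/httpd/webinator/olddb -v2 $(ext_flags shtml asp) -jhttp://www.mysite.com/subdir http://www.mysite.com/subdir"
```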

------------------------------------------------------------

Document compiled by Ian Millard
http://www.IKMDesign.com
http://www.Silverplatter.com

Thanks to everyone at the Thunderstone message board for their help over the years!!