[...some snipping...]
The Webinator's idea of a "Web-site" is synonymous with a single
IP address. This is because of the many ways DNS can resolve a
name, combined with the mixed bag of ways a URL can be
written.
The Webinator has no notion of a path other than its understanding of
"../" and "./" relative references. This is the correct and intended
operation of the program. (URLs don't always mean protocol://host/pathname.)
Webinator is just following the links in order to ensure that it
doesn't miss any referenced URLs on the server, and since it's a
Web and not a tree, it has no concept of up or down. (Trust me,
it would have been a whole lot easier to write if it were a tree.)
With all that said, here's a way you might try if you only
want to index a single subdirectory on a server where you have
telnet access.
First "cd" to the htdocs directory on the server.
Then:
find MYDIR -name "*.html" -follow -exec echo http://www.abc.com/ {} \; | sed -e "s/ //" | gw -a -dMYDB "&-"
The above is all one line. MYDIR is the name of the directory you wish
to index, and MYDB is the name of the Webinator database directory
in which you'd like to place the index. The sed step removes the space
that echo puts between the site prefix and each file's path.
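If you'd rather look at the URL list before gw ever sees it, the find/echo/sed part can be pulled out into a small shell function. This is only a sketch: the function name make_url_list is my own invention, and it assumes you run it from the htdocs directory as described above. It glues the site prefix onto each path in a single sed step instead of echo plus a space-removing sed.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not part of Webinator): print one absolute
# URL per .html file under a directory. Assumes the current directory is
# htdocs, so the relative paths from find map directly onto URL paths.
make_url_list() {
    dir=$1      # directory to index, e.g. MYDIR
    base=$2     # site prefix with a trailing slash, e.g. http://www.abc.com/
    # -follow follows symbolic links, as in the one-liner above.
    # sed prepends the prefix; "|" is the delimiter because URLs contain "/".
    find "$dir" -follow -name '*.html' -print | sed -e "s|^|$base|"
}
```

Run make_url_list MYDIR http://www.abc.com/ by itself to eyeball the list, then pipe it into gw -a -dMYDB "&-" the same way as the one-liner.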
Note: This may index more than you intend because even documents which
are not exposed via hyperlinks will be located by the find command.
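One way to rein that in is to strip out anything you know you don't want before the list reaches gw. Here's a sketch, assuming a hypothetical exclude.txt that holds one fixed string per line (a directory such as /MYDIR/drafts/, say). The function name filter_urls is also made up for the example.

```shell
#!/bin/sh
# Sketch (hypothetical helper): read URLs on stdin and drop every URL that
# contains any of the strings listed, one per line, in the given file.
filter_urls() {
    exclude_file=$1
    # -v keeps only non-matching lines, -F treats each line of the file as a
    # literal string, -f reads those strings from the file.
    # Note: a blank line in the file matches every URL, so avoid empty lines.
    grep -v -F -f "$exclude_file"
}
```

Slot it into the pipeline between sed and gw, e.g. find MYDIR -name "*.html" -follow -exec echo http://www.abc.com/ {} \; | sed -e "s/ //" | filter_urls exclude.txt | gw -a -dMYDB "&-". It's a plain substring filter, so it only catches the paths you name, not every unlinked document.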
If you don't have telnet access, the only other way is to play with the -D and
-a options in conjunction with the "include/exclude" lists.
You can "hand-walk" a site by feeding lists to it from
a command like:
gw -d. -st "select Ref from refs where Ref like '/texis'" | gw -a -d. "&-"
or you could just delete the stuff that you don't want afterwards
with a command like:
gw -d. -st "delete from html where Url like '/wais'"
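Before deleting, it's worth running the same where clause as a select so you can see what's about to go. The sketch below wraps both steps in one function. The name purge_urls and the GW variable (set GW=echo for a dry run that only prints the commands) are my own inventions, and it assumes the html table's Url column can be selected the same way it's used in the delete above.

```shell
#!/bin/sh
# Sketch (hypothetical helper): show the rows a delete would remove, then
# delete them only if the user answers y or Y.
# The pattern is pasted inside single quotes in the SQL, so it must not
# itself contain a single quote.
purge_urls() {
    pattern=$1          # Texis LIKE query, e.g. /wais
    db=${2:-.}          # database directory, "." as in -d.
    gw=${GW:-gw}        # override with GW=echo to just print the commands
    echo "Rows matching '$pattern':"
    "$gw" -d"$db" -st "select Url from html where Url like '$pattern'"
    printf 'Delete them? [y/N] '
    read -r answer
    case $answer in
        [yY]) "$gw" -d"$db" -st "delete from html where Url like '$pattern'" ;;
        *) echo "Nothing deleted." ;;
    esac
}
```

For example, purge_urls /wais lists the matching rows in the current database and asks before deleting; any answer other than y or Y leaves the table alone.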
There's a whole lot of control and flexibility available when you
start to use the Texis SQL aspects of the Webinator along with
the Unix command line stuff.
Keep them coming!
Bart Richards