6 Qs

Post by **Thunderstone** » Tue Apr 21, 1998 2:56 pm

1) The last thing the installation does is to instruct me
to index my web site with a command similar to:
(whatever)/gw http://www.mysite.com
and then to go ahead and access the index with my browser.

Section 0.4 of the Webinator doc (dated 1/15/98) says to
do "gw -index" after indexing but before using my browser.
(And if memory serves, I think "gw" also said to use "gw -index"
after it had indexed a site.)

I'm sure extra invocations of "gw -index" cause no harm, but still,
I'm curious to know if I really needed to do the "gw -index" bit
right after indexing a site into a new database?

2) I'm not understanding the difference between -noindex and -dropindex.
The doc says that -noindex drops search indices while -dropindex
drops all indices. I guess my confusion comes from the assumption
that all indices are there to be searched and so I'm fuzzy on the
difference between "search indices" and "all indices"...?

3) I can see how much trouble one might get themselves into with
the -C option. The CGI directory name is configurable under many web
servers, and indeed our CGI directories are called "user-cgi", not
"cgi-bin", on the various Netscape servers we use. Can the -C option
be made to work with "user-cgi"?

4) I started indexing a site and then walked away. When I came back,
I was told the indexing had ended normally and the well-meaning person
logged me off. So I don't know how many pages were indexed.
Under Irix I typed 'gw -d- -st "select Url from html" | wc -l' and
this seemed to give me a good page count. If there's a more
straightforward way of getting the number of pages in the index, I'm all ears.

5) http://www.thunderstone.com/webinator/help.html#qhelp says the
wild-card character can be used to match just a prefix of a
word or to ignore the middle of something. So to do a bit of
stress-testing, I typed "ra*" with the Word Form left as "Exact"
and I expected to match anything that begins with "ra".
According to the View-Context buttons, it matched on words which
CONTAIN "ra" (travel, branch, contract, etc). Did I mis-understand
what the wild-card does?

5b) My Webster's dictionary has no entry for the word "asterix" as
it appears at the above-mentioned URL. Shouldn't it be spelled "asterisk"?

6) Can the wild-card character be changed to something other than
the "*" (to be consistent with unrelated legacy mainframe databases
used by the same customer)?

-John Koch - - - __o
Knowledge Systems, Inc. - - - - _ \<,_
<John.ksi@webplus.net> - - (_)/ (_)
(A NET-FRIENDLY SIG. http://www.ncsa.uiuc.edu/Edu/ICG/pt1.ch2.Etiquette.html )

Post by **Thunderstone** » Tue Apr 21, 1998 3:48 pm

gw only says to use -index if you used -noindex or it didn't finish
the run. gw will automatically perform -index at the end of a run because
most people forget to. So you don't need to, but it doesn't hurt to do it.

I think you meant -unindex, not -noindex.
There are several indices on the data. Some are used during walking
and some are only used during searching.
-unindex only drops the ones that are only used by the search interface.

Use -x to exclude other unwanted directories.
See http://www.thunderstone.com/gw2man/node21.html

Good effort. This is more direct:
gw -d- -st "select count(Url) from html"
Or in the simplest case, look at the walk log in the database directory:
tail gw.log

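You understood it correctly, but * will also find substrings, so "ra*"
matches words that contain "ra" as well as words that begin with it.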
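
To make the difference concrete, here is a small shell sketch that contrasts
a prefix match with a substring match over a handful of words, some of them
taken from the question. It uses grep purely as a stand-in; it is not how
Texis evaluates a query, and the helper names are made up for this example.

```shell
#!/bin/sh
# Illustration only: grep stands in for the search engine here.
# prefix_match and substring_match are hypothetical helper names.

words="radio
travel
branch
contract
rabbit"

# Words that BEGIN with the given string (what the question expected from "ra*").
prefix_match() {
    printf '%s\n' "$words" | grep "^$1"
}

# Words that CONTAIN the given string anywhere (what was actually observed).
substring_match() {
    printf '%s\n' "$words" | grep "$1"
}

echo "prefix:";    prefix_match ra
echo "substring:"; substring_match ra
```

The prefix version returns only the words starting with "ra", while the
substring version also returns travel, branch and contract.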

You can't change Texis to understand a different character, but you can
edit the user's query with the Vortex <sandr> function before passing
it to SQL. See http://www.thunderstone.com/vortexman/node63.html
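
The rewrite step itself can be sketched outside Vortex. The shell snippet
below is not Vortex syntax and does not use <sandr>; it only shows the idea
of translating the customer's wildcard into "*" before the query is run.
The "%" character and the helper name are assumptions made for the example.

```shell
#!/bin/sh
# Illustration only, not Vortex: rewrite a hypothetical legacy wildcard ("%")
# to the "*" that the search engine understands, before the query is used.
# to_star_wildcard is a made-up helper name.

to_star_wildcard() {
    # tr replaces every "%" in the query with "*"; other characters pass through.
    printf '%s\n' "$1" | tr '%' '*'
}

to_star_wildcard 'ra%'
to_star_wildcard 'inter%al'
```

A real form handler would also have to decide what to do when the user's
query legitimately contains the legacy character as ordinary text.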