Indexing virtual domains

Posted: Thu Nov 20, 1997 10:40 am
by Thunderstone


Greetings, all.
A follow-up to my question yesterday about having webinator index
virtual domains on my server that share the same IP address.

I run a collection of sites related to the oil and gas industry. I would
like to use Webinator to create an industry-specific search engine
composed of the sites we host, those of our advertisers, and a few other
related sites of interest.

I used the following command:
/gw -i -unique "&url.lst"

Where url.lst is a text file w/ the following structure:
http://energyconnect.com/
http://emdaapg.org/
http://www.allegrodevelopment.com/
http://landman.org/
http://apachecorp.com/
http://archpetroleum.com/
http://artesia.com/
http://www.banksinfo.com/
http://gasearch.com/
http://www.dallasdata.com/
http://energyex.com/
http://dalylift.com/
http://gas-coop.com/
http://ipams.org/


The domains w/o the www in the URL are on my server. Webinator ignores
all but the first one, apparently based on the IP address. I also tried
leaving in the www, and had identical results.

Is there any workaround, other than convincing my ISP to grant each of
these domains unique IP addresses? (I've already asked once, with a less
than enthusiastic response).

--
John McGhee - Webmaster; Texas Independent Producers & Royalty Owners
http://www.energyconnect.com
E-mail: jmcghee@tipro.org
Tel: (512) 477-4452 Fax: (512) 476-8070





Indexing virtual domains

Posted: Thu Nov 20, 1997 11:05 am
by Thunderstone


Within a single execution, gw resolves the different host names to the
same IP address and so treats them as the same URL. What you will need
to do is run gw multiple times, once for each host.

John McGhee said:



Indexing virtual domains

Posted: Thu Nov 20, 1997 12:39 pm
by Thunderstone


Thanks for the answer. Will Webinator then append these together? I would
like a single index for these sites, instead of having one per domain.

If so, what command line would be appropriate to use?

TIA, -jm

John Turnbull wrote:




--
John McGhee - Webmaster; Texas Independent Producers & Royalty Owners
http://www.energyconnect.com
E-mail: jmcghee@tipro.org
Tel: (512) 477-4452 Fax: (512) 476-8070





Indexing virtual domains

Posted: Mon Nov 24, 1997 1:49 pm
by Thunderstone




John Turnbull wrote:


This doesn't seem to work.

# ./gw -i -unique http://www.emdaapg.org
No database specified. Use the default
(/usr/local/Sites/tipro.com/WWW/webinator/db)?
(y/n) default is y : y
You may use "-d-" to skip this question in the future.
Getting http://204.181.81.144/robots.txt...Not there...Ok.
Enabling extra duplicate prevention
Adding todo: http://www.energyconnect.com/
http://www.energyconnect.com/ is already in the database
Visited 0 pages total

and
# ./gw -i -unique http://www.gasearch.com
No database specified. Use the default
(/usr/local/Sites/tipro.com/WWW/webinator/db)?
(y/n) default is y : y
You may use "-d-" to skip this question in the future.
Getting http://204.181.81.144/robots.txt...Not there...Ok.
Enabling extra duplicate prevention
Adding todo: http://www.energyconnect.com/
http://www.energyconnect.com/ is already in the database
Visited 0 pages total

-jm




Indexing virtual domains

Posted: Tue Nov 25, 1997 1:19 pm
by Thunderstone



gw remembers every URL you have specified before, so it knows that the IP
of each subsequent URL is the same as the first. You can make it forget
the URLs you have specified before by deleting them from the
options table:
gw -st "delete from options where Name='URL'"

Walking each site individually with the above delete in between should
do what you want.
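
The walk described above could be scripted roughly as follows. This is
only a sketch under assumptions: the walk_sites function name and the GW
variable are made up for illustration, gw is assumed to sit in the current
directory, and -d- is taken from gw's own hint about skipping the database
prompt. It reads a list such as url.lst one line at a time, clears the
remembered URLs, then walks that one site.

```shell
#!/bin/sh
# Sketch: one gw run per virtual host, forgetting remembered URLs in between.
# GW and walk_sites are hypothetical names, not part of Webinator itself.
GW="${GW:-./gw}"    # assumed location of the gw binary

walk_sites() {
    list="$1"       # text file with one URL per line (e.g. url.lst)
    # "|| [ -n ... ]" keeps a last line that has no trailing newline
    while IFS= read -r url || [ -n "$url" ]; do
        # skip blank lines in the list
        [ -n "$url" ] || continue
        # make gw forget URLs given on earlier runs
        $GW -st "delete from options where Name='URL'" < /dev/null || return 1
        # walk this one site; -d- accepts the default database without asking
        $GW -d- -i -unique "$url" < /dev/null || return 1
    done < "$list"
}

# Usage (after sourcing this file):
#   walk_sites url.lst
```

Each run here targets the default database, so the pages should end up
together rather than in one database per domain, though that is worth
confirming on a short list first.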

gw does follow HTTP 1.1 to get the right pages from the server, but it
checks IPs first. Many machines are called by several different names
within a given web site. If gw didn't resolve IPs, you would get
the same pages in the database many times, but with different URLs.
One behavior can't handle both situations (virtual sites with the same IP,
and aliases for the same site). The above method will handle virtual sites.