A page with two clickable images

Post by **Thunderstone** » Wed Dec 02, 1998 1:04 pm

Hello:

I found a website in Taiwan with good pages about international
trade. When I visited it (http://tptaiwan.org.tw/), I found that
the front page contains two clickable images:

http://203.66.210.8/indexsetc.htm ---> links to the second page
written in Chinese
http://203.66.210.8/indexsete.htm ---> links to the second page
written in English

There is no robots.txt at http://tptaiwan.org.tw/, so I ran
gw http://tptaiwan.org.tw/, but after the run the robot had indexed
only one page, the front page, and nothing from either second page.
That is, gw -st "select * from html" returned only the text of the
front page, and gw -st "select * from refs" listed both second pages
only as references.

Next, assuming that http://tptaiwan.org.tw/ keeps its content on
http://203.66.210.8/, I ran gw -o http://tptaiwan.org.tw/, but got
the same result.

Both http://203.66.210.8/indexsetc.htm and
http://203.66.210.8/indexsete.htm also contain many clickable/linkable
images. When I ran gw directly on either of those two pages, the
robot did follow the links. So why didn't it do so for the front
page, http://tptaiwan.org.tw/?

How can Webinator fetch and index a website whose pages link
through one or more clickable images? And how can I configure
Webinator to AUTOMATICALLY follow both second pages of
http://tptaiwan.org.tw/, assuming I did not know in advance that the
site uses image links and a different host for its subsequent pages?

Thank you in advance.
David Chan

Post by **Thunderstone** » Wed Dec 02, 1998 1:31 pm

gw stays on the site you specify. tptaiwan.org.tw's IP address is not
203.66.210.8, so gw considers 203.66.210.8 a different site. Had the
two addresses been the same, it would have followed the links.
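
The same-site rule described above can be pictured with a small Python sketch. This is only an illustration of the idea, not Webinator's actual code: the function names `resolve` and `same_site` and the hard-coded address table are hypothetical, and a real robot would look the addresses up in DNS (for example with `socket.gethostbyname`). The address given for tptaiwan.org.tw is a made-up documentation address, since the post only says it differs from 203.66.210.8.

```python
from urllib.parse import urlsplit

# Hypothetical name-to-address table standing in for DNS lookups.
# 203.66.210.8 comes from the thread; 192.0.2.10 is a made-up
# documentation address, used only because the post says
# tptaiwan.org.tw resolves to something other than 203.66.210.8.
ADDRESSES = {
    "tptaiwan.org.tw": "192.0.2.10",
    "203.66.210.8": "203.66.210.8",
}


def resolve(host: str) -> str:
    """Return the IP address for a host name (table lookup, not real DNS)."""
    return ADDRESSES[host]


def same_site(start_url: str, link_url: str) -> bool:
    """Treat two URLs as the same site when their hosts share an IP address."""
    start_host = urlsplit(start_url).hostname
    link_host = urlsplit(link_url).hostname
    return resolve(start_host) == resolve(link_host)


if __name__ == "__main__":
    start = "http://tptaiwan.org.tw/"
    for link in ("http://203.66.210.8/indexsetc.htm",
                 "http://203.66.210.8/indexsete.htm"):
        print(link, "follow" if same_site(start, link) else "skip (offsite)")
```

Under these assumptions both links from the front page are skipped as offsite, while a walk started at http://203.66.210.8/indexsetc.htm would treat other 203.66.210.8 pages as the same site, which matches the behavior described in the question.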

If you tell gw to follow offsite links, you are likely to start
wandering all over the world. Instead, you could use the -network
option to let it walk all machines in their network. See
http://www.thunderstone.com/gw25man/node50.html
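
One way to picture a network-wide walk is to accept any host whose address falls inside the same IP network as the starting host. The sketch below assumes, purely for illustration, that "their network" means the /24 block around the starting address; the manual page linked above is the authority on what -network actually matches. The function name `in_same_network` and the prefix length are hypothetical.

```python
import ipaddress


def in_same_network(start_ip: str, candidate_ip: str, prefix: int = 24) -> bool:
    """Return True if candidate_ip lies in the /prefix network containing start_ip.

    The default /24 is an illustrative assumption, not gw's documented rule.
    """
    # strict=False lets us build the network from a host address such as
    # 203.66.210.8, yielding 203.66.210.0/24 for the default prefix.
    network = ipaddress.ip_network(f"{start_ip}/{prefix}", strict=False)
    return ipaddress.ip_address(candidate_ip) in network


if __name__ == "__main__":
    # 203.66.210.8 comes from the thread; the other addresses are made up.
    for ip in ("203.66.210.8", "203.66.210.77", "203.66.211.8"):
        print(ip, in_same_network("203.66.210.8", ip))
```

A scope like this keeps the robot near the starting machines instead of letting it wander offsite, but note that it would only have helped here if tptaiwan.org.tw and 203.66.210.8 actually sit in the same network.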