I'm having a problem with links not being walked. Here is what I'm seeing:
gw -ddbdir -fshtml http://beerexpedition.com/northamerica.shtml
gw -ddbdir -s "select count(id) from html where id > 'start of today'"
returns 11 pages indexed.
The file northamerica.shtml contains links to around 90 pages, which in turn contain links to a total of about 3000 pages. So I try again:
gw -ddbdir -fshtml http://beerexpedition.com/northamerica.shtml
It reports that northamerica.shtml is already in the database, and nothing is added.
It appears that the only way I can get all the pages into the database is to index each directory under the domain individually, which seems strange.
Is it possible that gw is not walking links like the ones in northamerica.shtml? They look like this:
<a href="/ca">California</a>
In /ca there is an index.shtml file.
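To make the shape of those links concrete, here is a minimal Python sketch (plain standard library, not part of gw) of how a standard URL resolver turns such an href into an absolute URL. The helper names resolve and has_extension are made up for illustration. Whether gw treats the missing extension or missing trailing slash as significant is only a guess, not something this sketch demonstrates.

```python
from urllib.parse import urljoin, urlparse
import posixpath

# Page being walked, taken from the commands above.
BASE = "http://beerexpedition.com/northamerica.shtml"

def resolve(base, href):
    """Resolve an href against the page it appears on (hypothetical helper)."""
    return urljoin(base, href)

def has_extension(url):
    """True if the URL's path ends in a file extension (hypothetical helper)."""
    return posixpath.splitext(urlparse(url).path)[1] != ""

resolved = resolve(BASE, "/ca")
print(resolved)                 # http://beerexpedition.com/ca
print(has_extension(resolved))  # False: no .shtml suffix and no trailing slash
```

The resolved URL names a directory without a trailing slash, so the index.shtml behind it is only visible after the server redirects or serves the directory index.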
Anyone have any suggestions? I'd appreciate any pointers.
Cheers,
Jeff
--
-- Jeff Scott -- Technical Designer -- Real Beer, Inc.
-- jeff@realbeer.com -- (415) 522-1516x310 (voice) -- (415) 522-1535 (fax)
-- Internet Publishers and Consultants -- http://www.realbeer.com
-- The single largest source for beer information known to man!