No spider page with tables

verzetti
Posts: 9
Joined: Wed Jan 03, 2001 11:05 am

Post by verzetti »

Dear Sirs,
We have tried several times to spider lego.com & nick.com, starting from the following pages:

http://www.lego.com/siteindex/
http://www.nick.com/blab/site_map/site_map.jhtml

but GW spidered only these 2 pages, not the entire sites. We therefore analyzed the source of the 2 pages and found that all the links in them are contained in tables.
Next we spidered 2 other sites, starting from home pages whose links are also in tables, and the result was the same.
Is it possible that GW doesn't follow links inside tables?

Regards,
A. Verzetti
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Tables are not a problem.

lego.com redirects to an ASP page (-fasp). That page refuses to work without cookies and returns no content if you don't send one. gw does not currently support cookies.

nick.com redirects to a wacky URL which requires -y to fetch. That page then has only some JavaScript and no real content, and JavaScript cannot be crawled. If you follow the twisty maze of redirections with geturl and ignore robots.txt, you can crawl nick with something like

gw -r -y -jhttp://ad.doubleclick.net/adi/www.nick.com/ 'http://www.nick.com/;$sessionid$1GHJLWY ... tid=692950'
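
To illustrate the two points above (links inside tables are ordinary links, while links that exist only inside JavaScript are invisible to a crawler), here is a minimal Python sketch. It is not gw's code and says nothing about how gw parses pages; `LinkCollector`, `extract_links` and the sample HTML are made up for illustration and assume a crawler that only reads `<a href>` tags from the markup.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, wherever they sit in the markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Table cells are just containers; an <a> inside a <td> arrives
        # here exactly like an <a> anywhere else on the page.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the hrefs of all <a> tags found in the HTML markup."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: two links in a table, one written out by a script.
SAMPLE = """
<table>
  <tr><td><a href="/products/">Products</a></td></tr>
  <tr><td><a href="/games/">Games</a></td></tr>
</table>
<script>document.write('<a href="/hidden/">Hidden</a>');</script>
"""

if __name__ == "__main__":
    for link in extract_links(SAMPLE):
        print(link)
```

The two table links are collected; the one produced by `document.write` is not, because the script body is treated as plain text and never executed. A page whose navigation is generated entirely that way looks empty to a crawler.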