Walk does not index anymore

michel.weber
Posts: 256
Joined: Sat Oct 08, 2005 12:40 pm

Post by michel.weber »

Hi

Our main site is no longer being indexed: http://www.coe.int
The walk status file says 'cannot connect to http://www.coe.int'. Any idea?

The site itself seems to be perfectly accessible.

-------------------------------------------------------

Search Appliance Walk Report for y_www

Creating database /usr/local/morph3/texis/ywww.4679af99334d/db2...Done.
Walk started at 2007-06-21 00:16:09 (by user)
JavaScript walking enabled
HTTPS walking enabled
Start fetching at http://www.coe.int/
Ignore urls containing any of the following:
~
/_vti
/cgi-bin/
/Wires/
dev.
test
ToPrint=yes
WcdDoc.asp
WCDsearch.asp
asp?link=http
2007-06-21 00:16:09 started 1 new (20104) on http://www.coe.int/

006 /usr/local/morph3/texis/scripts/dowalk(doprimer) 400: Cannot connect to www.wysistat.com:80: Connection refused
0 pages fetched (87,024 bytes) from http://www.coe.int/
1 errors
0 duplicate pages
No pages fetched. Search not updated.


The link : http://www.coe.int/
Had this error: Cannot connect to www.wysistat.com:80: Connection refused
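
For reference, a minimal Python sketch (not part of the appliance) that pulls the failing host out of a report like the one above and compares it with the start URL. The function name `failing_hosts` and the regular expression are assumptions made for illustration; the pattern is based only on the single error line shown in this report.

```python
import re
from urllib.parse import urlsplit

# Pattern modelled on the one error line in the report above (an assumption,
# not a documented log format).
CONNECT_ERROR = re.compile(
    r"Cannot connect to (?P<host>[^\s:]+):(?P<port>\d+): (?P<reason>.+)$"
)

def failing_hosts(report_text, start_url):
    """Return (host, port, reason, is_offsite) for each connect error found."""
    start_host = urlsplit(start_url).hostname
    results = []
    for line in report_text.splitlines():
        match = CONNECT_ERROR.search(line)
        if match:
            host = match.group("host")
            results.append(
                (host, int(match.group("port")),
                 match.group("reason").strip(), host != start_host)
            )
    return results

report = (
    "006 /usr/local/morph3/texis/scripts/dowalk(doprimer) 400: "
    "Cannot connect to www.wysistat.com:80: Connection refused\n"
    "0 pages fetched (87,024 bytes) from http://www.coe.int/\n"
)
for host, port, reason, offsite in failing_hosts(report, "http://www.coe.int/"):
    print(host, port, reason, "offsite" if offsite else "same site")
```

Run against this report, the sketch shows that the refused connection is to www.wysistat.com, not to www.coe.int itself.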
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

michel.weber
Posts: 256
Joined: Sat Oct 08, 2005 12:40 pm

Post by michel.weber »

Well, yes, but the profile is configured not to fetch offsite pages, so shouldn't it skip that link?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

That's not considered a link (reference to a different page), but a fundamental part of the page being processed. Most people want these portions included even if they don't want offsite pages.

If JavaScript is not required to navigate the site or to include important content, you could turn off JavaScript processing.
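
To see which script includes on a page would be pulled in from other hosts, something like the following Python sketch can help. It is a standalone illustration using only the standard library, not how the appliance works internally; the names `ScriptSrcCollector` and `offsite_scripts` and the `/js/menu.js` path in the sample are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class ScriptSrcCollector(HTMLParser):
    """Collect the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

def offsite_scripts(html, page_url):
    """Return absolute URLs of scripts hosted on a different host than the page."""
    collector = ScriptSrcCollector()
    collector.feed(html)
    page_host = urlsplit(page_url).hostname
    offsite = []
    for src in collector.sources:
        absolute = urljoin(page_url, src)  # resolve relative paths
        if urlsplit(absolute).hostname != page_host:
            offsite.append(absolute)
    return offsite

# Sample markup: the first script path is made up for illustration.
sample = (
    '<html><head>'
    '<script src="/js/menu.js"></script>'
    '<script src="http://www.wysistat.com/statistique.js"></script>'
    '</head></html>'
)
print(offsite_scripts(sample, "http://www.coe.int/"))
# prints ['http://www.wysistat.com/statistique.js']
```

Any URL this reports is a candidate for the kind of "Cannot connect" failure seen in the walk report when the remote host refuses connections.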
michel.weber
Posts: 256
Joined: Sat Oct 08, 2005 12:40 pm

Post by michel.weber »

Unfortunately I cannot just switch off JavaScript, as navigation is based on it.
http://www.wysistat.com/statistique.js is an external statistics script, which unfortunately does not work with indexers.

Is there a way to instruct Thunderstone to completely ignore anything between two text strings? I tried to use 'ignore tags', but that does not seem to work.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Ignore tags removes text but not links (it's usually used for menus, etc.). I can't think of any way of preventing that JavaScript from being fetched except for turning off script fetching. Are you sure you need that? I'm able to navigate that site with JavaScript turned off in the browser.
michel.weber
Posts: 256
Joined: Sat Oct 08, 2005 12:40 pm

Post by michel.weber »

Unfortunately the menu system is at least partially based on JavaScript.

I'll see with the webmaster what can be done.

Is there a chance of getting 'offsite' JavaScript treated like other 'offsite' pages in a future release?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Probably not in the near term.