cyril.adam
Posts: 8 Joined: Tue Feb 06, 2007 9:58 am
by cyril.adam » Tue Mar 27, 2007 3:33 am
Hi,
I have a problem when I try to walk a website that contains Word documents.
I get an error like this:
The link: http://www..........file.doc
Had this error: Not in requirements
Because of this, I can't search for words in any of the documents on the site.
Do you have any idea what is causing this?
Thanks
John
Site Admin
Posts: 2623 Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH
by John » Tue Mar 27, 2007 8:36 am
Make sure you include .doc in the extension list you want indexed on the basic walk settings page.
John Turnbull
Thunderstone Software
cyril.adam
by cyril.adam » Tue Mar 27, 2007 9:52 am
Yes, of course .doc is in the extension list.
Here are the extensions defined:
.asp .aspx .doc .html .htm .jsp .pdf .php .swf .txt .xls
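For illustration only: the extension check John mentions can be pictured as a suffix test on the URL path. This is a minimal Python sketch under stated assumptions, not Thunderstone's implementation. `has_listed_extension` and `EXTENSIONS` are hypothetical names, the list is copied from the post above (with `.asp` assumed for the first entry), and case-insensitive matching is an assumption.

```python
from posixpath import splitext
from urllib.parse import urlsplit

# Extension list from the post above (".asp" assumed for the first entry).
EXTENSIONS = {".asp", ".aspx", ".doc", ".html", ".htm", ".jsp",
              ".pdf", ".php", ".swf", ".txt", ".xls"}


def has_listed_extension(url: str, extensions=EXTENSIONS) -> bool:
    """Return True if the URL path ends in one of the listed extensions.

    Hypothetical helper: only the path component is inspected, so query
    strings and fragments are ignored; matching is case-insensitive
    (an assumption, not confirmed walker behavior).
    """
    path = urlsplit(url).path
    ext = splitext(path)[1].lower()
    return ext in extensions
```

Under this model a `.doc` URL passes the extension test, so a "Not in requirements" error would have to come from one of the other URL checks.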
John
Site Admin
by John » Tue Mar 27, 2007 9:59 am
Another possibility is that you have Stay Under set, or that the document is on a different server, so it does not match the required URL prefix; or, if you have a "Required REX" defined, the URL doesn't match it.
John Turnbull
Thunderstone Software
John
Site Admin
by John » Tue Mar 27, 2007 10:43 am
"Stay Under" is an option; if it is set to Yes, the walk will only index documents whose URLs start with the prefix based on the Base URL:
http://www.pgregister.coe.int/Pompidou_ ... jectfiles/
i.e. it will stay under the directory containing the Base URL.
Set Stay Under to "N" to crawl elsewhere on the server.
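A minimal sketch of how a Stay Under prefix could be derived from the Base URL, assuming the prefix is simply the directory containing it, as described above. `stay_under_prefix` and `allowed_by_stay_under` are hypothetical helpers, and the example.com URLs are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def stay_under_prefix(base_url: str) -> str:
    """Return the directory containing base_url as a URL prefix.

    Hypothetical helper: everything after the last '/' in the path is
    dropped, along with any query string or fragment.
    """
    parts = urlsplit(base_url)
    directory = parts.path[: parts.path.rfind("/") + 1] or "/"
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


def allowed_by_stay_under(url: str, base_url: str, stay_under: bool) -> bool:
    # With Stay Under off, this particular check places no restriction on the URL.
    return not stay_under or url.startswith(stay_under_prefix(base_url))
```

For example, a Base URL of `http://www.example.com/projects/index.html` gives the prefix `http://www.example.com/projects/`, so in this model a document under `/other/` on the same server is skipped unless Stay Under is turned off.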
John Turnbull
Thunderstone Software
jason112
Site Admin
Posts: 347 Joined: Tue Oct 26, 2004 5:35 pm
by jason112 » Tue Mar 27, 2007 10:46 am
Alternatively, if you only want things under
http://www.pgregister.coe.int/Pompidou_files/Pro15
added to the crawl (as opposed to everything on pgregister.coe.int), you can add that URL to the "required prefix".
cyril.adam
by cyril.adam » Tue Mar 27, 2007 11:03 am
I've set Stay Under to No and I still have the same issue...
I don't want only the things under
http://www.pgregister.coe.int/Pompidou_files/Pro15 but also those under .../Pro1, 2, 3, 4, 5, 6, and so on.
The error is listed under "Checking for broken hyperlinks..." in the walk status.
Could this issue be caused by spaces in the document names?
mark
Site Admin
Posts: 5519 Joined: Tue Apr 25, 2000 6:56 pm
by mark » Tue Mar 27, 2007 11:51 am
Did you run the walk with rewalk mode "new" or "refresh"? With "refresh", the walk may not have done much if nothing was due to be refetched. Run a walk with mode "new" to ensure that everything is redone.
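To picture the difference between the two rewalk modes, here is a simplified Python model. It is an assumption for illustration, not how the walker actually schedules pages: `Page`, `next_due` and `pages_to_fetch` are hypothetical, "new" redoes every page, and "refresh" only refetches pages whose due time has passed.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Page:
    url: str
    next_due: datetime  # hypothetical: when the page should next be refetched


def pages_to_fetch(mode: str, pages: list[Page], now: datetime) -> list[str]:
    """Hypothetical model of rewalk modes.

    "new" redoes every page; "refresh" only refetches pages whose
    next_due time has passed, so a refresh soon after a previous walk
    may fetch little or nothing.
    """
    if mode == "new":
        return [p.url for p in pages]
    if mode == "refresh":
        return [p.url for p in pages if p.next_due <= now]
    raise ValueError(f"unknown rewalk mode: {mode!r}")
```

In this model, changing a setting such as Stay Under and then running a refresh can appear to have no effect, because pages that are not yet due are never looked at again.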
cyril.adam
by cyril.adam » Wed Mar 28, 2007 8:19 am
I changed the rewalk mode from "refresh" to "new", and it is working well now.
Thank you for your help.
Cyril.