Find the page that links to the missing page. Look up that page in List/Edit urls and click "Children". See whether the missing page is listed and whether there is an error message next to it. If it is listed with no error, turn verbosity up to 4, do a new walk, then check again for the reason it was skipped.
I did a test setup to limit the results.
I started the walk from the following entry point: http://home.coe.int/t/f/d.r.h/08_Salair ... it%C3%A9s/
None of the links on the page are followed. Only the page itself is indexed.
When I look at the children and their status codes, all the links on the page return one of two errors:
Offsite: which is normal, they really are offsite.
or
Unwanted prefix: which is not normal.
All the 'exclusion' fields in the profile are empty. I even cleared out 'cgibin'; nothing changes.
The content type of the page is "text/html; charset=iso-8859-1"
The storage charset is UTF-8
The default charset is WINDOWS-1252
XML UTF-8 is off
The display charset is blank
URLs should not contain non-ASCII characters, according to the URI spec, precisely to avoid this kind of issue. '%C3%A9' is the UTF-8 encoding of the character that ISO-8859-1 encodes as '%E9'. The HTML spec makes UCS (Unicode) the document character set for HTML, and recommends that non-ASCII characters in URLs be mapped from the document's character _encoding_ (i.e. the Content-Type "charset" parameter) to UTF-8 and then URL-encoded. That is what the Appliance does, and so does Microsoft Explorer 6.0. Firefox 1.5.0.8, however, leaves them in the document's charset encoding and URL-encodes that; I believe this is an older, deprecated behavior.
UTF-8 + URL-encoding was chosen as the standard so that URLs would be consistent regardless of document charset encoding.
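To make the difference concrete, here is a small Python sketch (my own illustration, not anything the Appliance runs) that URL-encodes the same character, é, under both charsets. It also shows what happens when the UTF-8 form is decoded as if it were ISO-8859-1, which is the kind of mismatch that makes two URLs for the same page look different.

```python
from urllib.parse import quote, unquote

ch = "\u00e9"  # the character é

# URL-encode the same character under each charset.
utf8_form = quote(ch, encoding="utf-8")          # two bytes, 0xC3 0xA9
latin1_form = quote(ch, encoding="iso-8859-1")   # one byte, 0xE9

print(utf8_form)    # %C3%A9
print(latin1_form)  # %E9

# Decoding the UTF-8 form with the wrong charset yields two
# unrelated characters instead of é.
mismatch = unquote(utf8_form, encoding="iso-8859-1")
print(len(mismatch))  # 2
```

Both forms are valid percent-encodings, but they are different byte sequences, so a server or crawler that compares URLs literally will treat them as different URLs.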
However, the real solution is to avoid raw non-ASCII characters in URLs: you should edit the pages and write the links already URL-encoded in your desired charset (i.e. ISO-8859-1), so that user-agents (the Appliance and browsers) do not have to choose an encoding themselves.
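As a rough sketch of what "pre-encoding" a link could look like if it were done by script rather than by hand, the Python below percent-encodes only the non-ASCII characters of a URL in a chosen charset. The function name `encode_non_ascii` and the example path are hypothetical, made up for illustration; how you would hook this into your own Word-to-HTML conversion is something I can only guess at.

```python
import re
from urllib.parse import quote

def encode_non_ascii(url: str, charset: str = "iso-8859-1") -> str:
    """Percent-encode only the non-ASCII characters of url in charset.

    ASCII characters, including '/', ':' and any existing %XX escapes,
    are left untouched. (Hypothetical helper, for illustration only.)
    """
    return re.sub(
        r"[^\x00-\x7f]+",
        lambda m: quote(m.group(0), safe="", encoding=charset),
        url,
    )

# Hypothetical example path, not a real URL from the site.
raw = "/rapports/Indemnit\u00e9s/"
print(encode_non_ascii(raw))            # /rapports/Indemnit%E9s/
print(encode_non_ascii(raw, "utf-8"))   # /rapports/Indemnit%C3%A9s/
```

Once a link is written out as pure ASCII like this, every user-agent requests exactly the same bytes regardless of the page's charset.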
I was afraid you were going to say something like that.
Unfortunately that will not be easily feasible. We have about 1500 people contributing to our web sites. The original documents are Word documents which are converted to HTML