So you probably want to set both Storage Charset and Source Default Charset to UTF-8 and turn off XML UTF-8.
If you think the walk is missing pages, turn verbosity up to 4 and run a walk with mode set to new. Then go to Profile Tools, click List URLs, and click the Base URL (or whatever URL links to the missing pages) to get info about that page. Click the Children link there to see what URLs were found on the page and any errors or reasons for rejection for those URLs.
You can test against a public site without beating it up by setting max depth to something low like 1 or 2, and/or by setting max pages to something low like 10 or 100. Or you could just start the walk, then hit "Pause walk and live" on the walk status page after a minute or so.
Note: The HTTP headers returned by the server aren't shown by view source and may affect the crawl.
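Since view source won't show them, one way to look at the headers is to request the page yourself. The following is only a rough sketch in Python using the standard library, not anything built into the walker. The names `fetch_headers`, `crawl_relevant_headers` and the `CRAWL_RELEVANT` list are made up for illustration, and the list is just a guess at headers that commonly matter to crawlers (for example a `charset` in Content-Type, which ties in with the charset settings above).

```python
from urllib.request import Request, urlopen

# Illustrative list only (an assumption, not the walker's own rules):
# headers that commonly influence how a crawler treats a page.
CRAWL_RELEVANT = ("content-type", "x-robots-tag", "set-cookie", "last-modified")


def crawl_relevant_headers(headers):
    """Pick out the crawl-relevant headers from (name, value) pairs.

    Names are lower-cased; values are collected in lists because a
    header such as Set-Cookie may appear more than once.
    """
    picked = {}
    for name, value in headers:
        key = name.lower()
        if key in CRAWL_RELEVANT:
            picked.setdefault(key, []).append(value)
    return picked


def fetch_headers(url, timeout=10):
    """Send a HEAD request and return the response headers as pairs.

    urlopen follows redirects, so these are the headers of the final
    response, not of any intermediate redirect.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return list(response.headers.items())
```

For example, `crawl_relevant_headers(fetch_headers(url))` with `url` set to the page that is misbehaving returns just those headers, so you can compare the Content-Type charset with what the page source claims.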