You must be typing the URL incorrectly. Run a search that returns the file that does appear and note the URL shown for it. Use that (minus the leading http://) to find the .doc file. Then use a similar URL to find the .xls file.
I got the Excel document to be recognized by using -fother in the dowalk script (thanks, John). One more problem, though: when the results come up, the title isn't displayed. It just says "MSExcel Document (309KB)".
Hi,
I could extract the content of an Excel file using the option -fother but not using -fmsw (which returns nothing). -fother works fine, but it also picks up some junk characters like "B a x X A r i a l A r i a l A r i a l A r i a l A r i a l A r i a l Red Red x Sheet1 a Sheet2 h Sheet3 i C". Is it possible to take these characters out of the returned value? Are there any workarounds?
You could remove some things with <sandr> but there's not much point and it may cause good hits to be missed. The extracted text is primarily used for searching. You generally only see the part of the text that contained the best match to the query (unless you click on match info).
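To make the trade-off concrete, here is a rough sketch of what such a cleanup might look like. Note that this is plain Python, not Webinator's <sandr> syntax, and the function name strip_spaced_runs and the min_run threshold are made up for illustration. It drops any run of four or more single-character tokens, which removes the spaced-out font names but also shows how real text can get eaten.

```python
def strip_spaced_runs(text, min_run=4):
    """Drop runs of at least min_run consecutive single-character tokens.

    Hypothetical helper for illustration only; not part of Webinator.
    """
    kept = []
    run = []  # current run of single-character tokens
    for tok in text.split():
        if len(tok) == 1:
            run.append(tok)
            continue
        # A longer token ends the run; keep the run only if it is short.
        if len(run) < min_run:
            kept.extend(run)
        run = []
        kept.append(tok)
    # Handle a run that reaches the end of the text.
    if len(run) < min_run:
        kept.extend(run)
    return " ".join(kept)


junk = "B a x X A r i a l A r i a l Red Red x Sheet1 a Sheet2 h Sheet3 i C"
print(strip_spaced_runs(junk))

# The same rule also deletes legitimate content:
print(strip_spaced_runs("Plan A B C D options"))
```

The second call is the "good hits missed" case: a row of one-letter cell values looks exactly like the junk, so a query for any of them would no longer match.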
The plugin used in the dowalk script extracts the contents of most Excel files, but it fails for some. What could be the reason? Does it have something to do with text formatting or file corruption?
Some possible reasons for failure are file truncation during download because the -z setting is too small, encryption, graphical text, and multi-byte character sets.