Excel files with dowalk script

Post by **mark** » Fri Sep 28, 2001 2:52 pm

You must be typing the URL incorrectly. See what the URL is when you do a search for the file that appears. Use that (minus the leading http://) to find the .doc file. Then use a similar URL to find the .xls file.

Post by **jeuteneier** » Mon Oct 01, 2001 6:06 pm

I got the Excel document to be recognized by using -fother in the dowalk script (thanx John). One more problem though: when the results come up, it doesn't seem to display the title. It just says, "MSExcel Document (309KB)".

How do I get the title recognized?

Thanx,
Justin

Post by **mark** » Tue Oct 02, 2001 10:22 am

The plugin doesn't currently know how to extract titles from Excel files.

Post by **jeuteneier** » Tue Oct 02, 2001 12:09 pm

Will that be an update in the new version? Do you have any idea of when that will be available?

Justin

Post by **Faiz** » Wed Oct 24, 2001 3:19 pm

Hi,
I could extract the content of an Excel file using the option -fother, but not using -fmsw (it returns nothing with that option). -fother works fine, but it also picks up some junk characters like "B a x X A r i a l A r i a l A r i a l A r i a l A r i a l A r i a l Red Red x Sheet1 a Sheet2 h Sheet3 i C". Is it possible to take these characters out of the return value? Are there any workarounds?

Post by **mark** » Wed Oct 24, 2001 3:49 pm

You could remove some things with <sandr> but there's not much point and it may cause good hits to be missed. The extracted text is primarily used for searching. You generally only see the part of the text that contained the best match to the query (unless you click on match info).
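To illustrate the trade-off, here is a rough sketch in Python (not Vortex, and not the actual `<sandr>` syntax) of the kind of search-and-replace that could strip the spaced-out letter runs. The pattern `SPACED_RUN` and the function `strip_spaced_runs` are hypothetical names made up for this example, and the threshold of four characters is an arbitrary assumption.

```python
import re

# Hypothetical pattern (an assumption for illustration): a run of at least
# four single non-space characters, each separated by exactly one space.
SPACED_RUN = re.compile(r"(?<!\S)(?:\S ){3,}\S(?!\S)")


def strip_spaced_runs(text: str) -> str:
    """Remove spaced-out single-character runs, then normalize whitespace."""
    cleaned = SPACED_RUN.sub(" ", text)
    return " ".join(cleaned.split())


# Junk text of the kind quoted in the post above.
sample = (
    "B a x X A r i a l A r i a l A r i a l A r i a l A r i a l A r i a l "
    "Red Red x Sheet1 a Sheet2 h Sheet3 i C"
)
print(strip_spaced_runs(sample))

# The same rule also deletes legitimate single-letter content,
# which is how good hits could end up being missed.
print(strip_spaced_runs("Plan A B C D ratings"))
```

The second call shows the risk: real cell values such as single-letter codes match the same pattern and would no longer be searchable.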

Post by **Faiz** » Mon Oct 29, 2001 10:15 am

The plugin used in the dowalk script extracts the contents of most Excel files, but for some it fails. What could be the reason? Does it have something to do with text formatting or file corruption?

Post by **John** » Mon Oct 29, 2001 10:43 am

Most likely it has to do with the way the text is stored in the file. Is it maybe different versions of Excel?

Post by **mark** » Mon Oct 29, 2001 10:44 am

Some reasons for failure could be file truncation during download (because the -z setting is too small), encryption, graphical text, or multi-byte character sets.

Post by **Faiz** » Tue Oct 30, 2001 9:41 am

Thanx. What is "too small a -z setting"? Is it the font size?