Excel files with dowalk script

Posted: Tue Sep 25, 2001 5:13 pm
by jeuteneier
Can the dowalk script index Microsoft Excel 97 files? I have the commercial version of Webinator with the plug-in. In the <$acceptmime= setting I am using "application/msexcel", and I have set <acceptext ext=".xls">.

When I run dowalk, the log file appears to show the .xls file being indexed just like all the rest, but I cannot find the file in searches.

Thanx,
Justin

Posted: Tue Sep 25, 2001 5:52 pm
by John
You might also want to look at the doplugin function, and handle the Excel files the same way as MSWord.

Posted: Wed Sep 26, 2001 6:23 pm
by jeuteneier
What is the call for Excel supposed to be? The two places in the doplugin function that handle Excel call it this way:

<case "application/msexcel"><dofilt opt="-fmsx" dt="MSExcel"><return>
<case "xls"><dofilt opt="-fmsx" dt="MSExcel"><return>

Is the opt="-fmsx" correct?

Thanx,
Justin

Posted: Wed Sep 26, 2001 8:54 pm
by John
No, the option should be the same as MSWord, -fmsw.

<dofilt opt="-fmsw" dt="MSExcel">

Posted: Thu Sep 27, 2001 5:47 pm
by jeuteneier
Ok, I tried that but it still doesn't work. Here are the two calls in the doplugin function.

<case "application/pdf"><dofilt opt="-fpdf" dt="PDF"><return>
<case "application/msword"><dofilt opt="-fmsw" dt="MSWord"><return>
<case "application/msexcel"><dofilt opt="-fmsw" dt="MSExcel"><return>
<case "application/x-shockwave-flash"><doifilt opt="-fswf" dt="Shockwave"><return>

and

<case "pdf"><dofilt opt="-fpdf" dt="PDF"><return>
<case "doc"><dofilt opt="-fmsw" dt="MSWord"><return>
<case "xls"><dofilt opt="-fmsw" dt="MSExcel"><return>
<case "swf"><doifilt opt="-fswf" dt="Shockwave"><return>

It still doesn't get indexed. Is "application/excel" OK?

thanx,
Justin

Posted: Thu Sep 27, 2001 5:54 pm
by mark
It depends on what the server is sending. But if the mime type is wrong it will still catch it by extension. And even if it doesn't get processed by the plugin it should still be in the database. Is it in the database?

gw -st "select * from html where Url='thehostname/thepath/thefile.xls'"

Note that case is significant in the filename and extension.
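
To make the fallback concrete, here is a minimal Python sketch of the lookup order described above: MIME type first, then file extension. It is not the Vortex doplugin code. The table entries mirror the cases quoted earlier in this thread, the names `MIME_FILTERS`, `EXT_FILTERS`, and `pick_filter` are made up for illustration, and the exact, case-sensitive extension match is an assumption of the sketch.

```python
# Illustrative only: mimics a two-stage lookup (MIME type first, then
# file extension). The real logic lives in the Vortex doplugin function.
MIME_FILTERS = {
    "application/pdf": "-fpdf",
    "application/msword": "-fmsw",
    "application/msexcel": "-fmsw",
}
EXT_FILTERS = {"pdf": "-fpdf", "doc": "-fmsw", "xls": "-fmsw"}


def pick_filter(content_type, url):
    """Return the filter option for a document, or None if nothing matches."""
    if content_type in MIME_FILTERS:
        return MIME_FILTERS[content_type]
    # Fall back to the extension of the last path segment.
    # Assumption: the match is exact, so "XLS" does not match "xls".
    path = url.split("?", 1)[0]
    last_segment = path.rsplit("/", 1)[-1]
    if "." not in last_segment:
        return None
    extension = last_segment.rsplit(".", 1)[1]
    return EXT_FILTERS.get(extension)


if __name__ == "__main__":
    # An unrecognised MIME type is still caught by the .xls extension.
    print(pick_filter("application/x-unknown", "thehostname/thepath/thefile.xls"))
```

In this sketch a server that reports an unexpected MIME type still reaches the filter through the extension table, which is the behaviour described above.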

Posted: Thu Sep 27, 2001 5:56 pm
by John
It will depend on what your web server is configured to send for the Excel files. You could try a small script to fetch one and see what is returned, e.g.

<SCRIPT LANGUAGE=vortex>
<A NAME=main>
<fetch "http://something.xls">
<urlinfo contenttype>
$ret
</A>
</SCRIPT>
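
The same check can also be made outside Vortex. The following is a small Python sketch, assuming the standard `urllib.request` module and a server that answers HEAD requests; `media_type` and `fetch_content_type` are illustrative names, not part of Webinator.

```python
from urllib.request import Request, urlopen


def media_type(header_value):
    """Strip parameters such as '; charset=...' and lower-case the type."""
    if not header_value:
        return ""
    return header_value.split(";", 1)[0].strip().lower()


def fetch_content_type(url, timeout=10):
    """Send a HEAD request and return the bare Content-Type the server reports."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return media_type(response.headers.get("Content-Type"))
```

Calling `fetch_content_type` with the URL of one of your own .xls files should return the type string to compare against the one accepted in the walk script.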

Posted: Thu Sep 27, 2001 6:44 pm
by jeuteneier
I ran the script for both xls and doc files and it returned:

application/vnd.ms-excel
application/msword

When I run:

gw -st "select * from html where Url='http://172.18.21.40/retweb/test.xls'"

it simply returns to the prompt with no output at all. I have the commercial version running on AIX.

Thanx,
Justin

Posted: Thu Sep 27, 2001 7:47 pm
by John
You would want to change the MIME type in the script to match what the web server is sending, in case the web server does not send the file when you are not accepting its type. Also, the select should not include the http://, i.e.

gw -st "select * from html where Url='172.18.21.40/retweb/test.xls'"
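
As a sketch of that rule, the Python helper below drops the leading http:// from a URL and builds the argument list for the query. It uses only the `-st` and `-d` options that appear in this thread; `url_key` and `gw_select_args` are hypothetical names, and the code builds the command without running it.

```python
def url_key(url):
    """Drop a leading http:// so the value matches how Url is queried here."""
    prefix = "http://"
    if url.startswith(prefix):
        return url[len(prefix):]
    return url


def gw_select_args(url, database=None):
    """Build the argument list for a gw lookup of one URL (not executed)."""
    key = url_key(url)
    if "'" in key:
        # Quote escaping is deliberately left out of this sketch.
        raise ValueError("URL contains a single quote")
    args = ["gw"]
    if database:
        args.append("-d" + database)
    args += ["-st", "select * from html where Url='%s'" % key]
    return args
```

Passing the resulting list to `subprocess.run` would execute the query, assuming `gw` is on the PATH.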

Posted: Fri Sep 28, 2001 12:23 pm
by jeuteneier
It does not appear to be in the database. When I type in

gw -dretweb -st "select * from html where Url='172.18.21.40/retweb/test.xls'"

it simply does nothing and goes to a new prompt. It does the same when I use "test.doc". But test.doc is being indexed just fine.

My gw file is stored in the bin folder and that is where I am running the command from. Am I missing something else?

Thanx,
Justin