Excel files with dowalk script

jeuteneier
Posts: 32
Joined: Wed May 16, 2001 2:54 pm

Post by jeuteneier »

Can the dowalk script index Microsoft Excel 97 files? I have the commercial version of Webinator with the plug-in. For $acceptmime I am using "application/msexcel", and I set <acceptext ext=".xls">.

When I run dowalk, the log file shows the .xls file being indexed just like all the rest, but I cannot find the file in searches.

Thanx,
Justin
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

You might also want to look at the doplugin function, and handle the excel files the same as msword.
John Turnbull
Thunderstone Software
jeuteneier
Posts: 32
Joined: Wed May 16, 2001 2:54 pm

Post by jeuteneier »

What is the call for Excel supposed to be? The two places in the doplugin function handle Excel this way:

<case "application/msexcel"><dofilt opt="-fmsx" dt="MSExcel"><return>
<case "xls"><dofilt opt="-fmsx" dt="MSExcel"><return>

Is the opt="-fmsx" correct?

Thanx,
Justin
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

No, the option should be the same as MSWord, -fmsw.

<dofilt opt="-fmsw" dt="MSExcel">
John Turnbull
Thunderstone Software
jeuteneier
Posts: 32
Joined: Wed May 16, 2001 2:54 pm

Post by jeuteneier »

Ok, I tried that but it still doesn't work. Here are the two calls in the doplugin function.

<case "application/pdf"><dofilt opt="-fpdf" dt="PDF"><return>
<case "application/msword"><dofilt opt="-fmsw" dt="MSWord"><return>
<case "application/msexcel"><dofilt opt="-fmsw" dt="MSExcel"><return>
<case "application/x-shockwave-flash"><doifilt opt="-fswf" dt="Shockwave"><return>

and

<case "pdf"><dofilt opt="-fpdf" dt="PDF"><return>
<case "doc"><dofilt opt="-fmsw" dt="MSWord"><return>
<case "xls"><dofilt opt="-fmsw" dt="MSExcel"><return>
<case "swf"><doifilt opt="-fswf" dt="Shockwave"><return>

It still doesn't get indexed. Is "application/excel" OK?

thanx,
Justin
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

It depends on what the server is sending. But if the MIME type is wrong, it will still be caught by extension. And even if it doesn't get processed by the plugin, it should still be in the database. Is it in the database?

gw -st "select * from html where Url='thehostname/thepath/thefile.xls'"

Note that case is significant in the filename and extension.
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

It will depend on what your web server is configured to send for the Excel files. You could try a small script to fetch one and see what is returned, e.g.

<SCRIPT LANGUAGE=vortex>
<A NAME=main>
<fetch http://something.xls>
<urlinfo contenttype>
$ret
</A>
</SCRIPT>
John Turnbull
Thunderstone Software
jeuteneier
Posts: 32
Joined: Wed May 16, 2001 2:54 pm

Post by jeuteneier »

I ran the script for both xls and doc files and it returned:

application/vnd.ms-excel
application/msword

When I run:

gw -st "select * from html where Url='http://172.18.21.40/retweb/test.xls'"

it simply returns to the prompt with no output at all. I have the commercial version running on AIX.

Thanx,
Justin
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

You will want to change the MIME type in the script to match what the web server is sending (application/vnd.ms-excel), in case the web server is not sending the file because you aren't accepting that type. Also, the select should not include the http:// prefix, i.e.

gw -st "select * from html where Url='172.18.21.40/retweb/test.xls'"
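
As a rough sketch of that change, the MIME-type case in doplugin could be extended to match the type the server reported earlier in this thread. It simply copies the existing case lines with a different string. Whether -fmsw processes the file correctly on a given install is an assumption, and the older application/msexcel case is kept only as a fallback:

```vortex
<case "application/vnd.ms-excel"><dofilt opt="-fmsw" dt="MSExcel"><return>
<case "application/msexcel"><dofilt opt="-fmsw" dt="MSExcel"><return>
```

The same application/vnd.ms-excel string would presumably also need to be among the accepted MIME types, so the walker requests and keeps the file in the first place.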
John Turnbull
Thunderstone Software
jeuteneier
Posts: 32
Joined: Wed May 16, 2001 2:54 pm

Post by jeuteneier »

It does not appear to be in the database. When I type in

gw -dretweb -st "select * from html where Url='172.18.21.40/retweb/test.xls'"

it simply prints nothing and returns to the prompt. It does the same when I use "test.doc", even though test.doc is being indexed just fine.

My gw file is stored in the bin folder, and that is where I am running the command from. Am I missing something else?

Thanx,
Justin