Crawling Powerpoint documents

Faiz
Posts: 109
Joined: Wed Jan 10, 2001 1:29 pm

Post by Faiz »

The anytotx plugin indexes the text in MS PowerPoint documents properly, but it also adds some unwanted characters, like:
O S H X b C4 h F V W Q b2R O db G ci h z t G U0 uV w kj M E F x4 X8 htT U Y9 Ww K T t e J H Vb h H FsOh j XW9 B w K G V If aa y f6 w A N xtfZ eIa zy L r Z E P h M
As a result, if I search for the keyword "vb", these documents show up in the search results even though the keyword is not present in the actual text. Is there a way to avoid indexing these unwanted characters, as well as the footers of these documents (which default to the name of the user who created the document and the words "Microsoft Powerpoint")?

Thanx,
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

PowerPoint isn't specifically supported, but its format is related to MS Word's. You might get better results by using -fmsw instead of -fother.
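
If changing the format flag does not get rid of all the stray characters, one possible workaround is to filter the extracted text before it is indexed. The following is only a rough sketch in Python, not a feature of anytotx or the crawler. The function names and thresholds are made up for illustration, and it assumes you can get at the extracted text as plain lines. It treats a line as noise when it has many tokens and nearly all of them are three characters or shorter, which matches the sample quoted above.

```python
def looks_like_noise(line, min_tokens=8, max_short_len=3, short_ratio=0.8):
    """Guess whether a line is binary residue rather than real text.

    Hypothetical heuristic: a line counts as noise when it has at least
    `min_tokens` whitespace-separated tokens and at least `short_ratio`
    of them are `max_short_len` characters or shorter.
    """
    tokens = line.split()
    if len(tokens) < min_tokens:
        # Too few tokens to judge; keep short lines such as slide titles.
        return False
    short = sum(1 for t in tokens if len(t) <= max_short_len)
    return short / len(tokens) >= short_ratio


def strip_noise_lines(text):
    """Return `text` with every line that looks like noise removed."""
    kept = [line for line in text.splitlines() if not looks_like_noise(line)]
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "Slide title about vb scripting\nO S H X b C4 h F V W Q b2R"
    print(strip_noise_lines(sample))
```

The thresholds are guesses and would need tuning against real documents. A slide that really does consist of a long run of short words or abbreviations would be dropped too, so check the output before relying on it.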