Crawling Powerpoint documents

Posted: Mon Feb 12, 2001 2:04 pm
by Faiz
The anytotx plugin indexes the text in MS PowerPoint files properly, but it also adds unwanted characters such as:
O S H X b C4 h F V W Q b2R O db G ci h z t G U0 uV w kj M E F x4 X8 htT U Y9 Ww K T t e J H Vb h H FsOh j XW9 B w K G V If aa y f6 w A N xtfZ eIa zy L r Z E P h M
As a result, if I search for the keyword "vb", these documents show up in the search results even though the keyword is not present in the actual text. Is there a way to avoid indexing these unwanted characters, and also the footers of these documents (which default to the name of the user who created the document and the words "Microsoft Powerpoint")?

Thanks,

Posted: Mon Feb 12, 2001 3:07 pm
by mark
PowerPoint isn't specifically supported, but it's related to MS Word. You might get better results by using -fmsw instead of -fother.
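
To illustrate why the stray characters cause the false hit on "vb", and one possible workaround if changing the filter flag does not remove them, here is a rough Python sketch. It is not part of anytotx or the indexer: the function names, the thresholds, and the idea of cleaning the extracted text before it is indexed are all assumptions made for illustration.

```python
import re

# Part of the stray output quoted in the first post.
junk = ("O S H X b C4 h F V W Q b2R O db G ci h z t G U0 uV w kj "
        "M E F x4 X8 htT U Y9 Ww K T t e J H Vb h H FsOh")


def tokens(text):
    """Lowercase alphanumeric tokens, as a simple case-insensitive indexer might see them."""
    return re.findall(r"[a-z0-9]+", text.lower())


def looks_like_junk(line, max_len=4, threshold=0.9, min_tokens=10):
    """Hypothetical heuristic: a long line made almost entirely of very short tokens."""
    toks = line.split()
    if len(toks) < min_tokens:
        return False  # too short to judge; keep titles and brief lines
    short = sum(1 for t in toks if len(t) <= max_len)
    return short / len(toks) >= threshold


def strip_junk_lines(text):
    """Drop lines flagged by looks_like_junk; leave everything else untouched."""
    return "\n".join(l for l in text.splitlines() if not looks_like_junk(l))


if __name__ == "__main__":
    # The stray token "Vb" becomes "vb" once lowercased, hence the false match.
    print("vb" in tokens(junk))
    print(strip_junk_lines("Quarterly sales review\n" + junk))
```

The thresholds are guesses and would need tuning against real slides; a heuristic like this can also discard legitimate lines made up of short words or abbreviations.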