anotherbyte.net: Extracting Plain Text for Indexing


Searching by keyword requires an index, unless you want to scan every document at query time.

An index requires plain text, and a lot of formats out there are not plain text, PDF in particular.
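To make the idea concrete, here is a minimal sketch of a keyword index (an inverted index) in Python. It assumes the documents have already been reduced to plain text. The names `tokenize`, `build_index` and `search`, and the sample documents, are made up for illustration and do not come from any particular search library.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start from the documents matching the first word, then narrow down.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical documents, already extracted to plain text.
docs = {
    "notes.txt": "An index requires plain text.",
    "manual.pdf": "Text extracted from a PDF manual.",
    "readme.txt": "Plain instructions, nothing else.",
}
index = build_index(docs)
matches = search(index, "plain text")
```

Building the index is done once, up front. Each query is then a handful of set lookups instead of a scan over every document, which is the trade-off against searching dynamically.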

Here are some ways to extract plain (possibly formatted) text from a PDF document:
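As one illustration, and not necessarily one of the tools this post originally linked to, the sketch below calls the `pdftotext` command-line tool (shipped with Xpdf and Poppler) from Python. It assumes `pdftotext` is installed and on the PATH. The helper names `build_pdftotext_command` and `extract_text` and the file name `report.pdf` are hypothetical.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, keep_layout=False):
    """Build the argument list for pdftotext, writing the text to stdout."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if keep_layout:
        # -layout tries to preserve the physical layout of the page.
        cmd.append("-layout")
    # A text-file argument of "-" sends the output to stdout.
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, keep_layout=False):
    """Run pdftotext on one file and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on the PATH")
    result = subprocess.run(
        build_pdftotext_command(pdf_path, keep_layout),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout.decode("utf-8", errors="replace")


# The command that would be run for a hypothetical file:
command = build_pdftotext_command("report.pdf", keep_layout=True)
```

Calling `extract_text("report.pdf")` would return the document's text as a string, ready to be fed into an indexer. Scanned PDFs that contain only images will come back empty, since there is no text layer to extract.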

There are also plenty of shareware and strictly commercial products out there.

  
