Tika

A toolkit for detecting and extracting metadata and structured text content from documents in a wide variety of file formats.
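
To make the "detecting" part concrete, the sketch below guesses a file's media type from its leading bytes (often called magic numbers). It is a simplified, standard-library Python illustration of the general idea, not Tika's API and not its actual detection logic. The function name `sniff_type` and the small signature table are assumptions made up for this example.

```python
# Hypothetical, simplified illustration of type detection by magic bytes.
# This is NOT Tika's API; the names and the table below are assumptions.

# A few well-known file signatures mapped to media (MIME) types.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"PK\x03\x04", "application/zip"),  # also the container for DOCX, XLSX, ODT
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_type(data: bytes) -> str:
    """Return a media type guessed from the first bytes of `data`."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    # No signature matched: treat valid UTF-8 as plain text,
    # and anything else as generic binary data.
    try:
        data.decode("utf-8")
        return "text/plain"
    except UnicodeDecodeError:
        return "application/octet-stream"


if __name__ == "__main__":
    print(sniff_type(b"%PDF-1.7\n"))
    print(sniff_type(b"hello world"))
```

A signature table like this covers only the detection half. Extracting metadata and text needs format-specific parsing, which this sketch deliberately leaves out.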