ocrmypdf-auto

Name: ocrmypdf-auto
Author: cmccambridge

[p]This container monitors an input directory for PDF documents and automatically invokes [a href='https://github.com/jbarlow83/OCRmyPDF'][code][strong]OCRmyPDF[/strong][/code][/a] on each file.[/p] [p]It uses [code]inotify[/code] to monitor the input directory efficiently, and is fairly configurable.[/p] [h4]Configuration Details[/h4] [p]The descriptions of the unRAID volumes and environment variables cover the main configuration options of [code]ocrmypdf-auto[/code]. For details, including how to specify custom command-line parameters to [code]ocrmypdf[/code] itself or custom [code]tesseract[/code] configuration files, see the full README at [a href='https://github.com/cmccambridge/ocrmypdf-auto/blob/master/README.md']https://github.com/cmccambridge/ocrmypdf-auto/blob/master/README.md[/a].[/p]

Other · Free · 335.0K · 15y ago

#document-processing #pdf-ocr #ocrmypdf-auto #tesseract-ocr #file-monitoring #ocrmypdf-alternative #searchable-pdfs #scanned-document-processing #inotify #document-automation #containerized-ocr #linux-container #batch-pdf-processing #automated-ocr

About

ocrmypdf-auto is a containerized automation tool that processes PDF documents by applying optical character recognition (OCR). It continuously monitors an input directory for new PDF files using inotify, processes them with OCRmyPDF and Tesseract-OCR, then outputs searchable PDFs…
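
The watch-and-process loop described here can be illustrated with a minimal sketch. It is not the container's actual implementation: it polls the directory with the Python standard library instead of using inotify, and the function names (`pending_jobs`, `build_command`, `process_once`, `watch`) and the directory arguments are hypothetical. The only external call it assumes is the basic `ocrmypdf <input> <output>` command-line form.

```python
import subprocess
import time
from pathlib import Path


def pending_jobs(input_dir: Path, output_dir: Path) -> list[tuple[Path, Path]]:
    """Pair each input PDF with its mirrored output path, skipping finished ones."""
    jobs = []
    for src in sorted(input_dir.rglob("*.pdf")):
        dst = output_dir / src.relative_to(input_dir)
        if not dst.exists():
            jobs.append((src, dst))
    return jobs


def build_command(src: Path, dst: Path) -> list[str]:
    """Basic OCRmyPDF invocation; any extra options are left out of this sketch."""
    return ["ocrmypdf", str(src), str(dst)]


def process_once(input_dir: Path, output_dir: Path, run=subprocess.run) -> list[Path]:
    """Run OCR on every pending PDF and return the outputs that succeeded."""
    done = []
    for src, dst in pending_jobs(input_dir, output_dir):
        dst.parent.mkdir(parents=True, exist_ok=True)
        result = run(build_command(src, dst))
        if result.returncode == 0:
            done.append(dst)
    return done


def watch(input_dir: Path, output_dir: Path, interval: float = 5.0) -> None:
    """Hypothetical polling loop (assumption: the real tool reacts to inotify events)."""
    while True:
        process_once(input_dir, output_dir)
        time.sleep(interval)
```

Calling `watch(Path("/input"), Path("/output"))` would run the loop with illustrative paths. Polling keeps the sketch dependency-free; an inotify-based watcher avoids rescanning the directory and reacts as soon as a file arrives.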