OCR stands for optical character recognition. The goal is to use Tesseract to extract readable text from a scanned paper letter that is stored as a PDF file.
cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)
The Erick Peirson tutorial does most of the work.
yum install ImageMagick
whereis convert
convert: /usr/bin/convert /usr/share/man/man1/convert.1.gz
Installing Tesseract on CentOS is not trivial, but fortunately EisenVault has a working procedure. That procedure is executed in the /opt directory as the root user.
We are interested in Dutch-language OCR, so the Dutch language data (nld) is needed as well.
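Before running the OCR it may be worth checking that the Dutch language data is actually visible to Tesseract. The following is a minimal sketch, assuming a Tesseract version that supports the --list-langs option; has_lang is a hypothetical helper name, not part of Tesseract.

```shell
# Sketch: succeed if the given language code is known to Tesseract.
# has_lang is a hypothetical helper name.
has_lang() {
    # Depending on the Tesseract version the list may be printed on stderr,
    # so both streams are merged before matching a whole line.
    tesseract --list-langs 2>&1 | grep -qx -- "$1"
}
```

Usage could look like: has_lang nld && echo "Dutch is available"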
From then on, testing with a file test.pdf goes as follows:
convert -density 300 test.pdf -depth 8 -strip \
    -background white -alpha off test.tiff
tesseract -l nld test.tiff test.txt
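The two steps can be wrapped in a small function so that any scanned PDF is handled the same way. This is only a sketch under assumptions: ocr_pdf and its arguments are hypothetical names, and the ImageMagick options are simply the ones used in the test. Tesseract appends .txt to the output base name by itself, so the sketch passes the name without extension.

```shell
# Sketch: convert a scanned PDF to TIFF and run Tesseract on the result.
# ocr_pdf is a hypothetical helper; usage: ocr_pdf FILE.pdf [LANG]
ocr_pdf() {
    local pdf="$1"
    local lang="${2:-nld}"      # default to Dutch, as in the test
    local base="${pdf%.pdf}"    # strip the .pdf suffix

    if [ ! -f "$pdf" ]; then
        echo "ocr_pdf: no such file: $pdf" >&2
        return 1
    fi

    # Render the PDF at 300 dpi, 8 bits deep, on a white background.
    convert -density 300 "$pdf" -depth 8 -strip \
        -background white -alpha off "$base.tiff" || return 1

    # Tesseract adds the .txt extension, so this writes "$base.txt".
    tesseract -l "$lang" "$base.tiff" "$base"
}
```

Calling ocr_pdf test.pdf should then leave test.tiff and test.txt next to the original file.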
First impression, based on a 20-line text: mostly flawless, except for the diacritics.