LINUX COMMANDO

THIS BLOG IS ABOUT THE LINUX COMMAND LINE INTERFACE (CLI), WITH AN OCCASIONAL FORAY INTO GRAPHICAL USER INTERFACE TERRITORY. INSTEAD OF JUST GIVING YOU INFORMATION LIKE SOME MAN PAGE, I HOPE TO ILLUSTRATE EACH COMMAND IN REAL-LIFE SCENARIOS.

OCR Scanning

This post describes how to scan pages from a printed book and convert the image to text using Optical Character Recognition (OCR) technology.
The tools that I use are:

SimpleScan
tesseract

Preparation

SimpleScan is a GUI scan application that comes pre-installed in many Linux distributions (including Debian Wheezy).
To manually install it on Debian:

```
$ sudo apt-get install simple-scan
```

tesseract is a command-line OCR program.
To install:

```
$ sudo apt-get install tesseract-ocr
```

If your text is in English, that is all you need to install. For any other language, you must also install the matching tesseract language pack, for example tesseract-ocr-rus for Russian, tesseract-ocr-deu for German, or tesseract-ocr-fra for French.
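If you need several language packs at once, a small helper can build the package names for you. This is only a sketch: lang_packages is a name I made up, and it assumes every language follows the tesseract-ocr-<code> naming pattern of the three examples above. Check with apt-cache search tesseract-ocr before relying on it.

```shell
#!/bin/sh
# Build Debian package names from tesseract language codes.
# lang_packages is a hypothetical helper, not part of tesseract or apt.
# Assumption: each pack is named tesseract-ocr-<code>.
lang_packages() {
    pkgs=""
    for code in "$@"; do
        pkgs="$pkgs tesseract-ocr-$code"
    done
    # Strip the leading space before printing.
    echo "${pkgs# }"
}

# Print the install command for review instead of running it.
echo "sudo apt-get install $(lang_packages rus deu fra)"
```

The script only prints the install command, so you can look it over before running it yourself.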

OCR Procedure

Scan the pages using SimpleScan.
Save the image.
Run the tesseract command:
```
$ tesseract OnWritingWell.jpg out
Tesseract Open Source OCR Engine v3.02 with Leptonica
```
The first parameter is the input image filename. The second parameter is the desired basename of the output text file. tesseract appends the default .txt extension to the basename, so the command above writes out.txt.
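A book usually means many scanned pages, so you may want to loop over the images rather than run tesseract by hand each time. The function below is a minimal sketch of that idea. The name ocr_pages and the file names in the usage comment are my own placeholders, and it assumes each page was saved as a separate image file.

```shell
#!/bin/sh
# OCR each scanned page and collect the text in one file.
# ocr_pages is a hypothetical helper name, not a tesseract feature.
# Usage: ocr_pages book.txt page01.jpg page02.jpg
ocr_pages() {
    out=$1
    shift
    : > "$out"                      # start with an empty output file
    for img in "$@"; do
        base=${img%.*}              # page01.jpg -> page01
        # tesseract writes its text to "$base.txt"
        tesseract "$img" "$base" || return 1
        cat "$base.txt" >> "$out"
    done
}
```

The per-page text files are left in place, so you can re-check a single page without running the OCR again. Avoid giving the combined file the same basename as one of the images, or it will be overwritten.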
If the language is not English, specify it on the command line with the -l option and a 3-character language code (refer to the tesseract man page). To use more than one language, join the codes with a plus sign. The following command specifies three languages: Russian, German, and French.
```
$ tesseract OnWritingWell.jpg myout -l rus+deu+fra
```
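If you often mix languages, you could wrap the plus-sign joining in a small function. This is a sketch rather than anything tesseract ships: ocr_multilang is a made-up name, and it simply passes the joined codes to the -l option used in the command above.

```shell
#!/bin/sh
# Run tesseract on one image with several languages.
# ocr_multilang is a hypothetical helper name.
# Usage: ocr_multilang OnWritingWell.jpg myout rus deu fra
ocr_multilang() {
    img=$1
    outbase=$2
    shift 2
    # Join the remaining arguments with "+", e.g. rus deu fra -> rus+deu+fra.
    langs=$(IFS=+; echo "$*")
    tesseract "$img" "$outbase" -l "$langs"
}
```

Each language you list must have its language pack installed, or tesseract will report an error when it tries to load it.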