بِسْمِ اللَّهِ الرَّحْمَنِ الرَّحِيمِ , الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ , الرَّحْمَنِ الرَّحِيمِ , مَالِكِ يَوْمِ الدِّينِ , إِيَّاكَ نَعْبُدُ وَإِيَّاكَ نَسْتَعِينُ , اهْدِنَا الصِّرَاطَ المُسْتَقِيمَ , صِرَاطَ الَّذِينَ أَنْعَمْتَ عَلَيْهِمْ , غَيْرِ المَغْضُوبِ عَلَيْهِمْ وَلاَ الضَّالِّينَ.
Assalamualaikum w.b.t/السَّلاَمُ عَلَيْكُمْ وَرَحْمَةُ اللهِ وَبَرَكَاتُه
Meja www.peceq.blogspot.com
THIS BLOG IS ABOUT THE LINUX COMMAND LINE INTERFACE (CLI), WITH AN OCCASIONAL FORAY INTO GRAPHICAL USER INTERFACE TERRITORY. INSTEAD OF JUST GIVING YOU INFORMATION LIKE SOME MAN PAGE, I HOPE TO ILLUSTRATE EACH COMMAND IN REAL-LIFE SCENARIOS.
OCR Scanning
This post describes how to scan pages from a printed book and convert the image to text using Optical Character Recognition (OCR) technology.
The tools that I use are:
- SimpleScan
- tesseract
Preparation
SimpleScan is a GUI scanning application that comes pre-installed in many Linux distributions (including Debian Wheezy). To install it manually on Debian:
$ sudo apt-get install simple-scan
tesseract is a command-line OCR program. To install it:
$ sudo apt-get install tesseract-ocr
If English is the only language you need, that is all you have to install. For any other language, you must also install the matching tesseract language pack. Examples are tesseract-ocr-rus for Russian, tesseract-ocr-deu for German, and tesseract-ocr-fra for French.
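If you need several language packs at once, a small helper can build the package list for you. This is only a sketch: the function name lang_packages is hypothetical, and it assumes every pack follows the tesseract-ocr-<code> naming pattern of the three examples above, which may not hold for every language. It prints the install command instead of running it, so you can check it first.

```shell
#!/bin/sh
# Map tesseract language codes to Debian package names.
# ASSUMPTION: packages follow the tesseract-ocr-<code> pattern seen in
# the three examples above; confirm with `apt-cache search tesseract-ocr`.
lang_packages() {
    pkgs=""
    for code in "$@"; do
        pkgs="$pkgs tesseract-ocr-$code"
    done
    # Strip the single leading space before printing.
    printf '%s\n' "${pkgs# }"
}

# Print the install command rather than running it, so it can be reviewed.
echo "sudo apt-get install $(lang_packages rus deu fra)"
```

Copy the printed line into your terminal once you are happy with the package names.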
OCR Procedure
- Scan the pages using SimpleScan.
- Save the image.
- Run the tesseract command:
$ tesseract OnWritingWell.jpg out
Tesseract Open Source OCR Engine v3.02 with Leptonica
The first parameter is the input image filename. The second parameter is the desired basename of the output text file. tesseract appends the default txt extension to the basename, e.g., out.txt.
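When you scan a whole chapter, you end up with one image per page. The sketch below shows one way to run the same command over many images, reusing each image name (minus its extension) as the output basename. The function names ocr_basename and ocr_pages and the page filenames are hypothetical, and it assumes every image name has an extension.

```shell
#!/bin/sh
# Derive the output basename from an image filename by dropping the
# final extension, e.g. page01.jpg -> page01.
# ASSUMPTION: the filename has an extension.
ocr_basename() {
    printf '%s\n' "${1%.*}"
}

# Run tesseract on every image given as an argument.
# tesseract appends .txt to the basename itself.
ocr_pages() {
    for img in "$@"; do
        [ -f "$img" ] || continue   # skip names that match no file
        tesseract "$img" "$(ocr_basename "$img")"
    done
}

# Usage (hypothetical filenames): ocr_pages page01.jpg page02.jpg
```

With the usage line above, the text should land in page01.txt and page02.txt next to the images.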
If the language is not English, you need to specify it on the command line with the -l option and a 3-character language code (refer to the tesseract man page). To use several languages, join their codes with +. The following command specifies three languages: Russian, German, and French.
$ tesseract OnWritingWell.jpg myout -l rus+deu+fra
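If the list of languages changes from book to book, you can build the -l value from separate codes instead of typing the + signs by hand. A minimal sketch, with join_langs as a hypothetical helper name; it only prints the resulting command so you can inspect it before running it.

```shell
#!/bin/sh
# Join language codes with "+" to form the value passed to -l.
join_langs() {
    old_ifs=$IFS
    IFS=+
    joined="$*"        # "$*" joins the arguments with the first IFS character
    IFS=$old_ifs
    printf '%s\n' "$joined"
}

langs=$(join_langs rus deu fra)
# Print the command for review instead of running it.
echo "tesseract OnWritingWell.jpg myout -l $langs"
```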
Accuracy
Source/Ref:
http://linuxcommando.blogspot.com/2014/01/ocr-scanning.html