Recognize text using optical character recognition
[___] = ocr(___,Name,Value) uses additional options specified by one or more Name,Value pair arguments, using any of the preceding syntaxes.
businessCard = imread('businessCard.png');
ocrResults = ocr(businessCard)
ocrResults =
  ocrText with properties:

                      Text: '‘ MathWorks®...'
    CharacterBoundingBoxes: [103x4 double]
      CharacterConfidences: [103x1 single]
                     Words: {16x1 cell}
         WordBoundingBoxes: [16x4 double]
           WordConfidences: [16x1 single]
recognizedText = ocrResults.Text;
figure;
imshow(businessCard);
text(600, 150, recognizedText, 'BackgroundColor', [1 1 1]);
Read image.
I = imread('handicapSign.jpg');
Define one or more rectangular regions of interest within I.
roi = [360 118 384 560];
% You may also use IMRECT to select a region using a mouse:
% figure; imshow(I); roi = round(getPosition(imrect))
ocrResults = ocr(I, roi);
Insert recognized text into original image.
Iocr = insertText(I,roi(1:2),ocrResults.Text,'AnchorPoint',...
    'RightTop','FontSize',16);
figure; imshow(Iocr);
businessCard = imread('businessCard.png');
ocrResults = ocr(businessCard)
ocrResults =
  ocrText with properties:

                      Text: '‘ MathWorks®...'
    CharacterBoundingBoxes: [103x4 double]
      CharacterConfidences: [103x1 single]
                     Words: {16x1 cell}
         WordBoundingBoxes: [16x4 double]
           WordConfidences: [16x1 single]
Iocr = insertObjectAnnotation(businessCard, 'rectangle', ...
    ocrResults.WordBoundingBoxes, ...
    ocrResults.WordConfidences);
figure; imshow(Iocr);
businessCard = imread('businessCard.png');
ocrResults = ocr(businessCard);
bboxes = locateText(ocrResults, 'MathWorks', 'IgnoreCase', true);
Iocr = insertShape(businessCard, 'FilledRectangle', bboxes);
figure; imshow(Iocr);
I — Input image
Input image, specified in M-by-N-by-3 truecolor, M-by-N 2-D grayscale, or binary format. The input image must be real and nonsparse. Before recognition, the function converts truecolor or grayscale input images to a binary image using Otsu's thresholding technique. For best OCR results, the height of a lowercase 'x', or a comparable character in the input image, must be greater than 20 pixels. To improve recognition results, remove any text rotation greater than +/- 10 degrees from either the horizontal or vertical axis.
Data Types: single | double | int16 | uint8 | uint16 | logical
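The size and rotation guidance for the input image can be applied before calling ocr. The following is a minimal sketch, not a prescribed workflow. The values xHeightPx and skewDeg are hypothetical measurements that you would take yourself (for example, by inspecting the image in a viewer), and the 24-pixel target is an arbitrary choice above the 20-pixel guideline.

```matlab
% Sketch: upscale and deskew an image before OCR.
I = imread('businessCard.png');   % example image used elsewhere on this page

xHeightPx = 12;   % ASSUMPTION: measured height of a lowercase 'x', in pixels
targetPx  = 24;   % ASSUMPTION: arbitrary target above the 20-pixel guideline
skewDeg   = 0;    % ASSUMPTION: measured counterclockwise text rotation, in degrees

% Enlarge only when the characters are too small.
scale = max(1, targetPx / xHeightPx);
Iscaled = imresize(I, scale);

% imrotate rotates counterclockwise for positive angles, so negate the
% measured skew to bring the text back toward horizontal.
if abs(skewDeg) > 10
    Iscaled = imrotate(Iscaled, -skewDeg, 'bilinear', 'crop');
end

ocrResults = ocr(Iscaled);
```

Note that imrotate fills the exposed corners with zeros, which may introduce dark regions near the image border.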
roi — Region of interest
One or more rectangular regions of interest, specified as an M-by-4 matrix. Each of the M rows specifies a region of interest within the input image as a four-element vector, [x y width height]. The vector specifies the upper-left corner location, [x y], and the size of the rectangular region of interest, [width height], in pixels. Each rectangle must be fully contained within the input image, I. Before recognition, the function uses Otsu's thresholding to convert truecolor and grayscale input regions of interest to binary regions. The function returns the text recognized in the rectangular regions as an array of ocrText objects.
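As an illustration of the M-by-4 roi format, the sketch below builds two regions, checks that each lies inside the image, and reads the per-region results. The two half-image regions are illustrative only. The containment test uses the usual 1-based pixel convention, which is an assumption about how the function checks bounds.

```matlab
% Sketch: recognize text in two regions and confirm each ROI fits in the image.
I = imread('businessCard.png');
[imgHeight, imgWidth, ~] = size(I);

% ASSUMPTION: illustrative ROIs covering the left and right halves of the image.
halfWidth = floor(imgWidth / 2);
roi = [1,             1, halfWidth,              imgHeight;
       (halfWidth+1), 1, (imgWidth - halfWidth), imgHeight];

% Each row is [x y width height]; the last covered column is x + width - 1.
isInside = roi(:,1) >= 1 & roi(:,2) >= 1 & ...
           roi(:,1) + roi(:,3) - 1 <= imgWidth & ...
           roi(:,2) + roi(:,4) - 1 <= imgHeight;
assert(all(isInside), 'Every ROI must lie fully inside the image.');

ocrResults = ocr(I, roi);   % one ocrText object per ROI row
for k = 1:numel(ocrResults)
    fprintf('ROI %d: %s\n', k, strtrim(ocrResults(k).Text));
end
```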
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: ocr(I,'TextLayout','Block')
'TextLayout' — Input text layout
'Auto' (default) | 'Block' | 'Line' | 'Word' | 'Character'
Input text layout, specified as the comma-separated pair consisting of 'TextLayout' and one of the following:
TextLayout | Text Treatment |
---|---|
'Auto' | Determines the layout and reading order of text blocks within the input image. |
'Block' | Treats the text in the image as a single block of text. |
'Line' | Treats the text in the image as a single line of text. |
'Word' | Treats the text in the image as a single word of text. |
'Character' | Treats the text in the image as a single character. |
Use the automatic layout analysis to recognize text from a scanned document that has a specific format, such as a double column. This setting preserves the reading order in the returned text. You may get poor results if your input image contains only a few regions of text or if the text is located in a cluttered scene. If you get poor OCR results, try a different layout that matches the text in your image. If the text is located in a cluttered scene, also try specifying an ROI around the text in your image.
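One way to follow the advice about trying different layouts is to run the same region through several settings and compare the results. The sketch below reuses the image and ROI from the handicap sign example on this page. Using the mean word confidence as a comparison score is an assumption and only a rough heuristic, so inspect the recognized text as well.

```matlab
% Sketch: compare layout settings on one region of a cluttered image.
I = imread('handicapSign.jpg');
roi = [360 118 384 560];   % ROI from the example on this page

layouts = {'Auto', 'Block', 'Line', 'Word'};
for k = 1:numel(layouts)
    result = ocr(I, roi, 'TextLayout', layouts{k});
    meanConf = mean(result.WordConfidences);   % NaN when no words are found
    fprintf('%-5s: %d words, mean confidence %.2f\n', ...
        layouts{k}, numel(result.Words), meanConf);
end
```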
'Language' — Language
'English' (default) | 'Japanese' | character vector | string scalar | cell array of character vectors | string array
Language to recognize, specified as the comma-separated pair consisting of 'Language' and the character vector 'English', 'Japanese', or a cell array of character vectors. For additional languages, you can install the OCR Language Data Files support package (see Install OCR Language Data Files) or add a custom language. Specifying multiple languages enables simultaneous recognition of all the selected languages. However, selecting more than one language may reduce the accuracy and increase the time it takes to perform OCR.
To specify any of the additional languages contained in the OCR Language Data Files support package, use the language character vector the same way as for the built-in languages. You do not need to specify the path.
txt = ocr(img,'Language','Finnish');
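To see the cost of recognizing several languages at once, you can time a single-language call against a multi-language call. This is a minimal sketch that uses only the two built-in languages, so it does not need the support package. The timings depend on the machine and are not a benchmark.

```matlab
% Sketch: compare single-language and multi-language recognition.
I = imread('businessCard.png');

tic;
resultEnglish = ocr(I, 'Language', 'English');
timeEnglish = toc;

tic;
resultBoth = ocr(I, 'Language', {'English', 'Japanese'});
timeBoth = toc;

fprintf('English only:       %.2f s, %d words\n', ...
    timeEnglish, numel(resultEnglish.Words));
fprintf('English + Japanese: %.2f s, %d words\n', ...
    timeBoth, numel(resultBoth.Words));
```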
List of Support Package OCR Languages
'Afrikaans'
'Albanian'
'AncientGreek'
'Arabic'
'Azerbaijani'
'Basque'
'Belarusian'
'Bengali'
'Bulgarian'
'Catalan'
'Cherokee'
'ChineseSimplified'
'ChineseTraditional'
'Croatian'
'Czech'
'Danish'
'Dutch'
'English'
'Esperanto'
'EsperantoAlternative'
'Estonian'
'Finnish'
'Frankish'
'French'
'Galician'
'German'
'Greek'
'Hebrew'
'Hindi'
'Hungarian'
'Icelandic'
'Indonesian'
'Italian'
'ItalianOld'
'Japanese'
'Kannada'
'Korean'
'Latvian'
'Lithuanian'
'Macedonian'
'Malay'
'Malayalam'
'Maltese'
'MathEquation'
'MiddleEnglish'
'MiddleFrench'
'Norwegian'
'Polish'
'Portuguese'
'Romanian'
'Russian'
'SerbianLatin'
'Slovakian'
'Slovenian'
'Spanish'
'SpanishOld'
'Swahili'
'Swedish'
'Tagalog'
'Tamil'
'Telugu'
'Thai'
'Turkish'
'Ukrainian'
To use your own custom languages, specify the path to the trained data file as the language character vector. You must name the file in the format <language>.traineddata. The file must be located in a folder named 'tessdata'. For example:
txt = ocr(img,'Language','path/to/tessdata/eng.traineddata');
txt = ocr(img,'Language', ...
    {'path/to/tessdata/eng.traineddata',...
    'path/to/tessdata/jpn.traineddata'});
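The containing folder must be the same for all the files specified in the cell array. In the preceding example, all of the traineddata files in the cell array are contained in the folder 'path/to/tessdata'. Because the following code points to two different containing folders, it does not work.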
txt = ocr(img,'Language', ...
    {'path/one/tessdata/eng.traineddata',...
    'path/two/tessdata/jpn.traineddata'});
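Some language files depend on the data for another language. For example, to use Hindi, the traineddata file that it depends on must also exist in the same folder as the Hindi traineddata file. The ocr function only supports traineddata files created using tesseract-ocr 3.02 or using the OCR Trainer.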
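The folder and naming rules for custom language files can be checked before calling ocr. The sketch below is one possible check, written with base MATLAB functions only. The file list reuses the placeholder paths from the examples above, and the check inspects the path strings only, so it does not confirm that the files exist.

```matlab
% Sketch: check a list of custom language files before passing it to ocr.
% ASSUMPTION: these are the placeholder paths from the examples above.
files = {'path/to/tessdata/eng.traineddata', ...
         'path/to/tessdata/jpn.traineddata'};

% Split each path into its containing folder, that folder's own name,
% and the file extension.
folders = cellfun(@fileparts, files, 'UniformOutput', false);
[~, folderNames] = cellfun(@fileparts, folders, 'UniformOutput', false);
[~, ~, extensions] = cellfun(@fileparts, files, 'UniformOutput', false);

sameFolder    = numel(unique(folders)) == 1;
namedTessdata = all(strcmp(folderNames, 'tessdata'));
rightSuffix   = all(strcmp(extensions, '.traineddata'));

if sameFolder && namedTessdata && rightSuffix
    disp('File list satisfies the folder and naming rules.');
else
    disp('File list violates the folder or naming rules.');
end
```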
For deployment targets generated by MATLAB® Coder™:
The generated ocr executable and the language data file folder must be colocated.
The language data file folder must be named tessdata:
For English: C:/path/tessdata/eng.traineddata
For Japanese: C:/path/tessdata/jpn.traineddata
For custom data files: C:/path/tessdata/customlang.traineddata
C:/path/ocr_app.exe
You can copy the English and Japanese trained data files from:
fullfile(matlabroot, 'toolbox','vision','visionutilities','tessdata');
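The colocated layout described above can be prepared with a few file operations. This is a sketch under stated assumptions: appFolder is a hypothetical folder standing in for the location of the generated executable, and the source path is the one given on this page.

```matlab
% Sketch: place the shipped language data next to a generated executable.
% ASSUMPTION: appFolder is a hypothetical folder that holds the executable.
appFolder = fullfile(tempdir, 'ocr_app_deploy');
sourceTessdata = fullfile(matlabroot, 'toolbox', 'vision', ...
    'visionutilities', 'tessdata');
targetTessdata = fullfile(appFolder, 'tessdata');   % folder must be named tessdata

if ~exist(targetTessdata, 'dir')
    mkdir(targetTessdata);
end
copyfile(fullfile(sourceTessdata, '*.traineddata'), targetTessdata);

% List what was copied.
listing = dir(fullfile(targetTessdata, '*.traineddata'));
disp({listing.name}');
```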
'CharacterSet' — Character subset
'' all characters (default) | character vector | string scalar
Character subset, specified as the comma-separated pair consisting of 'CharacterSet' and a character vector. By default, CharacterSet is set to the empty character vector, ''. The empty vector sets the function to search for all characters in the language specified by the Language property.
You can set this property to a smaller set of known characters to constrain the classification process.
The ocr function selects the best match from the CharacterSet. Using prior knowledge about the characters in the input image helps to improve text recognition accuracy. For example, if you set CharacterSet to all numeric digits, '0123456789', the function attempts to match each character to only digits. In this case, a non-digit character can be incorrectly recognized as a digit.
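The digit restriction described above can be tried alongside an unrestricted call so the two results can be compared. In this sketch the ROI is a placeholder that covers the whole image. In practice you would draw it tightly around a field known to contain only digits, because any letters inside the region can be misread as digits.

```matlab
% Sketch: restrict recognition to digits inside a region of interest.
I = imread('businessCard.png');
[imgHeight, imgWidth, ~] = size(I);

% ASSUMPTION: placeholder ROI covering the whole image; replace it with a
% tight rectangle around the numeric field (for example, a phone number).
roi = [1 1 imgWidth imgHeight];

digitsOnly   = ocr(I, roi, 'CharacterSet', '0123456789', 'TextLayout', 'Block');
unrestricted = ocr(I, roi, 'TextLayout', 'Block');

% Compare both results before trusting the restricted one.
disp(digitsOnly.Text);
disp(unrestricted.Text);
```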
txt — Recognized text and metrics
ocrText object
Recognized text and metrics, returned as an ocrText object. The object contains the recognized text, the location of the recognized text within the input image, and metrics indicating the confidence of the results. Confidence values are in the range [0, 1] and represent a probability. When you specify an M-by-4 roi, the function returns txt as an M-by-1 array of ocrText objects.
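The confidence metrics in the returned object can be used to discard unreliable words. The sketch below keeps only words above a cutoff. The value 0.8 is an arbitrary assumption, not a recommended threshold.

```matlab
% Sketch: keep only the words recognized with high confidence.
I = imread('businessCard.png');
ocrResults = ocr(I);

minConfidence = 0.8;   % ASSUMPTION: arbitrary cutoff in the range [0, 1]
keep = ocrResults.WordConfidences > minConfidence;

confidentWords = ocrResults.Words(keep);
confidentBoxes = ocrResults.WordBoundingBoxes(keep, :);

% Print each retained word with its confidence.
for k = find(keep)'
    fprintf('%-20s %.2f\n', ocrResults.Words{k}, ocrResults.WordConfidences(k));
end
```

When the function returns an array of ocrText objects, apply the same filter to each element of the array.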
If your ocr results are not what you expect, try one or more of the following options:
Increase the image size to 2-to-4 times the original size.
If the characters in the image are too close together or their edges are touching, use morphology to thin out the characters, which separates them.
Use binarization to check for non-uniform lighting issues. Use the graythresh and imbinarize functions to binarize the image. If the characters are not visible in the results of the binarization, it indicates a potential non-uniform lighting issue. Try top-hat filtering, using the imtophat function, or other techniques that remove non-uniform illumination.
Use the region of interest roi option to isolate the text. Specify the roi manually or use text detection.
If your image looks like a natural scene containing words, like a street scene, rather than a scanned document, try using an ROI input. Also, you can set the TextLayout property to 'Block' or 'Word'.
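The binarization check, top-hat filtering, and character thinning from the tips above might be combined as in the following sketch. It assumes dark text on a light background. The structuring element sizes are assumptions that need tuning for a given image, and im2gray requires R2020b or later.

```matlab
% Sketch: preprocessing steps from the tips, applied in sequence.
I = imread('businessCard.png');
gray = im2gray(I);   % on older releases, use rgb2gray for truecolor input

% Check for non-uniform lighting: text missing from this binary image
% hints at an illumination problem.
bw = imbinarize(gray, graythresh(gray));
figure; imshow(bw); title('Global Otsu binarization');

% Top-hat filtering keeps bright details smaller than the structuring
% element, so complement dark-on-light text first and restore it afterwards.
se = strel('disk', 15);   % ASSUMPTION: radius larger than the stroke width
flattened = imcomplement(imtophat(imcomplement(gray), se));

% Thin touching characters by eroding the text mask (true where text is dark).
textMask = ~imbinarize(flattened);
thinned = ~imerode(textMask, strel('disk', 1));   % ASSUMPTION: 1-pixel erosion

ocrResults = ocr(thinned);
disp(ocrResults.Text);
```

Apply only the steps that the image needs, because each one can also remove detail from thin or small characters.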
Usage notes and limitations:
'TextLayout', 'Language', and 'CharacterSet' must be compile-time constants.
Generated code for this function uses a precompiled platform-specific shared library.
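One way to satisfy the compile-time constant requirement is to write the option values as literals inside the entry-point function. The function below is a hypothetical sketch, and the name recognizeDigits is illustrative only. If the values were instead passed in as function inputs, they would need to be declared constant when generating code, for example with coder.Constant.

```matlab
function txt = recognizeDigits(I) %#codegen
% Hypothetical entry-point function for code generation.
% The three option values are literals, so they are compile-time constants.
results = ocr(I, 'TextLayout', 'Block', ...
                 'Language', 'English', ...
                 'CharacterSet', '0123456789');
txt = results.Text;
end
```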
graythresh | imbinarize | imtophat | insertShape | OCR Trainer | ocrText