ocr

Recognize text using optical character recognition

Description

txt = ocr(I) returns an ocrText object containing optical character recognition information from the input image, I. The object contains recognized text, text location, and a metric indicating the confidence of the recognition result.

txt = ocr(I, roi) recognizes text in I within one or more rectangular regions. The roi input contains an M-by-4 matrix, with M regions of interest.

[___] = ocr(___,Name,Value) uses additional options specified by one or more Name,Value pair arguments, using any of the preceding syntaxes.

Examples

Recognize Text Within an Image

businessCard = imread('businessCard.png');
ocrResults = ocr(businessCard)
ocrResults = 
  ocrText with properties:

                      Text: '‘ MathWorks®...'
    CharacterBoundingBoxes: [103x4 double]
      CharacterConfidences: [103x1 single]
                     Words: {16x1 cell}
         WordBoundingBoxes: [16x4 double]
           WordConfidences: [16x1 single]

Display the recognized text.

recognizedText = ocrResults.Text;
figure;
imshow(businessCard);
text(600, 150, recognizedText, 'BackgroundColor', [1 1 1]);

Recognize Text Within a Defined Region

Read the image.

I = imread('handicapSign.jpg');

Define one or more rectangular regions of interest within I.

roi = [360 118 384 560];

You can also use imrect to select a region interactively. For example:

figure; imshow(I);
roi = round(getPosition(imrect))

ocrResults = ocr(I, roi);

Insert the recognized text into the original image.

Iocr = insertText(I,roi(1:2),ocrResults.Text,'AnchorPoint',...
    'RightTop','FontSize',16);
figure; imshow(Iocr);

Display Bounding Boxes of Words and Recognition Confidences

businessCard = imread('businessCard.png');
ocrResults = ocr(businessCard)
ocrResults = 
  ocrText with properties:

                      Text: '‘ MathWorks®...'
    CharacterBoundingBoxes: [103x4 double]
      CharacterConfidences: [103x1 single]
                     Words: {16x1 cell}
         WordBoundingBoxes: [16x4 double]
           WordConfidences: [16x1 single]

Annotate the image with the word bounding boxes and recognition confidences.

Iocr = insertObjectAnnotation(businessCard, 'rectangle', ...
    ocrResults.WordBoundingBoxes, ...
    ocrResults.WordConfidences);
figure; imshow(Iocr);

Find and Highlight Text in an Image

businessCard = imread('businessCard.png');
ocrResults = ocr(businessCard);
bboxes = locateText(ocrResults, 'MathWorks', 'IgnoreCase', true);
Iocr = insertShape(businessCard, 'FilledRectangle', bboxes);
figure; imshow(Iocr);

Input Arguments

Input image, specified as an M-by-N-by-3 truecolor image, an M-by-N 2-D grayscale image, or an M-by-N binary image. The input image must be real and nonsparse. Before the recognition process, the function converts truecolor or grayscale input images to a binary image using Otsu's thresholding technique. For best OCR results, the height of a lowercase 'x', or comparable character, in the input image must be greater than 20 pixels. To improve recognition results, remove any text rotations greater than +/- 10 degrees from the horizontal or vertical axis.

Data Types: single | double | int16 | uint8 | uint16 | logical
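
If the text in your image is smaller than the recommended 20-pixel x-height, enlarging the image before recognition can help. A minimal sketch, assuming a hypothetical image file 'smallText.png' and an illustrative scale factor:

I = imread('smallText.png');       % hypothetical image with small text
I = imresize(I, 4, 'bicubic');     % enlarge so the x-height exceeds 20 pixels
txt = ocr(I);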

One or more rectangular regions of interest, specified as an M-by-4 matrix. Each row specifies a region of interest within the input image as a four-element vector, [x y width height]. The vector specifies the upper-left corner location, [x y], and the size, [width height], of a rectangular region of interest, in pixels. Each rectangle must be fully contained within the input image, I. Before the recognition process, the function uses Otsu's thresholding to convert truecolor and grayscale regions of interest to binary regions. The function returns the text recognized in the rectangular regions as an M-by-1 array of ocrText objects.
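
Each row of roi yields one element of the returned object array. A minimal sketch with two regions; the coordinates are illustrative:

roi = [360 118 384 560;    % first region, [x y width height]
       100  50 200 100];   % second, hypothetical region
results = ocr(I, roi);     % 2-by-1 array of ocrText objects
firstRegionText = results(1).Text;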

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: ocr(I,'TextLayout','Block')

Input text layout, specified as the comma-separated pair consisting of 'TextLayout' and one of the following:

  • 'Auto': Determines the layout and reading order of text blocks within the input image.

  • 'Block': Treats the text in the image as a single block of text.

  • 'Line': Treats the text in the image as a single line of text.

  • 'Word': Treats the text in the image as a single word of text.

  • 'Character': Treats the text in the image as a single character.

Use the automatic layout analysis to recognize text from a scanned document that contains a specific format, such as a double column. This setting preserves the reading order in the returned text. You may get poor results if your input image contains only a few regions of text, or if the text is located in a cluttered scene. If you get poor OCR results, try a different layout that matches the text in your image. If the text is located in a cluttered scene, try specifying a region of interest around the text, in addition to trying a different layout.
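
If you know how much text the image contains, matching 'TextLayout' to it avoids the automatic layout analysis. A minimal sketch, assuming a hypothetical image 'serialNumber.png' that shows a single line of text:

I = imread('serialNumber.png');       % hypothetical single-line image
txt = ocr(I, 'TextLayout', 'Line');   % treat the whole image as one line
disp(txt.Text);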

Language to recognize, specified as the comma-separated pair consisting of 'Language' and the character vector 'English', 'Japanese', or a cell array of character vectors. You can also install the Install OCR Language Data Files support package for additional languages, or add a custom language. Specifying multiple languages enables simultaneous recognition of all the selected languages. However, selecting more than one language can reduce accuracy and increase the time it takes to perform OCR.
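
For example, a minimal sketch of recognizing both built-in languages simultaneously:

txt = ocr(img, 'Language', {'English','Japanese'});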

To specify any of the additional languages which are contained in the Install OCR Language Data Files package, use the language character vector the same way as the built-in languages. You do not need to specify the path.

txt = ocr(img,'Language','Finnish');

For a complete list of supported languages, see List of Support Package OCR Languages.

To use your own custom languages, specify the path to the trained data file as the language character vector. You must name the file in the format, <language>.traineddata. The file must be located in a folder named 'tessdata'. For example:

txt = ocr(img,'Language','path/to/tessdata/eng.traineddata');
You can load multiple custom languages as a cell array of character vectors:
txt = ocr(img,'Language', ...
               {'path/to/tessdata/eng.traineddata',...
                'path/to/tessdata/jpn.traineddata'});
The containing folder must always be the same for all the files specified in the cell array. In the preceding example, all of the traineddata files in the cell array are contained in the folder 'path/to/tessdata'. Because the following code points to two different containing folders, it does not work:
txt = ocr(img,'Language', ...
               {'path/one/tessdata/eng.traineddata',...
                'path/two/tessdata/jpn.traineddata'});
Some language files have a dependency on another language. For example, Hindi training depends on English. If you want to use Hindi, the English traineddata file must also exist in the same folder as the Hindi traineddata file. The ocr function supports only traineddata files created using tesseract-ocr 3.02 or the OCR Trainer.

For deployment targets generated by MATLAB® Coder™: the generated ocr executable and the language data file folder must be colocated, and the folder must be named tessdata. For example:

  • For English: C:/path/tessdata/eng.traineddata

  • For Japanese: C:/path/tessdata/jpn.traineddata

  • For custom data files: C:/path/tessdata/customlang.traineddata

  • C:/path/ocr_app.exe

You can copy the English and Japanese trained data files from:

fullfile(matlabroot, 'toolbox','vision','visionutilities','tessdata');
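
A minimal sketch of staging the shipped English trained data file next to a deployed executable; the destination folder is a hypothetical example:

srcDir = fullfile(matlabroot, 'toolbox','vision','visionutilities','tessdata');
dstDir = fullfile('C:','path','tessdata');   % hypothetical deployment folder
if ~exist(dstDir, 'dir')
    mkdir(dstDir);
end
copyfile(fullfile(srcDir, 'eng.traineddata'), dstDir);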

Character subset, specified as the comma-separated pair consisting of 'CharacterSet' and a character vector. By default, CharacterSet is set to the empty character vector, ''. The empty vector sets the function to search for all characters in the language specified by the Language property. You can set this property to a smaller set of known characters to constrain the classification process.

The ocr function selects the best match from the CharacterSet. Using prior knowledge about the characters in the input image helps to improve text recognition accuracy. For example, if you set CharacterSet to all numeric digits, '0123456789', the function attempts to match each character to digits only. In this case, the function can incorrectly recognize a non-digit character as a digit.
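
A minimal sketch of constraining recognition to digits, assuming a hypothetical image that contains only a numeric reading:

I = imread('meterReading.png');   % hypothetical digits-only image
txt = ocr(I, 'CharacterSet', '0123456789', 'TextLayout', 'Word');
recognizedDigits = txt.Text;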

Output Arguments

Recognized text and metrics, returned as an ocrText object. The object contains the recognized text, the location of the recognized text within the input image, and metrics indicating the confidence of the results. The confidence values are in the range [0, 1] and represent probabilities. When you specify an M-by-4 roi, the function returns the result as an M-by-1 array of ocrText objects.
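
Because the object stores a confidence value for each word, you can filter out unreliable results. A minimal sketch; the 0.5 threshold is an illustrative assumption:

ocrResults = ocr(I);
keep = ocrResults.WordConfidences >= 0.5;   % illustrative threshold
reliableWords = ocrResults.Words(keep);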

If your OCR results are not what you expect, try one or more of the following options:

  • Increase the image size to 2 to 4 times the original size.

  • If the characters in the image are too close together or their edges are touching, use morphology to thin out the characters. Thinning the characters separates them.

  • Use binarization to check for non-uniform lighting issues. Use the graythresh and imbinarize functions to binarize the image. If the characters are not visible in the binarization result, the image may have a non-uniform lighting issue. Try top-hat filtering, using the imtophat function, or other techniques that remove non-uniform illumination, as in the sketch after this list.

  • Use the region of interest roi option to isolate the text. Specify the roi manually or use text detection.

  • If your image looks like a natural scene containing words, like a street scene, rather than a scanned document, try using an ROI input. Also, you can set the TextLayout property to 'Block' or 'Word'.
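
A minimal sketch of the binarization check and top-hat correction described above, assuming a hypothetical grayscale image with uneven lighting and an illustrative structuring-element size:

I = imread('unevenLighting.png');              % hypothetical grayscale image
bw = imbinarize(I, graythresh(I));             % global Otsu threshold
% If characters are missing in bw, correct the illumination first.
Icorrected = imtophat(I, strel('disk', 15));   % disk radius is an assumption
bw2 = imbinarize(Icorrected, graythresh(Icorrected));
txt = ocr(bw2);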

References

[1] Smith, R. "An Overview of the Tesseract OCR Engine." Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Vol. 2, 2007, pp. 629-633.

[2] Smith, R., D. Antonova, and D. Lee. "Adapting the Tesseract Open Source OCR Engine for Multilingual OCR." Proceedings of the International Workshop on Multilingual OCR, 2009.

[3] Smith, R. "Hybrid Page Layout Analysis via Tab-Stop Detection." Proceedings of the 10th International Conference on Document Analysis and Recognition, 2009.

Introduced in R2014a