How to get the best OCR results

Source Image Recommendations

The best recognition results can only be achieved if the source image is of good quality. To ensure that users of your application receive the best possible results, you need to pay some attention to the quality of input images.

You can implement some basic quality-checking features, display tips for best image quality in the user interface and include advanced recommendations in the user's manual. If your application contains a scanning interface, you can also control the scanner settings.

Check input image quality

When your application receives an image for recognition, one of the first steps can be checking the image resolution. If the resolution is too low (below 150 dpi), some image details may be lost and the recognition quality will deteriorate. If, on the other hand, the resolution is too high (above 600-700 dpi), loading and processing the image will take more time without materially improving the recognition quality.
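The resolution check described above could be sketched roughly as follows. This is a minimal illustration, not part of any OCR SDK: the function name classify_resolution is hypothetical, and the upper limit of 600 dpi is an assumption taken from the lower end of the 600-700 dpi range. How the dpi value is read from the image depends on your imaging library and is not shown.

```python
def classify_resolution(dpi, min_dpi=150, max_dpi=600):
    """Classify a scan resolution against the thresholds from the text.

    min_dpi and max_dpi are assumptions; adjust them for your documents.
    """
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    if dpi < min_dpi:
        return "too_low"    # image details may be lost
    if dpi > max_dpi:
        return "too_high"   # slower processing, no material quality gain
    return "ok"


for dpi in (96, 300, 1200):
    print(dpi, classify_resolution(dpi))
```

An application might show a warning to the user for "too_low" and downsample the image for "too_high" before recognition.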

Another important tip concerns documents printed in very small font sizes (less than 10 pt). If they are scanned at the common 300 dpi resolution, the text may not be recognized well:

Image of text with low dpi

Increase the scanning resolution to 400-600 dpi in this case:

image of text with a higher dpi
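The reasoning behind raising the resolution for small fonts can be illustrated with some arithmetic: one typographic point is 1/72 inch, so the nominal height of a font in pixels is font size / 72 × dpi. The sketch below is only a heuristic and an assumption, not a rule from the recognition engine: it picks a resolution that gives small text roughly the same pixel height as 10 pt text at 300 dpi, rounds up to the next 100 dpi, and caps the result at 600 dpi. The function names are hypothetical.

```python
import math

POINTS_PER_INCH = 72  # one typographic point is 1/72 inch


def glyph_height_px(font_pt, dpi):
    """Nominal font height in pixels at a given scan resolution."""
    return font_pt / POINTS_PER_INCH * dpi


def suggested_dpi(font_pt, ref_pt=10, ref_dpi=300, step=100, max_dpi=600):
    """Heuristic (assumption): match the pixel height of ref_pt text at ref_dpi."""
    if font_pt <= 0:
        raise ValueError("font size must be positive")
    if font_pt >= ref_pt:
        return ref_dpi
    needed = ref_dpi * ref_pt / font_pt
    # Round up to the next multiple of `step`, then cap at max_dpi.
    return min(math.ceil(needed / step) * step, max_dpi)


for pt in (10, 8, 6):
    dpi = suggested_dpi(pt)
    print(pt, dpi, round(glyph_height_px(pt, dpi), 1))
```

With these assumptions, 8 pt text maps to 400 dpi and 6 pt text to 500 dpi, which falls inside the 400-600 dpi range recommended above.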

Next, you can implement an algorithm for estimating the image brightness. When the image is, on average, very light, some parts of the text may be irretrievably overexposed, leaving "torn" characters:

image of text with overly bright lighting

When the image is very dark, some parts of the text may be obscured by dark patches, and characters may be stuck together or distorted:

image of text with very dark lighting

Both of these defects can reduce the recognition quality.
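One simple way to estimate brightness is to average the grayscale pixel values, as in the sketch below. It assumes the image has already been converted to a flat sequence of 8-bit grayscale values (0 is black, 255 is white). The function names and the limits of 100 and 245 are placeholders chosen for illustration, not values from the original recommendations; they would need tuning on real samples.

```python
def mean_brightness(pixels):
    """Average of 8-bit grayscale values (0 = black, 255 = white)."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("image has no pixels")
    return sum(pixels) / len(pixels)


def classify_brightness(pixels, dark_limit=100, light_limit=245):
    """Flag images whose average brightness is outside assumed limits."""
    mean = mean_brightness(pixels)
    if mean < dark_limit:
        return "too_dark"    # risk of dark patches and merged characters
    if mean > light_limit:
        return "too_light"   # risk of washed-out, "torn" characters
    return "ok"


# Synthetic example: 90% light paper, 10% dark text.
page = [200] * 90 + [20] * 10
print(mean_brightness(page), classify_brightness(page))
```

A plain average is a coarse measure: a page with a large dark photograph can look "too dark" even when the text is fine, so an application might treat the result as a hint rather than a reason to reject the image.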

Scan your documents in grayscale rather than in black-and-white mode. You will still need to select a reasonable brightness value, but less detail will be lost in grayscale in any case.

Control the phone camera settings

For mobile devices:

  • It's best to turn off the flash when photographing documents. The example below shows how glare from the flash can damage the quality of a document photo:

picture of worksheet that was taken with a flash that created a shine on the image

Control the scanner settings

The following settings are recommended for scanning directly from a scanner:

  • set the brightness to a reasonable medium value, for example 50;
  • set the resolution to 300 dpi;
  • if the colour of the documents is not important for the task, choose grayscale colour mode: scanning in black-and-white will not always produce the best results, especially if the original document is not of very good quality.
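If your application drives the scanner itself, the recommended settings can be kept in one place as defaults. The sketch below is a hypothetical container and is not tied to any particular scanner API; the class and field names are assumptions, as is the 0-100 brightness scale implied by the example value of 50.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanSettings:
    """Hypothetical settings container; not part of any scanner API."""
    brightness: int = 50            # medium value; a 0-100 scale is assumed
    resolution_dpi: int = 300
    color_mode: str = "grayscale"   # or "color" / "black_and_white"

    def warnings(self):
        """Return notes where the settings depart from the advice above."""
        notes = []
        if self.color_mode == "black_and_white":
            notes.append("black-and-white may lose detail; prefer grayscale")
        if self.resolution_dpi < 300:
            notes.append("resolution is below the recommended 300 dpi")
        return notes


print(ScanSettings().warnings())
print(ScanSettings(color_mode="black_and_white", resolution_dpi=200).warnings())
```

The defaults produce no warnings, so the user interface only needs to show a message when someone changes a setting away from the recommendation.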

Before taking a picture

1.    Make sure that the page fits entirely within the frame.

2.    Make sure there is enough light (preferably daylight). In artificial lighting, use two light sources positioned so as to avoid shadows.

Turn off the flash to avoid glare.

3.    Position the lens parallel to the plane of the document and point it toward the center of the text. At full optical zoom, the distance between the camera and the document must be sufficient to fit the entire document into the frame. Usually this distance will be 50-60 cm.

4.    If possible, use a tripod to avoid shaking.

Enable the anti-shake system, if available.

Use auto release to prevent camera shake caused by pressing the shutter release button (even when using a tripod).

5.    If your camera allows, use a white sheet of paper to set white balance. Otherwise, select the white balance mode which best suits the current lighting conditions.

Below you can see an example of wrong white balance (on the left) and correct white balance (on the right):

picture of a worksheet that has an orange tint to it

image of a worksheet that has the correct black and white coloring

What do I do if…

1. There is not enough light?

Try the following:

Select a wider aperture (a lower f-number)

Select a greater ISO value for sensitivity

Use manual focusing if the camera cannot lock the focus automatically

2. The picture is not sharp enough?

Auto focus may not work properly in poor lighting or when photographing at a close distance.

In poor lighting conditions, try using an additional light source.

When photographing a document up close, try using the Macro (or Close-Up) mode. Otherwise, if possible, focus the camera manually.

If only a part of the picture is blurred, try a narrower aperture (a higher f-number) to increase the depth of field. Increase the distance between the document and the camera and use maximum zoom. Focus on a point anywhere between the center and a border of the image.

In poor lighting conditions, when shooting in auto mode, the camera will use slower shutter speeds, which makes the resulting photo less sharp. Enable the anti-shake system or use a tripod.

Original source information: http://ocrsdk.com/documentation/best-practices/image-recommendations/