OMR - Optical Mark Recognition


Among the various data capture technologies, OMR (Optical Mark Recognition), that is, the capture of marks made on a form, may seem trivial: simple to implement and obvious in its results. Experience teaches, however, that excellent results require due consideration of a number of issues that are often not immediately evident. OMR essentially consists in determining whether a checkbox is empty or filled, producing a binary output of 1 or 0. The sequence of binary values obtained from the reading can then be recombined into other composite values, according to the meaning assigned to the boxes. Because it is far more efficient than other technologies for capturing handwritten marks, such as the much more complex ICR, OMR is preferred wherever it can adequately solve the data acquisition problem.
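The recombination of binary readings into composite values can be illustrated with a minimal sketch. It assumes a hypothetical form layout in which each digit of a number is entered by marking one of ten boxes in a column; the function names and the layout are illustrative assumptions, not part of any specific product.

```python
def decode_digit_column(marks):
    """Return the digit selected in a column of ten boxes (0-9), or None.

    `marks` is a list of ten 0/1 readings, one per box. The column is
    considered valid only if exactly one box is marked (an assumption
    of this sketch).
    """
    selected = [digit for digit, mark in enumerate(marks) if mark == 1]
    return selected[0] if len(selected) == 1 else None


def decode_number(columns):
    """Combine several digit columns into one integer, or None if any column is invalid."""
    digits = [decode_digit_column(column) for column in columns]
    if any(digit is None for digit in digits):
        return None
    return int("".join(str(digit) for digit in digits))


# Two columns: box 4 is marked in the first, box 2 in the second.
columns = [
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
]
print(decode_number(columns))  # 42
```

Returning None for a column with no mark or with several marks keeps ambiguous input separate from valid values, so it can be handled explicitly later.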

A very common classic example is the betting slip used for football pools, the lottery and Super Enalotto: we blacken the boxes representing the numbers we want to play, and when the slip is inserted into the terminal the bet is automatically recorded and a receipt is printed with our choices in plain text. This use of OMR technology was born several years ago, with the availability of specific hardware devices, the so-called "scanners", able to automatically process suitably prepared forms by measuring the reflectivity of light on the surface of the sheet at predetermined positions.

In 1938 IBM presented the first device with OMR technology, the Type 805 Test Scoring Machine, dedicated to capturing the answers of multiple-choice tests in education and capable of a productivity 10 times greater than manual checking.

The Type 805 Test Scoring Machine, produced by IBM in 1938: the first electromechanical OMR device.



Today's OMR scanners are equipped with an array of equidistant photoelectric sensors over which the sheet is moved mechanically for processing. The form must therefore be laid out so that the boxes, printed in a specific color, fall exactly where the sensors are located. In addition, the edge of the sheet must carry, for each row to be read, a synchronization mark, a sort of black box that indicates precisely when the sensors must be activated to take the sample. It is evident that a hardware system of this kind has a number of inherent limitations that are difficult to overcome.

Today hardware solutions, although still in use in restricted areas, have been supplanted by far more flexible software solutions, which use an ordinary document scanner to acquire the forms and perform recognition directly on the digital image, achieving a productivity hundreds of times greater than manual operations.

In OMR technology, the way a blackened box is distinguished from an empty one is fundamental. Working on a digital image, usually binarized, the usual discriminant for deciding whether a box is full or empty is the ratio between black pixels and white pixels in the area covered by the box: by selecting an appropriate threshold, a box can be classified as full or empty. In practice, however, it often happens that those who fill out a form, instead of completely blackening the boxes they intend to select, simply mark them with a check mark, a tick or an X.

Using only the amount of black in the box as the discriminant risks identifying as empty boxes that contain only a check mark, and identifying as full boxes that contain only spurious dots.



As a result, the percentage of black pixels may fall below the minimum required to consider the box full, and the system then returns the box as empty even though it is actually checked.
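The single-threshold test, and the way it misses a thin mark, can be sketched as follows. This is a minimal illustration that assumes the box has already been cropped from a binarized image and is given as rows of 0/1 values (1 = black); the 0.5 threshold is an arbitrary value chosen for the example.

```python
def fill_ratio(box):
    """Fraction of black pixels in a binarized box (1 = black, 0 = white)."""
    total = sum(len(row) for row in box)
    black = sum(sum(row) for row in box)
    return black / total if total else 0.0


def is_marked(box, threshold=0.5):
    """Single-threshold test: the box is full if enough of its area is black."""
    return fill_ratio(box) >= threshold


# An 8x8 box that was completely blackened.
blackened = [[1] * 8 for _ in range(8)]
# An 8x8 box crossed by a thin diagonal stroke, standing in for a tick.
tick = [[1 if x == y else 0 for x in range(8)] for y in range(8)]

print(fill_ratio(blackened))  # 1.0
print(fill_ratio(tick))       # 0.125
print(is_marked(blackened))   # True
print(is_marked(tick))        # False: the checked box is reported as empty
```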

The solution in most cases is to use a very low threshold, but this runs the risk that small marks caused by fingerprints, dirt and stains produce the opposite result, namely that a box which is really empty is identified as full. An excellent solution to this problem was developed by Recogniform Technologies and implemented in its data capture products: in addition to the percentage of black pixels, the size of the ink stroke present in the box is also calculated, so that a double threshold can be used.

The technology implemented by Recogniform Technologies in its OMR products, using both an ink-fill threshold and a mark-extension threshold, allows a perfect discrimination between full and empty boxes.



With this effective solution, the presence of dirt, that is, isolated black dots, can increase the number of black pixels but not the size of the written stroke, which protects against false positives: empty boxes are not reported as full. Conversely, a thin check mark, although it only slightly increases the number of black pixels, does increase the size of the written stroke, which protects against false negatives: crossed boxes are not reported as empty.
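The double-threshold idea can be illustrated with a rough sketch. It is not the Recogniform implementation, whose details the text does not give. As an assumption, the "size of the stroke" is approximated here by the bounding-box extent of the largest 8-connected group of black pixels, and a box counts as marked only when both a low fill threshold and an extent threshold are reached. All names and threshold values are hypothetical.

```python
from collections import deque


def fill_ratio(box):
    """Fraction of black pixels in a binarized box (1 = black, 0 = white)."""
    total = sum(len(row) for row in box)
    return sum(sum(row) for row in box) / total if total else 0.0


def largest_stroke_extent(box):
    """Largest bounding-box side of any 8-connected ink component,
    expressed as a fraction of the corresponding box side."""
    if not box or not box[0]:
        return 0.0
    height, width = len(box), len(box[0])
    seen = [[False] * width for _ in range(height)]
    best = 0.0
    for y0 in range(height):
        for x0 in range(width):
            if box[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Breadth-first flood fill over one connected component.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            min_y = max_y = y0
            min_x = max_x = x0
            while queue:
                y, x = queue.popleft()
                min_y, max_y = min(min_y, y), max(max_y, y)
                min_x, max_x = min(min_x, x), max(max_x, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and box[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            extent = max((max_y - min_y + 1) / height,
                         (max_x - min_x + 1) / width)
            best = max(best, extent)
    return best


def is_marked(box, fill_threshold=0.05, extent_threshold=0.5):
    """Marked only if there is some ink AND it forms a sufficiently long stroke."""
    return (fill_ratio(box) >= fill_threshold
            and largest_stroke_extent(box) >= extent_threshold)


# A thin diagonal tick: little ink, but one long connected stroke.
tick = [[1 if x == y else 0 for x in range(8)] for y in range(8)]
# Dirt: the same amount of ink, scattered as isolated dots.
dirt = [[0] * 8 for _ in range(8)]
for y, x in [(0, 0), (0, 3), (0, 6), (3, 0), (3, 3), (3, 6), (6, 0), (6, 3)]:
    dirt[y][x] = 1

print(fill_ratio(tick), fill_ratio(dirt))  # 0.125 0.125
print(is_marked(tick))                     # True
print(is_marked(dirt))                     # False
```

Both boxes contain the same amount of ink, so the fill ratio alone cannot separate them; the extent of the largest connected stroke can.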

Since at present it is not possible to recognize automatically and with absolute certainty whether a box is full or empty, this is where a software solution shows its advantage over an optical reader: it can display the image on screen and ask an operator to confirm or correct the detected data.
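One possible way to decide which boxes to show to the operator is to leave a band of doubt between the "empty" and "marked" decisions. The sketch below is only an assumption about how such routing could work: the two fill-ratio limits, the box names and the function names are all hypothetical.

```python
def classify(fill, low=0.05, high=0.30):
    """Three-way decision on a box's fill ratio.

    Values at or above `high` are treated as marked, values at or below
    `low` as empty, and anything in between is sent to an operator.
    """
    if fill >= high:
        return "marked"
    if fill <= low:
        return "empty"
    return "review"


def review_queue(readings):
    """Names of the boxes whose reading should be confirmed on screen."""
    return [name for name, fill in readings.items() if classify(fill) == "review"]


# Hypothetical fill ratios measured for three boxes of one question.
readings = {"q1_a": 0.45, "q1_b": 0.12, "q1_c": 0.0}
print(review_queue(readings))  # ['q1_b']
```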

Even here, however, care is needed, because forms designed for automatic OMR reading usually have their boxes printed in blind ink, that is, in a color that is filtered out during scanning (usually with red or green light) and therefore becomes white in the image.

An image of this type, presented on screen, would show only the marks of the person who filled out the sheet, as if they were written on a blank page: without the reference of the printed form layout, it can be difficult to check that the reading system has interpreted them correctly.

There are different ways to solve this problem. One solution is to acquire the form in color and filter out the layout color in software, directly on the image, only during processing. A second solution is to acquire the form without the color filter, so that the layout is visible in the same black as the strokes made by the person who filled out the form: in this case the OMR system must also take into account the black pixels already present in empty boxes, which will not be totally white. A third solution is to acquire the form filtered, but to reconstruct it on screen as if it had been acquired unfiltered or even in color, using an overlay technique, that is, by superimposing images.
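The first and third approaches can be sketched on a tiny image held as rows of RGB tuples. This is an illustrative sketch only: the "reddish" test, its margin, the binarization threshold and all function names are assumptions, and a real system would work on full scanned images with an imaging library.

```python
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
RED = (220, 30, 30)  # stands in for the blind-ink color of the printed boxes


def is_reddish(r, g, b, margin=60):
    """Assumed dropout test: red clearly dominates the other two channels."""
    return r - max(g, b) >= margin


def drop_color(pixels, is_dropout=is_reddish):
    """First approach: filter the layout color in software, turning it white."""
    return [[WHITE if is_dropout(*p) else p for p in row] for row in pixels]


def binarize(pixels, threshold=128):
    """Convert RGB pixels to 1 (black ink) or 0 (white) by average brightness."""
    return [[1 if sum(p) / 3 < threshold else 0 for p in row] for row in pixels]


def overlay(ink, template):
    """Third approach: draw the recognized ink in black over a blank-form image."""
    return [[BLACK if i == 1 else t for i, t in zip(ink_row, template_row)]
            for ink_row, template_row in zip(ink, template)]


# One row of a color scan: a box outline pixel, a pen stroke, blank paper.
scan = [[RED, BLACK, WHITE]]
# The same row of a blank form, scanned in color and never filled in.
template = [[RED, WHITE, WHITE]]

print(binarize(scan))              # [[1, 1, 0]]  outline counted as ink
filtered = binarize(drop_color(scan))
print(filtered)                    # [[0, 1, 0]]  only the pen stroke remains
print(overlay(filtered, template) == [[RED, BLACK, WHITE]])  # True
```

Recognition runs on the filtered image, while the overlaid image, which shows the boxes again, is the one presented to the operator.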

Because each of these solutions has its own benefits, some advanced data capture systems (e.g. Recogniform Reader) allow the user to choose which one to use depending on needs and circumstances.

On a form acquired by filtering out the color in which the boxes are printed, recognition is easier, but any manual verification or correction of the data is inconvenient (above). On the same form, once an unfiltered empty form has been overlaid on it after recognition, manual verification or correction of the data is much easier (below).

For more information on OMR technology and on our solutions that implement it, you can send us an e-mail at informazioni@recogniform.it.