CS 315 Data Structures
Fall 2008
Project
#2
Due:
Oct 24, 2008
Problem Statement: Optical
character recognition (OCR) is a useful technology that has significantly
advanced the process of digitizing documents. It is also supported by scanners
and other devices. Using image processing techniques, a
OCR reads a scanned document and recognize the individual characters in it. A
first step in optical character recognition is to identify the boundaries of
each character found in the image. A bounding box is the smallest enclosing
rectangle that contains a particular character.
The goal of this project is to take as input an image like
in the left-hand side and produce as output the image on the right-side in
which the figure on the left has been processed to recognize where the
characters are and a bounding box is drawn around each character. You can
assume that the background image is (nearly) black and the letters are (nearly)
white. A
sample input/output is shown above in which the left-side image is the input
and the right-side is the output produced by the program.
Goals of the project:
Data Structure and Algorithm used:
The basic idea behind the solution is as
follows: When the search process encounters a white pixel, it tries to identify
all the pixels that are connected to this pixel. This search is similar to
finding the way through a maze. You can use a queue to solve this problem as
follows: start scanning the image row by row and each row from left to right
until you encounter a white pixel at (j,k) (j = row
number, k = column number). Then, you use a queue to collect all the pixels
that are reachable from this pixel. While reaching all these pixels, your
algorithm will perform two tasks: (a) it will mark them as visited and change
their color to green and (b) keep track of the bounding x and y values. When
the queue is empty, the bounding box can be drawn. Now the search continues to
find the next character in the image.
Outline of the algorithm you are to implement
is as follows:
Let the image width be w and the height be h;
for j = 1 to w do
for k = 1 to h do
visited[j,k] = false;
end for;
end for;
count = 0; Initialize Q;
// count is the number of symbols in the
image.
for j from 0 to w – 1 do
for k from 0 to h – 1 do
if (color[j,k] is white and
!visited[j,k]) then
count++;
lowx
= j; lowy = k; highx = j; highy = k;
insert( [j,k]) into Q;
visited[j,k] = true;
while (Q is not empty)
do
p = delete(Q);
x = p
-> getx();
y = p -> gety();
color[x,y] = green;
if x < lowx then lowx = x; end if;
if y < lowy then lowy = y; end if;
if x > highx then highx = x; end if;
if y > highy then highy = y; end if;
for each neighbor n of p do
if visited[n] is not true then
visited[n] = true;
if (color[n] is white) then
insert(n) into Q;
end if;
end if;
end for;
end while;
draw a box with (lowx-1,lowy-1) as NW and
(highx + 1, highy
+ 1) as SE boundary;
else visited[j,k] = true; end if;
end for;
end for;
Input: A scanned image
of a printed text. Your program should make a first pass over the image to
determine which pixels represent background, which one represent
the printed text and which represent noise.
Output: Your
program should print the number of distinct letters (or foreground images)
found. It should create an output image file whose name is user specified (see
the format below). In the output file, the background pixels of the input file
should be copied exactly and the color of the letters should be changed to
green, and there should be a red bounding box around each letter. Your output
should also include the number of lines in the page.
Example: If
the executable file produced by your C++ program is run, then with in.bmp as input image that is on the left column in
page 1, if we execute the statement
% run in.bmp out.bmp
the output on the command line should be:
4 1
The image out.bmp
will be the image on the right column in page 1.
Grading: