ES 314 Advanced Programming,
Simulation and Modeling
Fall 2008
Project
#2
Due:
Oct 13, 2008
Problem Statement: Optical
character recognition (OCR) is a useful technology that has significantly
advanced the process of digitizing documents. It is also supported by scanners
and other devices. Using image processing techniques, a OCR reads a
scanned document and recognize the individual characters in it. A first step in
optical character recognition is to identify the boundaries of each character
found in the image. A bounding box is the smallest enclosing rectangle that
contains a particular character.
The goal of this project is to take as input an image like
in the left-hand side and produce as output the image on the right-side in
which the figure on the left has been processed to recognize where the
characters are and a bounding box is drawn around each character. You can assume
that the background image is (nearly) black and the letters are (nearly)
white. A
sample input/output is shown above in which the left-side image is the input
and the right-side is the output produced by the program.
Goals of the project:
Data Structure and Algorithm used:
The basic idea behind the solution is as
follows: When the search process encounters a white pixel, it tries to identify
all the pixels that are connected to this pixel. This search is similar to
finding the way through a maze. You can use a queue to solve this problem as
follows: start scanning the image row by row and each row from left to right
until you encounter a white pixel at (j,k) (j = row number, k = column number).
Then, you use a queue to collect all the pixels that are reachable from this
pixel. While reaching all these pixels, your algorithm will perform two tasks:
(a) it will mark them as visited and change their color to green and (b) keep
track of the bounding x and y values. When the queue is empty, the bounding box
can be drawn. Now the search continues to find the next character in the image.
Outline of the algorithm you are to implement
is as follows:
Let the image width be w and the height be h;
for j = 1 to w do
for k = 1 to
h do
visited[j,k] = false;
end for;
end for;
count = 0; Initialize Q;
// count is the number of symbols in the
image.
for j from 0 to w – 1 do
for k from 0
to h – 1 do
if
(color[j,k] is white and !visited[j,k]) then
count++;
lowx = j;
lowy = k; highx = j; highy = k;
insert(
[j,k]) into Q;
visited[j,k] = true;
while (Q is not empty) do
p =
delete(Q);
x = p
-> getx(); y = p -> gety();
color[x,y] = green;
if x
< lowx then lowx = x; end if;
if y
< lowy then lowy = y; end if;
if x
> highx then highx = x; end if;
if y
> highy then highy = y; end if;
for
each neighbor n of p do
if
visited[n] is not true then
visited[n] = true;
if
(color[n] is white) then
insert(n) into Q;
end
if;
end
if;
end
for;
end while;
draw a
box with (lowx-1,lowy-1) as NW and
(highx + 1, highy + 1) as SE boundary;
else
visited[j,k] = true; end if;
end for;
end for;
Input: The input will be an image with nearly
black background and non-black pixels forming letters or foreground images. You
can assume that the background pixels are such that each of their R, G and B
components < 20. The foreground
images are therefore those pixels of which at least one color component is
>= 20. Typical input is one in which
all the pixels in the foreground are close to white.
Output: Your
program should print the number of distinct letters (or foreground images)
found. It should create an output image file whose name is user specified (see
the format below). In the output file, the background pixels of the input file
should be copied exactly and the color of the letters should be changed to
green, and there should be a red bounding box around each letter.
Example: If
the executable file produced by your C++ program is run, then with in.bmp
as input image that is on the left column in page 1, if we execute the
statement
% run in.bmp out.bmp
the output on the command line should be:
4
The image out.bmp will be the image on
the right column in page 1.
Grading: