ES 314 Advanced Programming, Simulation and Modeling                                                                              

 

Fall 2008

 

Project #2

 

Due: Oct 13, 2008

 

Problem Statement:  Optical character recognition (OCR) is a useful technology that has significantly advanced the process of digitizing documents. It is also supported by scanners and other devices. Using image processing techniques, a OCR reads a scanned document and recognize the individual characters in it. A first step in optical character recognition is to identify the boundaries of each character found in the image. A bounding box is the smallest enclosing rectangle that contains a particular character.

 

                                         

 

The goal of this project is to take as input an image like in the left-hand side and produce as output the image on the right-side in which the figure on the left has been processed to recognize where the characters are and a bounding box is drawn around each character. You can assume that the background image is (nearly) black and the letters are (nearly) white.  A sample input/output is shown above in which the left-side image is the input and the right-side is the output produced by the program.

            

 

Goals of the project:

 

 

Data Structure and Algorithm used: 

 

The basic idea behind the solution is as follows: When the search process encounters a white pixel, it tries to identify all the pixels that are connected to this pixel. This search is similar to finding the way through a maze. You can use a queue to solve this problem as follows: start scanning the image row by row and each row from left to right until you encounter a white pixel at (j,k) (j = row number, k = column number). Then, you use a queue to collect all the pixels that are reachable from this pixel. While reaching all these pixels, your algorithm will perform two tasks: (a) it will mark them as visited and change their color to green and (b) keep track of the bounding x and y values. When the queue is empty, the bounding box can be drawn. Now the search continues to find the next character in the image.

 

Outline of the algorithm you are to implement is as follows:

 

Let the image width be w and the height be h;

for j = 1 to w do

 for k = 1 to h do

    visited[j,k] = false;

 end for;

end for;

 

count = 0; Initialize Q;

 // count is the number of symbols in the image.

for j from 0 to w – 1 do

 for k from 0 to h – 1 do

  if (color[j,k] is white and !visited[j,k]) then

    count++;

    lowx = j; lowy = k; highx = j; highy = k;

    insert( [j,k]) into Q;

    visited[j,k] = true;

while (Q is not empty) do

       p = delete(Q);

       x = p -> getx(); y = p -> gety();

       color[x,y] = green;

       if x < lowx then lowx = x; end if;

       if y < lowy then lowy = y; end if;

       if x > highx then highx = x; end if;

       if y > highy then highy = y; end if;

       for each neighbor n of p do

         if visited[n] is not true then

           visited[n] = true;

           if (color[n] is white) then

               insert(n) into Q;

           end if;

         end if;

       end for;

   end while;

     draw a box with (lowx-1,lowy-1) as NW and

         (highx + 1, highy + 1) as SE boundary;

  else visited[j,k] = true; end if;

 end for;

end for;

 

Input: The input will be an image with nearly black background and non-black pixels forming letters or foreground images. You can assume that the background pixels are such that each of their R, G and B components  < 20. The foreground images are therefore those pixels of which at least one color component is >= 20.  Typical input is one in which all the pixels in the foreground are close to white.

 

Output:  Your program should print the number of distinct letters (or foreground images) found. It should create an output image file whose name is user specified (see the format below). In the output file, the background pixels of the input file should be copied exactly and the color of the letters should be changed to green, and there should be a red bounding box around each letter.

 

Example: If the executable file produced by your C++ program is run, then with in.bmp as input image that is on the left column in page 1, if we execute the statement

  % run in.bmp out.bmp

the output on the command line should be:

   4

 

 The image out.bmp will be the image on the right column in page 1.

 

Grading: