CS 315 Data Structures

 

Fall 2008

 

Project #2

 

Due: Oct 24, 2008

 

Problem Statement:  Optical character recognition (OCR) is a useful technology that has significantly advanced the process of digitizing documents. It is also supported by scanners and other devices. Using image processing techniques, a OCR reads a scanned document and recognize the individual characters in it. A first step in optical character recognition is to identify the boundaries of each character found in the image. A bounding box is the smallest enclosing rectangle that contains a particular character.

 

                                         

 

The goal of this project is to take as input an image like in the left-hand side and produce as output the image on the right-side in which the figure on the left has been processed to recognize where the characters are and a bounding box is drawn around each character. You can assume that the background image is (nearly) black and the letters are (nearly) white.  A sample input/output is shown above in which the left-side image is the input and the right-side is the output produced by the program.

            

 

Goals of the project:

 

 

Data Structure and Algorithm used: 

 

The basic idea behind the solution is as follows: When the search process encounters a white pixel, it tries to identify all the pixels that are connected to this pixel. This search is similar to finding the way through a maze. You can use a queue to solve this problem as follows: start scanning the image row by row and each row from left to right until you encounter a white pixel at (j,k) (j = row number, k = column number). Then, you use a queue to collect all the pixels that are reachable from this pixel. While reaching all these pixels, your algorithm will perform two tasks: (a) it will mark them as visited and change their color to green and (b) keep track of the bounding x and y values. When the queue is empty, the bounding box can be drawn. Now the search continues to find the next character in the image.

 

Outline of the algorithm you are to implement is as follows:

 

Let the image width be w and the height be h;

for j = 1 to w do

 for k = 1 to h do

    visited[j,k] = false;

 end for;

end for;

 

count = 0; Initialize Q;

 // count is the number of symbols in the image.

for j from 0 to w – 1 do

 for k from 0 to h – 1 do

  if (color[j,k] is white and !visited[j,k]) then

    count++;

    lowx = j; lowy = k; highx = j; highy = k;

    insert( [j,k]) into Q;

    visited[j,k] = true;

while (Q is not empty) do

       p = delete(Q);

       x = p -> getx(); y = p -> gety();

       color[x,y] = green;

       if x < lowx then lowx = x; end if;

       if y < lowy then lowy = y; end if;

       if x > highx then highx = x; end if;

       if y > highy then highy = y; end if;

       for each neighbor n of p do

         if visited[n] is not true then

           visited[n] = true;

           if (color[n] is white) then

               insert(n) into Q;

           end if;

         end if;

       end for;

   end while;

     draw a box with (lowx-1,lowy-1) as NW and

         (highx + 1, highy + 1) as SE boundary;

  else visited[j,k] = true; end if;

 end for;

end for;

 

Input: A scanned image of a printed text. Your program should make a first pass over the image to determine which pixels represent background, which one represent the printed text and which represent noise.

 

Output:  Your program should print the number of distinct letters (or foreground images) found. It should create an output image file whose name is user specified (see the format below). In the output file, the background pixels of the input file should be copied exactly and the color of the letters should be changed to green, and there should be a red bounding box around each letter. Your output should also include the number of lines in the page.

 

Example: If the executable file produced by your C++ program is run, then with in.bmp as input image that is on the left column in page 1, if we execute the statement

  % run in.bmp out.bmp

the output on the command line should be:

   4 1

 

 The image out.bmp will be the image on the right column in page 1.

 

Grading: