
Mathematical Formula Identification in Printed Chinese Documents Based on EEN Feature Function

Chunning Hou 1, Hongyan Ma 2, Bingjie Tian 3, Lina Zuo 4, and Xuedong Tian 4
1. School of Computer Science and Technology, Hebei University, Baoding, China
2. College of Mathematics and Information Science, Hebei University, Baoding, China
3. Department of Economic Trade, Hebei Finance University, Baoding, China
4. School of Computer Science and Technology, Hebei University, Baoding, China

Abstract—To address the problem of identifying mathematical formulae in printed Chinese document images, a method based on the EEN feature function is proposed. First, the EEN (Edge to Edge Notation) feature function, which reflects how connected components change, is defined, and a corresponding algorithm for extracting its values from image features is designed. Then, because the EEN feature function reflects the distribution of an image in the horizontal and vertical directions intuitively and adequately, it is used to perform symbol-level layout analysis and to extract basic information about text lines. Finally, a method for locating isolated and embedded formulae in printed Chinese document images is designed using both the layout features and the content features of mathematical formulae. The experimental results show that this method avoids a problem of existing methods, whose locating accuracy is frequently affected by symbol fragments obtained from connected regions. The method also performs well in layout composition discrimination and formula identification.
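The abstract does not give the definition of the EEN feature function, so the sketch below does not implement it. As a hypothetical stand-in, it counts background-to-ink transitions along each row and each column of a binary image. This is one simple way to observe how connected-component structure changes in the horizontal and vertical directions. It then groups consecutive non-empty rows into candidate text-line bands. The names `run_count`, `row_profile`, `col_profile`, `text_line_bands`, and `SAMPLE` are illustrative assumptions, not the authors' code.

```python
from typing import List, Tuple

# Binary image: 1 = ink (foreground), 0 = background.
Image = List[List[int]]


def run_count(seq: List[int]) -> int:
    """Count maximal runs of 1s, i.e. background-to-ink edges along one scan line.

    Hypothetical stand-in feature; NOT the paper's EEN definition.
    """
    count = 0
    prev = 0
    for v in seq:
        if v == 1 and prev == 0:
            count += 1
        prev = v
    return count


def row_profile(img: Image) -> List[int]:
    """Horizontal-direction profile: number of ink runs crossed by each row."""
    return [run_count(row) for row in img]


def col_profile(img: Image) -> List[int]:
    """Vertical-direction profile: number of ink runs crossed by each column."""
    return [run_count(list(col)) for col in zip(*img)]


def text_line_bands(profile: List[int]) -> List[Tuple[int, int]]:
    """Group consecutive rows with a non-zero profile into bands.

    Each band (start, end) is a candidate text line; end is exclusive.
    """
    bands: List[Tuple[int, int]] = []
    start = None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            bands.append((start, i))
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands


# Tiny hypothetical page: two short strokes on one line, one long stroke below.
SAMPLE: Image = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    rows = row_profile(SAMPLE)
    print(rows)                   # [0, 2, 2, 0, 1, 0]
    print(col_profile(SAMPLE))    # [1, 2, 1, 2, 1, 0]
    print(text_line_bands(rows))  # [(1, 3), (4, 5)]
```

A row whose run count exceeds that of its neighbors crosses more separate components, which hints at denser or more fragmented content. How the actual EEN function encodes this is specified in the full paper, not here.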
 
Index Terms—printed Chinese document images, isolated formula identification, embedded formula identification, connected components, EEN feature function

Cite: Chunning Hou, Hongyan Ma, Bingjie Tian, Lina Zuo, Xuedong Tian, "Mathematical Formula Identification in Printed Chinese Documents Based on EEN Feature Function," Vol. 8, No. 1, pp. 29-35, February 2017. doi: 10.12720/jait.8.1.29-35