An Empirical Investigation on Fine-Grained Syndrome Segmentation in TCM by Learning a CRF from a Noisy Labeled Data

Yaqiang Wang 1, Dan Tang 1, Hongping Shu 1, and Chen Su 2
1. College of Software Engineering, Chengdu University of Information Technology, Chengdu, China
2. Sichuan Academy of Chinese Medicine Sciences, Chengdu, China
Abstract—Syndrome is an important component of Traditional Chinese Medicine (TCM), and it is a concept that distinguishes TCM from Western Medicine (WM). A clear understanding of TCM syndromes helps researchers digest TCM regularities and bridge TCM and WM. Syndromes are often used in coarse-grained forms; however, the fine-grained medical information buried in these coarse-grained syndromes is then not considered. In this paper, we empirically investigate Fine-Grained Syndrome Segmentation (FGSS) in TCM, using a distantly supervised method to build a noisy labeled dataset for training Conditional Random Fields (CRFs) for FGSS. The feasibility and effectiveness of the method are demonstrated through a series of experiments. The best F1-score reaches 0.9177. To the best of our knowledge, our work is the first to focus on fine-grained information extraction in Chinese medical texts.
Index Terms—information extraction, distant supervision, biomedical natural language processing, traditional Chinese medicine

Cite: Yaqiang Wang, Dan Tang, Hongping Shu, and Chen Su, "An Empirical Investigation on Fine-Grained Syndrome Segmentation in TCM by Learning a CRF from a Noisy Labeled Data," Vol. 9, No. 2, pp. 45-50, May 2018. doi: 10.12720/jait.9.2.45-50
