JAIT 2023 Vol.14(5): 1056-1062
doi: 10.12720/jait.14.5.1056-1062

CRL+: A Novel Semi-Supervised Deep Active Contrastive Representation Learning-Based Text Classification Model for Insurance Data

Amir Namavar Jahromi 1, Ebrahim Pourjafari 2,*, Hadis Karimipour 1, Amit Satpathy 2, and Lovell Hodge 2
1. Schulich School of Engineering, University of Calgary, Alberta, Canada;
Email: amir.namavarjahromi@ucalgary.ca (A.N.J.), hadis.karimipour@ucalgary.ca (H.K.)
2. Munich Re Canada, Canada; Email: ASatpathy@munichre.ca (A.S.), LHodge@munichre.ca (L.H.)
*Correspondence: EPourjafari@munichre.ca (E.P.)

Manuscript received March 6, 2023; revised April 17, 2023; accepted May 24, 2023; published October 13, 2023.

Abstract—The financial sector, and especially the insurance industry, collects vast volumes of text daily through multiple channels (agents, customer care centers, emails, social networks, and the web in general). The information collected includes policies, expert and health reports, claims and complaints, survey results, and relevant social media posts. It is difficult to effectively extract labels from, classify, and interpret the essential information in such varied and unstructured material. Therefore, the insurance industry is among those that can benefit from applying technologies for the intelligent analysis of free text through Natural Language Processing (NLP). In this paper, CRL+, a novel text classification model combining Contrastive Representation Learning (CRL) and Active Learning, is proposed to handle the challenge of using semi-supervised learning for text classification. In this method, supervised CRL is used to train a RoBERTa transformer model to encode the textual data into a contrastive representation space, and a classification layer then classifies the encoded data. This CRL-based transformer model is used as the base model in the proposed Active Learning mechanism to classify all the data iteratively. The proposed model is evaluated on unstructured obituary data, with the objective of determining the cause of death from the text. It is compared with the CRL model and with an Active Learning model with the RoBERTa base model. The experiments show that the proposed method can outperform both methods for this specific task.
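
The abstract does not state which contrastive objective is used, so the following is only a minimal sketch of one common choice, a supervised contrastive loss over a labeled batch, and not the authors' implementation. The function names, the temperature value, and the toy two-dimensional vectors are assumptions made for illustration; in the setting described above the embeddings would come from the RoBERTa encoder.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors that have a positive.

    Assumed form (not taken from the paper): for each anchor i, average
    -log softmax(sim(i, p) / temperature) over same-label samples p,
    where the softmax runs over every sample except i itself.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner adds nothing
        # Scaled cosine similarity between the anchor and every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(sims.values())
        log_denominator = peak + math.log(
            sum(math.exp(s - peak) for s in sims.values())
        )
        total += -sum(sims[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    # Toy embeddings: two tight clusters (hypothetical data).
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supervised_contrastive_loss(toy, [0, 0, 1, 1]))  # labels match clusters
    print(supervised_contrastive_loss(toy, [0, 1, 0, 1]))  # labels cut across clusters
```

The loss is small when same-label samples sit close together in the representation space and large when they are scattered, which is the property a downstream classification layer relies on.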
 
Keywords—natural language processing, contrastive representation learning, active learning, text classification, transformers, CRL+

Cite: Amir Namavar Jahromi, Ebrahim Pourjafari, Hadis Karimipour, Amit Satpathy, and Lovell Hodge, "CRL+: A Novel Semi-Supervised Deep Active Contrastive Representation Learning-Based Text Classification Model for Insurance Data," Journal of Advances in Information Technology, Vol. 14, No. 5, pp. 1056-1062, 2023.

Copyright © 2023 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.