Comparison of Korean Preprocessing Performance according to Tokenizer in NMT Transformer Model
Geumcheol Kim and Sang-Hong Lee
Department of Computer Science & Engineering, Anyang University, Anyang-si, Republic of Korea
Abstract
—Machine translation using neural networks is making rapid progress in natural language processing. With the development of natural language processing models and tokenizers, accurate translation is becoming possible. In this paper, we build a Transformer model, an architecture that has recently shown high performance, and compare English-to-Korean translation performance according to the tokenizer. We constructed a neural-network-based Neural Machine Translation (NMT) model using the Transformer and compared the Korean translation results produced with each tokenizer. The Byte Pair Encoding (BPE)-based tokenizer yielded a small vocabulary and fast training, but, owing to the characteristics of Korean, its translation results were poor. The morphological-analysis-based tokenizer produced a large vocabulary and, given large parallel corpus data, achieved higher performance regardless of the characteristics of the language.
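To make the comparison concrete, here is a minimal sketch of how a BPE tokenizer learns its subword vocabulary by repeatedly merging the most frequent adjacent symbol pair, following the standard BPE formulation. The toy Korean corpus, syllable-level initial symbols, and merge count are illustrative assumptions; the paper's actual tokenizer implementations and data are not named here.

    import re
    from collections import Counter

    def get_pair_counts(vocab):
        """Count adjacent symbol pairs over a space-separated symbol vocabulary."""
        pairs = Counter()
        for word, freq in vocab.items():
            symbols = word.split()
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        return pairs

    def apply_merge(pair, vocab):
        """Fuse every standalone occurrence of `pair` into a single symbol."""
        old, new = ' '.join(pair), ''.join(pair)
        # Match the pair only at symbol boundaries so larger symbols are untouched.
        pattern = re.compile(r'(?<!\S)' + re.escape(old) + r'(?!\S)')
        return {pattern.sub(new, word): freq for word, freq in vocab.items()}

    # Toy word frequencies: each word starts as Korean syllables plus an
    # end-of-word marker, as in the original BPE formulation.
    vocab = {'번 역 기 </w>': 5, '번 역 </w>': 6, '기 계 번 역 </w>': 3}

    for _ in range(4):  # real systems learn tens of thousands of merges
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        vocab = apply_merge(best, vocab)
        print('merged:', best)

    print(vocab)  # frequent pairs such as ('번', '역') fuse into one unit '번역'

A morphological-analysis-based tokenizer instead segments words into linguistically meaningful units (stems, particles, endings) with a Korean morphological analyzer, which is what yields the larger vocabulary described above.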
Index Terms
—translation, tokenizer, neural machine translation, natural language processing, deep learning
Cite: Geumcheol Kim and Sang-Hong Lee, "Comparison of Korean Preprocessing Performance according to Tokenizer in NMT Transformer Model," Journal of Advances in Information Technology, Vol. 11, No. 4, pp. 228-232, November 2020. doi: 10.12720/jait.11.4.228-232
Copyright © 2020 by the authors. This is an open access article distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives License (CC BY-NC-ND 4.0), which permits use, distribution, and reproduction in any medium, provided that the article is properly cited, the use is non-commercial, and no modifications or adaptations are made.