Building a Learning Machine Classifier with Inadequate Data for Crime Prediction

Trung T. Nguyen, Amartya Hatua, and Andrew H. Sung
School of Computing, the University of Southern Mississippi, Hattiesburg, MS, U.S.A.

Abstract—In this paper, we describe a crime prediction method that forecasts the types of crimes that will occur based on location and time. The forecasting is done for the jurisdiction of the Portland Police Bureau (PPB). The method comprises the following steps: data acquisition and pre-processing, linking the data with demographic data from various public sources, and prediction using machine learning algorithms. In the first step, pre-processing consists mainly of cleaning, formatting, inferring, and categorizing the dataset. The dataset is then supplemented with publicly available census data, which mainly provides demographic information about the area and the educational, economic, and ethnic background of the people involved; in this way, some very important features are added to the PPB dataset in statistically meaningful ways, which contributes to better performance. Undersampling techniques are used to deal with the imbalanced dataset problem. Finally, the entire dataset is used to forecast the crime type in a particular location over a period of time using different machine learning algorithms, including Support Vector Machine (SVM), Random Forest, Gradient Boosting Machines, and Neural Networks, and the results are compared.
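The undersampling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which undersampling variant was used, so plain random undersampling is assumed here; the function name `random_undersample`, the toy features, and the crime category names are all hypothetical and are not taken from the PPB dataset.

```python
import random
from collections import Counter, defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from the larger classes so that every class
    keeps as many rows as the smallest class (assumed variant)."""
    rng = random.Random(seed)

    # Group the feature rows by their class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Every class is cut down to the size of the rarest class.
    target = min(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label in sorted(by_class):
        # Sample without replacement from each class.
        for row in rng.sample(by_class[label], target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: (hour_of_day, area_id) features, 120 rows in total.
rows = [(hour, area) for hour in range(24) for area in range(5)]
labels = ["other"] * 100 + ["burglary"] * 15 + ["vehicle_theft"] * 5

balanced_rows, balanced_labels = random_undersample(rows, labels)
print(Counter(balanced_labels))
```

After this step each class contributes the same number of rows, so a classifier trained on the balanced data is less likely to default to the majority crime type. The cost is that most majority-class rows are discarded, which matters when the data is already scarce.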

Index Terms—data mining, learning machine classifier models, missing features, random forest, gradient boosting, SVM, neural networks

Cite: Trung T. Nguyen, Amartya Hatua, and Andrew H. Sung, "Building a Learning Machine Classifier with Inadequate Data for Crime Prediction," Vol. 8, No. 2, pp. 141-147, May, 2017. doi: 10.12720/jait.8.2.141-147