JAIT 2026 Vol.17(2): 251-260
doi: 10.12720/jait.17.2.251-260

Violence Detection in Videos Using Discriminant Frame Extraction and Convolutional Long Short-Term Memory

Venkatesh Akula * and Ilaiah Kavati
Department of Computer Science and Engineering, National Institute of Technology, Warangal, Telangana, India
Email: avenkatesh@student.nitw.ac.in (V.A.); ilaiahkavati@nitw.ac.in (I.K.)
*Corresponding author

Manuscript received April 9, 2025; revised May 14, 2025; accepted July 16, 2025; published February 5, 2026.

Abstract—Violent activity detection in videos is a challenging task in computer vision due to the complex and diverse motion patterns of human subjects in real-world environments. Automated surveillance systems are necessary in sensitive monitoring areas such as smart cities, educational institutions, sports grounds, hospitals, public gatherings, and other critical surveillance domains. This paper investigates an effective combination of hand-crafted and deep learning features within a hierarchical framework to improve both the accuracy and the efficiency of violent action detection in videos. In the first stage, the You Only Look Once (YOLO) object detection model is employed for person detection in video frames, discarding frames that contain no humans. A precise cropping operation is then performed on the remaining frames around the maximum overlapped human intersection area, i.e., the Region of Interest (ROI). In the second stage, interest points in the cropped region are generated using optical flow, and the Histogram of Oriented Gradients (HOG) is applied to extract features from the image data. These feature maps are forwarded to a customized Convolutional Long Short-Term Memory (ConvLSTM) deep neural network for efficient training. The model is evaluated on three benchmark datasets: Action Movies, Hockey Fights, and Violent Flows. The experimental results demonstrate that our method outperforms existing state-of-the-art approaches.
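To make the two-stage pipeline in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes person boxes are already available from a detector such as YOLO, replaces optical flow with a simple frame-difference motion score, and reduces HOG to a single global gradient-orientation histogram; the ConvLSTM classifier and all function names (`crop_roi`, `hog_like`, `motion_energy`) are illustrative assumptions.

```python
import numpy as np

def crop_roi(frame, boxes):
    # Crop the frame to the union of detected person boxes,
    # approximating the "maximum overlapped human intersection area" (ROI).
    # boxes: list of (x1, y1, x2, y2) from a person detector such as YOLO.
    x1 = min(b[0] for b in boxes); y1 = min(b[1] for b in boxes)
    x2 = max(b[2] for b in boxes); y2 = max(b[3] for b in boxes)
    return frame[y1:y2, x1:x2]

def hog_like(gray, bins=9):
    # Minimal HOG-style descriptor: one global orientation histogram
    # weighted by gradient magnitude (real HOG uses cells + block norm).
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # unsigned orientation
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-9)      # L2-normalized

def motion_energy(prev_gray, gray):
    # Stand-in for optical-flow interest: mean absolute frame difference.
    d = gray.astype(np.float64) - prev_gray.astype(np.float64)
    return float(np.mean(np.abs(d)))

def frame_features(prev_gray, gray, boxes):
    # One per-frame feature vector; a sequence of these (or the ROI maps
    # themselves) would be fed to a ConvLSTM for temporal modeling.
    roi = crop_roi(gray, boxes)
    return np.concatenate([hog_like(roi), [motion_energy(prev_gray, gray)]])
```

In the full method, frames without person detections would be dropped before this step, and the ConvLSTM would consume the sequence of ROI feature maps rather than a flat vector.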
 
Keywords—violence detection, key frames, You Only Look Once (YOLO), Histogram of Oriented Gradients (HOG), Convolutional Long Short-Term Memory (ConvLSTM)

Cite: Venkatesh Akula and Ilaiah Kavati, "Violence Detection in Videos Using Discriminant Frame Extraction and Convolutional Long Short-Term Memory," Journal of Advances in Information Technology, Vol. 17, No. 2, pp. 251-260, 2026. doi: 10.12720/jait.17.2.251-260

Copyright © 2026 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
