Feature Optimization and Performance Evaluation of Machine Learning Algorithms for Identification of P2P Traffic

Sunil Agrawal1 and Balwinder S. Sohi2
1. University Institute of Engineering & Technology, Panjab University, Chandigarh, India
2. Campus Director, Surya World, Bapror, Patiala, Punjab, India

Abstract - P2P applications are believed to constitute a substantial proportion of today's Internet traffic. The ability to accurately identify different P2P applications in Internet traffic is important to a broad range of network operations, including application-specific traffic engineering, capacity planning, resource provisioning, and service differentiation. However, current P2P applications use several obfuscation techniques, including dynamic port numbers, port hopping, and encrypted payloads. As P2P applications continue to evolve, robust and effective methods are needed to identify them. It is general practice to reduce the cost of classification by reducing the number of features with a feature selection algorithm. But such algorithms are highly data-dependent and do not yield good results when applied to other data sets. In this paper, we propose an optimized set of features and compare five supervised ML algorithms for identification of P2P traffic. We find that NBTree outperforms the other ML algorithms, with 96.6% precision and 99.7% recall, when they are trained and tested on the same data set. In terms of training time, BayesNet performs best, with precision and recall very close to those of NBTree.

Index Terms – Flow features, Feature selection, Machine learning (ML) algorithms, Traffic classification.

Cite: Sunil Agrawal and Balwinder S. Sohi, "Feature Optimization and Performance Evaluation of Machine Learning Algorithms for Identification of P2P Traffic," Journal of Advances in Information Technology, Vol. 3, No. 2, pp. 107-114, May 2012. doi:10.4304/jait.3.2.107-114
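
The abstract reports classifier quality as precision and recall. As a reminder of how these two metrics are usually defined for a binary "P2P vs. non-P2P" decision, the following is a minimal sketch. The function name `precision_recall` and the counts used are hypothetical illustrations, not values or code from the paper.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from confusion-matrix counts.

    tp: flows correctly labelled P2P
    fp: non-P2P flows wrongly labelled P2P
    fn: P2P flows the classifier missed
    """
    # Precision: of everything labelled P2P, the fraction that really is P2P.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all true P2P flows, the fraction that was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
precision, recall = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

High recall with slightly lower precision, as reported for NBTree, would correspond to a classifier that misses very few P2P flows but occasionally labels non-P2P flows as P2P.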