Characterization of cybercrime in the department of Cundinamarca during the first half of 2021 through exploratory analysis and machine learning
Data analytics is now widely used across application areas, yet Colombia's open data strategies offer few datasets specific to cybercrime. This article therefore aims to characterize cybercrime in the department of Cundinamarca using exploratory analysis and machine learning techniques. The research was carried out in four methodological phases: data preparation, exploratory data analysis, application of machine learning models, and generation of value-added information. For the study, a cybercrime dataset was extracted from the 35,000-record dataset published by the National Police on Colombia's open data portal, which covers high-impact crimes that occurred in the department of Cundinamarca during the first half of 2021. The cybercrime dataset contains 1,513 records and includes the attributes day, quarter, municipality, area, victim, age, and crime. In the exploratory analysis, descriptive statistics were applied to the different attributes. In the machine learning phase, association rule and clustering models were applied: the association rules to determine how the attributes relate to the type of crime, and clustering to identify the representative groups formed by relating age to the type of crime and municipality to the type of crime. The study demonstrates the usefulness and potential of data analytics techniques in the field of cybersecurity as a means of supporting decision making by the relevant authorities.
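As an illustration of the association rule step described above, the following minimal Python sketch derives rules of the form "attribute values imply crime type" and reports their support and confidence. It is not the authors' implementation: the records, the attribute values, the crime labels, and the function name `mine_rules` are all hypothetical, and the sketch counts candidate antecedents exhaustively rather than using the pruning of an Apriori-style algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records. Attribute names follow the abstract; every value
# below is invented for illustration and does not come from the dataset.
RECORDS = [
    {"area": "urban", "victim": "female", "crime": "identity_theft"},
    {"area": "urban", "victim": "female", "crime": "identity_theft"},
    {"area": "urban", "victim": "male", "crime": "identity_theft"},
    {"area": "urban", "victim": "female", "crime": "identity_theft"},
    {"area": "rural", "victim": "male", "crime": "abusive_access"},
    {"area": "rural", "victim": "male", "crime": "abusive_access"},
    {"area": "rural", "victim": "female", "crime": "abusive_access"},
    {"area": "urban", "victim": "male", "crime": "abusive_access"},
]


def mine_rules(records, target="crime", min_support=0.25,
               min_confidence=0.7, max_len=2):
    """Return rules (antecedent, crime, support, confidence).

    support    = share of all records matching antecedent and crime
    confidence = share of records matching the antecedent that have the crime
    Antecedents are enumerated exhaustively up to max_len items
    (an assumption made for brevity; no Apriori pruning is performed).
    """
    total = len(records)
    antecedent_counts = Counter()
    joint_counts = Counter()
    for record in records:
        items = sorted(f"{k}={v}" for k, v in record.items() if k != target)
        label = record[target]
        for size in range(1, max_len + 1):
            for combo in combinations(items, size):
                antecedent_counts[combo] += 1
                joint_counts[(combo, label)] += 1

    rules = []
    for (combo, label), joint in joint_counts.items():
        support = joint / total
        confidence = joint / antecedent_counts[combo]
        if support >= min_support and confidence >= min_confidence:
            rules.append((combo, label, support, confidence))
    # Strongest rules first: by confidence, then by support.
    rules.sort(key=lambda r: (-r[3], -r[2], r[0]))
    return rules


if __name__ == "__main__":
    for antecedent, crime, support, confidence in mine_rules(RECORDS):
        print(f"{' & '.join(antecedent)} -> crime={crime} "
              f"(support={support:.2f}, confidence={confidence:.2f})")
```

On real data, the thresholds `min_support` and `min_confidence` would need tuning, since rare crime types can be filtered out by a support threshold that is too high.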
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Authors grant the journal and Universidad del Valle the economic rights over accepted manuscripts, but may reuse their work as they deem appropriate for professional, educational, academic or scientific purposes, in accordance with the terms of the license granted by the journal to all its articles.
Articles will be published under the Creative Commons 4.0 BY-NC-SA licence (Attribution-NonCommercial-ShareAlike).