Proceedings of the
35th European Safety and Reliability Conference (ESREL2025) and
the 33rd Society for Risk Analysis Europe Conference (SRA-E 2025)
15 – 19 June 2025, Stavanger, Norway
A Semi-Automated Framework for Coding Fatal Accident Data in Mines Using Natural Language Processing
¹Department of Mining Engineering, Indian Institute of Technology Kharagpur, India.
²Talaipalli Coal Mining Project, NTPC Ltd, India.
ABSTRACT
Mining is among the most hazardous industries, with frequent fatalities arising from a range of occupational hazards. Identifying the causes of these fatalities has traditionally relied on manual coding of accident reports, which is time-consuming, inconsistent, and prone to human error. As the volume of accident reports grows, particularly in data-intensive environments, automation becomes essential for timely safety interventions. Advances in Natural Language Processing (NLP) and Machine Learning (ML) make semi-automated coding feasible, reducing manual effort while improving accuracy. This study uses NLP and ML models to predict the causes of fatalities in Indian mines from accident data in the Directorate General of Mines Safety (DGMS) reports for 2016 to 2022. The dataset consists of 401 fatal accident descriptions spanning these seven years. The descriptions were preprocessed and vectorized using the Bag of Words approach. Five machine learning models were compared: Naïve Bayes, Logistic Regression with and without adjusted class weights, Support Vector Machines, and Random Forest. Each model was trained to predict the accident cause from the textual description and was assessed on classification accuracy using an 80/20 train-test split. Classification was semi-automatic: when a model's highest predicted class probability exceeds a predefined threshold, the instance is coded automatically; when it falls below the threshold, the instance is flagged for manual review. Among the models evaluated, Logistic Regression with Adjusted Weights outperformed the standard Logistic Regression model, achieving a precision of 80%, a recall of 83%, and an F1-score of 80%, which indicates that it handles imbalanced data robustly and identifies positive cases effectively. This approach significantly reduces the manual coding workload, accelerates data processing, and strengthens safety oversight in mining operations.
Keywords: Occupational safety, Predictive modelling, Mining hazard mitigation.
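The semi-automatic coding step summarized in the abstract can be sketched roughly as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation: the tokenizer, the threshold value of 0.7, the cause labels, the toy sentences and the probability values are all hypothetical. `balanced_class_weights` uses the n_samples / (n_classes * class_count) formula that scikit-learn applies for `class_weight="balanced"`; the abstract does not state which weighting scheme was used. In a real pipeline the probability rows would come from a fitted classifier, for example scikit-learn's `LogisticRegression(class_weight="balanced")` and its `predict_proba` method applied to `CountVectorizer` features.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters (a deliberately simple, hypothetical tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs):
    """Build a sorted vocabulary and one word-count vector per document."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in tokenize(doc):
            row[index[tok]] += 1
        vectors.append(row)
    return vocab, vectors


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count) so rare causes count more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def route_by_confidence(prob_rows, class_names, threshold):
    """Split cases into auto-coded and manual-review sets.

    prob_rows holds one list of class probabilities per case (e.g. predict_proba output).
    A case is auto-coded when its highest probability reaches the threshold; ties at the
    threshold are treated as auto-coded here, which is an assumption.
    """
    auto, review = [], []
    for i, probs in enumerate(prob_rows):
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            auto.append((i, class_names[best], probs[best]))
        else:
            review.append(i)
    return auto, review


# Toy usage with hypothetical descriptions and cause labels (not DGMS data).
docs = [
    "Roof fall in the gallery crushed the loader",
    "Dumper reversed and ran over the helper",
    "Side fall of coal in the depillaring district",
]
labels = ["fall of roof/sides", "transportation machinery", "fall of roof/sides"]
vocab, X = bag_of_words(docs)
weights = balanced_class_weights(labels)  # rarer class gets the larger weight

# Hypothetical probabilities standing in for a trained model's predict_proba output.
classes = ["fall of roof/sides", "transportation machinery"]
probs = [[0.91, 0.09], [0.55, 0.45], [0.20, 0.80]]
auto, review = route_by_confidence(probs, classes, threshold=0.7)
print(auto)    # cases 0 and 2 are coded automatically
print(review)  # case 1 (max probability 0.55) goes to manual review
```

Raising the threshold sends more cases to manual review but makes the automatically coded ones more reliable; the abstract does not report which threshold was chosen, so it would have to be tuned on held-out data.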