
Academy for the Judiciary, Ministry of Justice


2022/12/30: A Phase II Pilot Study of Natural Language Processing for Judicial Data: An Exploration of the Association between Drug Use and Property Crime

  • Last updated:2022-12-30

  In response to the Ministry of Justice's announced digital policy of creating a "technology-enabled Ministry of Justice" in line with international trends in artificial intelligence, we aimed to create a blueprint for "big data analytics" to support criminal policy decisions, which is an important foundation for the government's promotion of the "New Generation Anti-drug Strategy 2.0", including the "Diversified Treatment and Social Rehabilitation" and "Recidivism Prevention Promotion Plan." In 2021, we collaborated with the Chinese Language and Technology Center of National Taiwan Normal University to develop an artificial intelligence model for the automatic interpretation, identification, and coding of indictments for drug use offenses. To build on the 2021 results and improve the applicability of artificial intelligence models, this project extends the machine learning algorithm to other criminal offenses and conducts natural language processing on drug use and theft indictments. In addition, to demonstrate the potential of future automatic interpretation models for exploring drug-related crime from the perspective of scientific knowledge and big data, this study analyzes the correlation between drug use and property crimes in the Criminal Policy and Crime Research Database using a multilayer Bayesian analysis.

  The methods of this study are: 1. annotation of drug use and theft indictment texts, and development of a word segmentation system; 2. multilayer Bayesian analysis.

  The results of this study are as follows. 1. The machine learning model was trained to automatically retrieve 19 features of drug and theft cases. Regular expressions retrieve the initial indictment feature information, and a BERT model compensates for the shortcomings of the regular expressions. With the two approaches combined, the machine was able to read the indictment type information. Some features were retrieved with nearly 100% agreement with manual annotation, while others need further development. 2. The cloud-based interface for manual tagging preserves the structural rules of the eight features in the indictment, which makes tagging more convenient and effective. 3. The multilayer Bayesian analysis showed that drug use and property crimes are related. If a drug user has been involved in a property crime, there is a 35% to 45% probability that he or she will commit another property crime in the subsequent period, whereas the probability that a drug user with no such involvement will commit a property crime in the subsequent period is about 11.7%. This result suggests that a drug user is significantly more likely to commit a property crime if he or she has already been involved in one.
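  The regular-expression-first extraction with a model fallback described in result 1 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the study's implementation: the feature names, the patterns, and the `fallback` hook (which stands in for the BERT model) are all hypothetical, since the abstract does not list the 19 features or the actual rules.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical patterns for two made-up features; the study's real rules
# and feature definitions are not given in this abstract.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "drug_type": re.compile(r"\b(heroin|methamphetamine|ketamine)\b", re.IGNORECASE),
    "stolen_value": re.compile(r"worth\s+NT\$\s?([\d,]+)"),
}

# A fallback takes (feature_name, text) and returns a value or None.
# In the study this role is played by a BERT model; here it is any callable.
Fallback = Callable[[str, str], Optional[str]]


def extract_features(text: str, fallback: Optional[Fallback] = None) -> Dict[str, Optional[str]]:
    """Try each regular expression first; ask the fallback only on a miss."""
    features: Dict[str, Optional[str]] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            # The first capture group holds the feature value.
            features[name] = match.group(1)
        elif fallback is not None:
            features[name] = fallback(name, text)
        else:
            features[name] = None
    return features


# Invented example sentence, not taken from any real indictment.
sample = "He used Heroin and took goods worth NT$3,000."
print(extract_features(sample))
```

  Keeping the rules and the model behind one function makes it easy to compare each feature against manual annotation and to see which features the rules alone already cover.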

Keywords: Prosecution, indictment, artificial intelligence, Chinese word segmentation, theft, drug use, theft and drug use association

For the full paper, please visit: AI人工智慧司法應用第二階段先導研究-兼以探索毒品犯罪與財產犯罪之關聯性 (available only in Traditional Chinese).
