Subject
When healthcare providers treat patients, they document their findings and treatments in medical notes. A team of medical coders then reviews these patient files and translates the documented services, diagnoses, procedures, and equipment into standardized alphanumeric codes, here those of the International Classification of Diseases, 10th Revision (ICD-10). These codes are essential for billing and reimbursement. However, manual clinical coding is challenging, time-consuming, and prone to human error. Automation can speed up this process, saving time and resources and reducing the risk of errors. With AI algorithms, specifically Large Language Models (LLMs), the goal is to read medical notes automatically and assign the appropriate ICD-10 codes, improving the efficiency, accuracy, and consistency of clinical coding.
Kind of work
Aspect 1: Development and Investigation of LLMs for Automated Clinical Coding
This aspect focuses on developing approaches that use Large Language Models (LLMs) for accurate automated clinical coding, leveraging their potential to enhance efficiency and accuracy in Natural Language Processing (NLP) tasks. Additionally, the student will explore how LLMs encode clinical terminologies into ICD-10 codes, with the objective of creating interpretable models that not only make accurate predictions but also provide clear explanations for their decisions. By integrating explainability with the human-in-the-loop concept, we aim to ensure both transparency and trust in the coding process.
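As a rough illustration of the human-in-the-loop idea in Aspect 1, the sketch below pulls ICD-10-shaped strings out of free-text model output and routes low-confidence suggestions to a human coder. Everything in it is an assumption made for illustration: the regular expression only approximates the WHO ICD-10 code shape (a letter, two digits, an optional decimal part), and `CodeSuggestion`, `extract_codes`, `route`, and the 0.8 threshold are hypothetical names and values, not part of any existing system.

```python
import re
from dataclasses import dataclass

# Assumed approximation of the ICD-10 code shape: letter, two digits,
# optional decimal part. It checks the format only, not whether the code exists.
ICD10_PATTERN = re.compile(r"\b[A-Z]\d{2}(?:\.\d{1,2})?\b")


@dataclass
class CodeSuggestion:
    code: str
    confidence: float  # hypothetical model score in [0, 1]


def extract_codes(llm_output: str) -> list[str]:
    """Return the distinct ICD-10-shaped strings in the text, in order of appearance."""
    seen: list[str] = []
    for match in ICD10_PATTERN.findall(llm_output):
        if match not in seen:
            seen.append(match)
    return seen


def route(suggestions: list[CodeSuggestion], threshold: float = 0.8):
    """Split suggestions into auto-accepted codes and codes sent to a human coder."""
    accepted = [s.code for s in suggestions if s.confidence >= threshold]
    review = [s.code for s in suggestions if s.confidence < threshold]
    return accepted, review


codes = extract_codes("Suggested codes: J18.9 (pneumonia), E11.9 and J18.9 again.")
accepted, review = route([CodeSuggestion("J18.9", 0.93), CodeSuggestion("E11.9", 0.41)])
```

In practice the threshold and the source of the confidence score would be design decisions for the thesis; the point of the sketch is only that uncertain predictions stay visible to the coding team.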
Aspect 2: Disease Outbreak Detection and Characterization from Clinical Notes
The second aspect involves applying automated clinical coding to detect and characterize disease outbreaks from clinical notes. The rationale is to utilize the predicted clinical codes to build models that can predict possible disease cases. This approach is inspired by [1], which describes a Bayesian framework linking individual clinical diagnoses to epidemiological modeling of disease outbreaks. The final objective is the development of a system that can detect and characterize outbreaks, providing valuable early warnings and insights into disease patterns.
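To make the Bayesian idea in Aspect 2 concrete, here is a deliberately simplified sketch. It is not the method of [1]; it is a toy two-hypothesis model, assuming daily counts of notes that received a given predicted code follow a Poisson distribution with either a baseline rate or a higher outbreak rate. The function names, the rates, and the prior are hypothetical.

```python
import math


def poisson_log_pmf(k: int, rate: float) -> float:
    """Log-probability of observing k events under a Poisson distribution."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def outbreak_posterior(daily_counts, baseline_rate, outbreak_rate, prior=0.01):
    """Posterior probability that the counts were generated at the outbreak rate.

    Applies Bayes' rule to two hypotheses (baseline vs. outbreak), working in
    log-odds for numerical stability.
    """
    log_like_base = sum(poisson_log_pmf(k, baseline_rate) for k in daily_counts)
    log_like_out = sum(poisson_log_pmf(k, outbreak_rate) for k in daily_counts)
    log_odds = math.log(prior / (1 - prior)) + log_like_out - log_like_base
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical daily counts of notes coded with one respiratory-infection code.
quiet_week = outbreak_posterior([2, 3, 2], baseline_rate=2.0, outbreak_rate=6.0)
busy_week = outbreak_posterior([7, 8, 6], baseline_rate=2.0, outbreak_rate=6.0)
```

With these assumed numbers, counts near the baseline push the posterior below the prior, while sustained elevated counts push it close to 1. A real system would also need to account for errors in the predicted codes, which is where the two aspects of the thesis meet.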
Framework of the Thesis
This thesis will build upon a long-standing collaboration with the clinical coding team of the UZ Brussel hospital. The ETRO department has extensive expertise on this problem (including code, data, and know-how) from various strategic projects in the domain.
Some relevant publications are listed below:
[1] Cooper, G. F., Villamarin, R., Tsui, F. C. R., Millett, N., Espino, J. U., & Wagner, M. M. (2015). A method for detecting and characterizing outbreaks of infectious disease from clinical reports. Journal of Biomedical Informatics, 53, 15-26.
[2] Dong, H., Falis, M., Whiteley, W. et al. Automated clinical coding: what, why, and where we are?. npj Digit. Med. 5, 159 (2022). https://doi.org/10.1038/s41746-022-00705-7
[3] Soha S. Mahdi, E. Papagiannopoulou, N. Deligiannis, H. Sahli. Co-occurrence Graph-Enhanced Hierarchical Prediction of ICD Codes. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024.
[4] Soroush, A., Glicksberg, B. S., Zimlichman, E., Barash, Y., Freeman, R., Charney, A. W., & Klang, E. (2024). Large Language Models Are Poor Medical Coders: Benchmarking of Medical Code Querying. NEJM AI, AIdbp2300040.
Number of Students
1 or 2 (per aspect of the work)
Expected Student Profile
Strong programming skills (Python). Experience with Natural Language Processing (NLP) and Deep Learning (DL).
This thesis is structured to accommodate two students, each focusing on one of the aspects mentioned above. One student will develop and investigate LLMs for automated clinical coding, while the other will work on disease outbreak detection and characterization using clinical coding data.