ETRO-VUB Department of Electronics and Informatics

Master theses

Current and past ideas and concepts for Master Theses.


	Large Language Models for Automated Clinical Coding

Subject

When healthcare providers treat patients, they document their findings and treatments in medical notes. A team of medical coders then reviews these patient files and translates the documented medical services, diagnoses, procedures, and equipment into standardized alphanumeric codes from the International Classification of Diseases, 10th Revision (ICD-10). These codes are essential for billing and reimbursement. However, manual clinical coding is challenging, time-consuming, and prone to human error. Automation can accelerate the process, save time and resources, and reduce the risk of errors. AI algorithms, specifically Large Language Models (LLMs), can read through medical notes and assign the appropriate ICD-10 codes automatically, which improves efficiency and promotes greater accuracy and consistency in clinical coding.

Kind of work

Aspect 1: Development and Investigation of LLMs for Automated Clinical Coding

This aspect focuses on developing approaches that use LLMs for accurate automated clinical coding, leveraging their strengths in Natural Language Processing (NLP) tasks. In addition, the student will explore how LLMs map clinical terminology to ICD-10 codes, with the objective of creating interpretable models that not only make accurate predictions but also provide clear explanations for their decisions. By combining explainability with the human-in-the-loop concept, we aim to ensure both transparency and trust in the coding process.

Aspect 2: Disease Outbreak Detection and Characterization from Clinical Notes

The second aspect applies automated clinical coding to detect and characterize disease outbreaks from clinical notes. The rationale is to use the predicted clinical codes to build models that can predict possible disease cases. This approach is inspired by the work of [1], which describes a Bayesian framework linking individual clinical diagnoses to epidemiological modeling of disease outbreaks. The final objective is a system that can detect and characterize outbreaks, providing early warnings and insights into disease patterns.

Framework of the Thesis

This thesis will build upon a long-standing collaboration with the clinical coding team of the hospital UZ Brussel. The ETRO department has extensive expertise in this problem, including code, data, and know-how, built up through various strategic projects in the domain. Some relevant publications are listed below:

[1] Cooper, G. F., Villamarin, R., Tsui, F. C. R., Millett, N., Espino, J. U., & Wagner, M. M. (2015). A method for detecting and characterizing outbreaks of infectious disease from clinical reports. Journal of Biomedical Informatics, 53, 15-26.
[2] Dong, H., Falis, M., Whiteley, W. et al. (2022). Automated clinical coding: what, why, and where we are? npj Digital Medicine, 5, 159. https://doi.org/10.1038/s41746-022-00705-7
[3] Mahdi, S. S., Papagiannopoulou, E., Deligiannis, N., & Sahli, H. (2024). Co-occurrence Graph-Enhanced Hierarchical Prediction of ICD Codes. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Soroush, A., Glicksberg, B. S., Zimlichman, E., Barash, Y., Freeman, R., Charney, A. W., & Klang, E. (2024). Large Language Models Are Poor Medical Coders—Benchmarking of Medical Code Querying. NEJM AI, AIdbp2300040.

Number of Students

1 or 2 (per aspect of the work)

This thesis is structured to accommodate two students, each focusing on one of the aspects described above. One student will develop and investigate LLMs for automated clinical coding, while the other will work on disease outbreak detection and characterization using clinical coding data.

Expected Student Profile

• Strong programming skills (Python).
• Experience with Natural Language Processing (NLP) and Deep Learning (DL).
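
To give a feel for how the two aspects connect, the toy sketch below takes ICD-10 codes that an automated coder might predict per clinical note, counts the daily notes carrying an influenza-related code, and applies Bayes' rule to obtain a posterior probability that an outbreak is under way. This is only an illustrative assumption, not the framework described in [1]: the function names, the example notes, the Poisson likelihood, the two rates, and the prior are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def daily_counts(predicted_codes_per_note, target_prefix):
    """Count, per day, the notes with at least one predicted code
    that starts with target_prefix (e.g. an ICD-10 category)."""
    counts = Counter()
    for day, codes in predicted_codes_per_note:
        if any(code.startswith(target_prefix) for code in codes):
            counts[day] += 1
    return counts


def poisson_pmf(k, rate):
    """Probability of observing k events under a Poisson(rate) model."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def outbreak_posterior(count, baseline_rate, outbreak_rate, prior):
    """Posterior P(outbreak | count) for a two-hypothesis model:
    daily counts are Poisson(outbreak_rate) during an outbreak and
    Poisson(baseline_rate) otherwise. All parameters are assumptions."""
    like_outbreak = poisson_pmf(count, outbreak_rate)
    like_baseline = poisson_pmf(count, baseline_rate)
    evidence = prior * like_outbreak + (1 - prior) * like_baseline
    return prior * like_outbreak / evidence


# Hypothetical output of an automated coder: (date, predicted ICD-10 codes).
notes = [
    ("2024-01-01", ["J10.1", "I10"]),
    ("2024-01-01", ["E11.9"]),
    ("2024-01-02", ["J10.1"]),
    ("2024-01-02", ["J11.1", "R05"]),
    ("2024-01-02", ["J10.0"]),
]

# "J10" is used here as an illustrative influenza-related category.
counts = daily_counts(notes, "J10")
for day in sorted(counts):
    p = outbreak_posterior(counts[day], baseline_rate=1.0,
                           outbreak_rate=4.0, prior=0.05)
    print(day, counts[day], round(p, 3))
```

In a realistic setting the rates and the prior would be estimated from historical data, and errors in the predicted codes would need to be propagated into the outbreak model rather than treated as exact observations.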

Promotors

Prof. Dr. Ir. Nikos Deligiannis

+32 (0)2 629 1683

ndeligia@etrovub.be

Prof. Hichem Sahli

+32 (0)2 629 2916

hsahli@etrovub.be

Supervisor

Dr. Eirini Papagiannopoulou

+32 (0)2 629 2930

epapagia@etrovub.be

Figure: Dashed arrows between the clinical coders and the automated coding system indicate potential interactions between them, which many clinical coding systems do not yet consider. From [2].



©2024 • Vrije Universiteit Brussel • ETRO Dept. • Pleinlaan 2 • 1050 Brussels • Tel: +32 2 629 2930 (secretariat) • Fax: +32 2 629 2883