ETRO VUB

Master theses

Current and past ideas and concepts for Master Theses.

Large Language Models for Automated Clinical Coding

Subject

When healthcare providers treat patients, they document their findings and treatments in medical notes. A team of medical coders then reviews these patient files and translates the documented diagnoses, procedures, services, and equipment into standardized alphanumeric codes drawn from the International Classification of Diseases, 10th Revision (ICD-10). These codes are essential for billing and reimbursement. Manual clinical coding, however, is difficult and time-consuming, and carries a significant risk of human error. Automation can accelerate the process, save time and resources, and reduce the risk of errors. AI algorithms, specifically Large Language Models (LLMs), can read medical notes and propose the appropriate ICD-10 codes, with the aim of improving the efficiency, accuracy, and consistency of clinical coding.

Kind of work

Aspect 1: Development and Investigation of LLMs for Automated Clinical Coding
This aspect focuses on developing approaches that use Large Language Models (LLMs) for accurate automated clinical coding, leveraging their potential to enhance efficiency and accuracy in Natural Language Processing (NLP) tasks. Additionally, the student will explore how LLMs encode clinical terminologies into ICD-10 codes, with the objective of creating interpretable models that not only make accurate predictions but also provide clear explanations for their decisions. By integrating explainability with the human-in-the-loop concept, we aim to ensure both transparency and trust in the coding process.
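To make the human-in-the-loop idea concrete, the following minimal sketch shows one way model output could be routed to clinical coders. It is an illustration under stated assumptions, not the method used at ETRO or UZ Brussel: the `CodeSuggestion` structure, the `triage` function, the confidence scores, and both thresholds are hypothetical, and the evidence strings stand in for whatever explanation an interpretable model would provide.

```python
from dataclasses import dataclass


@dataclass
class CodeSuggestion:
    code: str          # ICD-10 code proposed by the model
    confidence: float  # hypothetical model score in [0, 1]
    evidence: str      # text span from the note offered as the explanation


def triage(suggestions, auto_accept=0.9, discard_below=0.2):
    """Split suggestions into auto-accepted codes and codes needing coder review.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    accepted, review = [], []
    for s in suggestions:
        if s.confidence >= auto_accept:
            accepted.append(s)
        elif s.confidence >= discard_below:
            review.append(s)
        # Anything below discard_below is dropped as implausible.
    # Present the highest-confidence candidates to the coder first.
    review.sort(key=lambda s: s.confidence, reverse=True)
    return accepted, review


# Made-up suggestions for a single note, for illustration only.
suggestions = [
    CodeSuggestion("J18.9", 0.95, "chest X-ray consistent with pneumonia"),
    CodeSuggestion("E11.9", 0.55, "known type 2 diabetes"),
    CodeSuggestion("I10", 0.10, "blood pressure 128/82"),
]

accepted, review = triage(suggestions)
for s in review:
    # The coder sees the proposed code together with its supporting evidence.
    print(f"Review {s.code} ({s.confidence:.2f}): {s.evidence}")
```

In this sketch the evidence span travels with each suggestion, so a coder who is asked to review a code can see why the model proposed it.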

Aspect 2: Disease Outbreak Detection and Characterization from Clinical Notes
The second aspect applies automated clinical coding to the detection and characterization of disease outbreaks from clinical notes. The rationale is to use the predicted clinical codes as input to models that estimate the likely number of disease cases. This approach is inspired by [1], which describes a Bayesian framework linking individual clinical diagnoses to epidemiological modeling of disease outbreaks. The final objective is a system that can detect and characterize outbreaks, providing early warnings and insights into disease patterns.
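As a rough illustration of how predicted codes could feed an early-warning signal, the sketch below flags weeks in which the count of a given code is unusually high relative to recent weeks. It is a deliberately simple Poisson baseline, not the Bayesian framework of [1]; the function names, the four-week baseline window, the significance level, and the weekly counts are all assumptions made for this example.

```python
import math


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def flag_weeks(counts, baseline_weeks=4, alpha=0.01):
    """Return indices of weeks whose count is improbably high.

    Each week is compared against the mean of the preceding
    `baseline_weeks` weeks, treated as the expected Poisson rate.
    """
    flagged = []
    for t in range(baseline_weeks, len(counts)):
        baseline = counts[t - baseline_weeks:t]
        # Floor the rate so an all-zero baseline does not flag every case.
        lam = max(sum(baseline) / baseline_weeks, 0.5)
        if poisson_tail(counts[t], lam) < alpha:
            flagged.append(t)
    return flagged


# Made-up weekly counts of notes assigned one predicted ICD-10 code.
weekly_counts = [3, 4, 2, 3, 4, 3, 12, 15]
print(flag_weeks(weekly_counts))
```

One limitation of this baseline is that outbreak weeks enter the following weeks' baseline and raise the expected rate. A model-based approach such as the one in [1] addresses this kind of issue more carefully.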

Framework of the Thesis

This thesis will build on a long-standing collaboration with the clinical coding team of the hospital UZ Brussel. Through several strategic projects in this domain, the ETRO department has built up substantial expertise on the problem, including code, data, and know-how.

Some relevant publications are listed below:
[1] Cooper, G. F., Villamarin, R., Tsui, F. C. R., Millett, N., Espino, J. U., & Wagner, M. M. (2015). A method for detecting and characterizing outbreaks of infectious disease from clinical reports. Journal of Biomedical Informatics, 53, 15-26.
[2] Dong, H., Falis, M., Whiteley, W., et al. (2022). Automated clinical coding: what, why, and where we are? npj Digital Medicine, 5, 159. https://doi.org/10.1038/s41746-022-00705-7
[3] Mahdi, S. S., Papagiannopoulou, E., Deligiannis, N., & Sahli, H. (2024). Co-occurrence Graph-Enhanced Hierarchical Prediction of ICD Codes. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Soroush, A., Glicksberg, B. S., Zimlichman, E., Barash, Y., Freeman, R., Charney, A. W., & Klang, E. (2024). Large Language Models Are Poor Medical Coders—Benchmarking of Medical Code Querying. NEJM AI, AIdbp2300040.

Number of Students

1 or 2 (per aspect of the work)

Expected Student Profile

• Strong programming skills (Python).
• Experience with Natural Language Processing (NLP) and Deep Learning (DL).

This thesis is structured to accommodate two students, each focusing on one of the aspects mentioned above. One student will develop and investigate LLMs for automated clinical coding, while the other will work on disease outbreak detection and characterization using clinical coding data.

Promotors

Prof. Dr. Ir. Nikos Deligiannis

+32 (0)2 629 1683

ndeligia@etrovub.be

Prof. Hichem Sahli

+32 (0)2 629 2916

hsahli@etrovub.be

Supervisor

Dr. Eirini Papagiannopoulou

+32 (0)2 629 2930

epapagia@etrovub.be

Image

Dashed arrows between clinical coders and the automated coding system indicate potential interactions between them, which many clinical coding systems do not yet consider. From [2].

©2024 • Vrije Universiteit Brussel • ETRO Dept. • Pleinlaan 2 • 1050 Brussels • Tel: +32 2 629 2930 (secretariat) • Fax: +32 2 629 2883