Subject
Advances in computer vision have enabled the automation of increasingly complex tasks in challenging environments, from Industry 4.0 to robotics. Interpretable deep learning is essential for AI applications in computer vision. A particular family of interpretable-by-design models is deep unfolding, which unrolls the iterations of an optimization algorithm and maps each (sub)step to a corresponding neural network layer. The resulting machine learning model incorporates the domain knowledge of the original algorithm into its architecture. This approach yields very compact and efficient models, with many use cases in signal and image processing.
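To make the idea of unrolling concrete, the sketch below writes a fixed number of ISTA iterations (a classical algorithm for sparse recovery, chosen here only as an illustration) as a stack of layers, each with its own parameters. It is not the research group's model: the names `UnfoldedISTA`, `soft_threshold`, `W`, `S` and `theta` are hypothetical, and the code uses NumPy and runs only the forward pass. In a trainable version, the per-layer parameters would typically be learned from data, for example as PyTorch parameters.

```python
import numpy as np


def soft_threshold(v, theta):
    """Proximal operator of the l1 norm (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


class UnfoldedISTA:
    """ISTA iterations written as layers (illustrative, forward pass only).

    Each layer computes x <- soft_threshold(W @ y + S @ x, theta).
    The parameters are initialised from the original algorithm:
    W = A.T / L, S = I - A.T @ A / L, theta = lam / L.
    """

    def __init__(self, A, lam, num_layers):
        # L is the Lipschitz constant of the gradient of 0.5 * ||A x - y||^2
        L = np.linalg.norm(A, 2) ** 2
        n = A.shape[1]
        # One parameter set per layer; training could make them differ
        self.layers = [
            {"W": A.T / L, "S": np.eye(n) - A.T @ A / L, "theta": lam / L}
            for _ in range(num_layers)
        ]

    def forward(self, y):
        # Start from the all-zero estimate, as plain ISTA commonly does
        x = np.zeros(self.layers[0]["S"].shape[0])
        for layer in self.layers:
            x = soft_threshold(layer["W"] @ y + layer["S"] @ x, layer["theta"])
        return x


# Usage on a small synthetic sparse-recovery problem (assumed data)
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50)) / np.sqrt(20)
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.0, -2.0, 1.5]
y = A @ x_true

model = UnfoldedISTA(A, lam=0.05, num_layers=200)
x_hat = model.forward(y)
print("nonzero entries in estimate:", int(np.count_nonzero(x_hat)))
```

With the initialisation above, the network reproduces plain ISTA exactly. The appeal of unfolding is that every learned quantity keeps a clear role from the algorithm (a step, a threshold), which is where the interpretability and compactness come from.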
Kind of work
For this thesis, you will apply the deep unfolding models developed in our research group to computer vision and detection tasks. You will start with a survey of the current state of the art in computer vision and the selection of a suitable dataset. Next, you will develop a deep-unfolding-based vision or detection model for a single modality and evaluate its performance. Subsequently, you will fuse different modalities into a single representation, again using deep unfolding principles, and evaluate the costs and benefits of using multiple modalities for robotic vision.
Framework of the Thesis
The thesis builds on long-standing research at the ETRO department on interpretable and explainable deep learning. The student will be given access to scientific datasets on robotic vision and to Python code that implements state-of-the-art deep unfolding models. The student will need to extend these models to new data modalities and AI architectures.
Example publications:
- B. Joukovsky, Y. C. Eldar, N. Deligiannis. Interpretable Neural Networks for Video Separation: Deep Unfolding RPCA with Foreground Masking. IEEE Transactions on Image Processing, 2024. DOI: 10.1109/TIP.2023.3336176
- B. De Weerdt, Y. C. Eldar, N. Deligiannis. Deep Unfolding Transformers for Sparse Recovery of Video. IEEE Transactions on Signal Processing (TSP), 2024.
Number of Students
1
Expected Student Profile
Good knowledge of Python (including PyTorch), machine learning, and signal processing. Knowledge of image processing and computer vision methods is a plus.