Visual Question Answering in the Medical Domain

Sharma, Dhruv

Files

Sharma_D_T_2020.pdf (4.99 MB)

Downloads: 274

Date

2020-07-21

Authors

Sharma, Dhruv

Publisher

Virginia Tech

Abstract

Medical images are extremely difficult for a person without expertise to interpret. Practitioners are limited in number across the globe and often face fatigue due to the high number of cases. This fatigue, both physical and mental, can induce human errors during diagnosis. In such scenarios, an additional opinion can help boost the confidence of the decision-maker. Thus, it becomes crucial to have a reliable Visual Question Answering (VQA) system that can provide a "second opinion" on medical cases. However, most VQA systems in use today cater to real-world problems and are not specifically tailored to medical images. Moreover, a VQA system for medical images must account for the limited amount of training data available in this domain. In this thesis, we develop a deep learning-based model for VQA on medical images that takes these challenges into account. Our MedFuseNet system aims to maximize learning with minimal complexity by breaking the problem into simpler tasks and weaving them together to predict the answer. We tackle two types of answer prediction: categorization and generation. We conduct an extensive set of quantitative and qualitative analyses to evaluate the performance of MedFuseNet. Our results show that MedFuseNet outperforms other state-of-the-art methods available in the literature for these tasks.

Keywords

Visual Question Answering, deep learning, medical images

Persistent link

http://hdl.handle.net/10919/107586

Collections

Masters Theses