Applying sequence-to-sequence RNN models to IR-based bug localization

Lemons, Clayton Lindsay

Date

2016-08

Abstract

Bug localization, the process of finding the source code responsible for a reported bug, is resource-intensive. A considerable amount of time, effort, and money could be saved if this process were automated. Bug localization based on information retrieval (IR) is a static approach to automation that represents source code files as documents in a database and bug reports as queries. The bug localization approach described in this report centers on the mental model that develops in the minds of software developers as they work with a codebase. Using a sequence-to-sequence recurrent neural network (RNN), it may be possible to approximate this mental model by learning a mapping from the comments in source code (written in a natural language) to the source code itself (written in a programming language). The trained model can then convert bug reports (also written in a natural language) into source token keywords for use in IR-based bug localization. The results of experimenting with several approaches to defining this mapping are presented. Although the results do not yet match the current state of the art, they show that a sequence-to-sequence RNN has potential for IR-based bug localization.
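To make the IR side of this pipeline concrete, the sketch below treats source files as documents and a list of keywords as the query, then ranks files by similarity. The abstract does not say which IR model the report uses, so this is a minimal sketch assuming a plain TF-IDF vector space model with cosine similarity, one common baseline. The function names (`tokenize`, `build_index`, `cosine`, `rank_files`), the toy files, and the keyword list are all hypothetical; the keywords stand in for what a trained sequence-to-sequence model might emit for a bug report.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; camelCase identifiers are split into parts."""
    tokens = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        # e.g. "parseHeader" -> ["parse", "header"], "HTTPClient" -> ["http", "client"]
        parts = re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", word)
        tokens.extend(p.lower() for p in parts)
    return tokens


def build_index(docs):
    """docs maps file name -> source text; returns (TF-IDF vectors, idf table)."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n = len(docs)
    df = Counter()
    for c in counts.values():
        df.update(c.keys())  # document frequency: number of files containing each term
    # Smoothed idf (assumption): +1 keeps terms that appear in every file nonzero.
    idf = {term: math.log(n / d) + 1.0 for term, d in df.items()}
    vectors = {name: {t: tf * idf[t] for t, tf in c.items()} for name, c in counts.items()}
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_files(query_keywords, vectors, idf):
    """Return file names sorted from most to least similar to the query keywords."""
    q = Counter(t for kw in query_keywords for t in tokenize(kw))
    q_vec = {t: tf * idf[t] for t, tf in q.items() if t in idf}  # ignore unseen terms
    scores = {name: cosine(q_vec, vec) for name, vec in vectors.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical corpus and keywords, for illustration only.
files = {
    "Parser.java": "class Parser { Header parseHeader(String line) { return new Header(line); } }",
    "Network.java": "class Network { Socket openSocket(String host, int port) { return new Socket(host, port); } }",
}
keywords = ["parseHeader", "Header", "line"]  # stand-in for seq2seq model output
vectors, idf = build_index(files)
print(rank_files(keywords, vectors, idf))  # → ['Parser.java', 'Network.java']
```

Because the query is expressed in source-code vocabulary rather than in the bug report's natural language, it can match identifiers directly. That vocabulary gap is the one the report's comment-to-code mapping is meant to bridge.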