Use this identifier to cite or link to this record: http://hdl.handle.net/10451/51973
Title: Improving Machine Learning Pipeline Creation using Visual Programming and Static Analysis
Author: David, João Pedro Vieira
Advisor: Fonseca, Alcides Miguel Cachulo Aguiar
Keywords: Visual Programming
Machine Learning
Pipeline
Type Checking
Compiler
Master's theses - 2021
Defense date: 2021
Abstract: ML pipelines are composed of several steps that load data, clean it, process it, apply learning algorithms, and either produce reports or deploy inference systems into production. In real-world scenarios, pipelines can take days, weeks, or months to train with large quantities of data. Unfortunately, current tools to design and orchestrate ML pipelines are oblivious to the semantics of each step, allowing developers to easily introduce errors when connecting two components that might not work together, either syntactically or semantically. Data scientists and engineers often find these bugs during or after the lengthy execution, which decreases their productivity. We propose a Visual Programming Language (VPL) enriched with semantic constraints regarding the behavior of each component, together with a methodology that verifies entire pipelines to detect common ML bugs that existing visual and textual programming languages do not detect. We evaluate this methodology on a set of six bugs taken from a data science company focused on preventing financial fraud on big data. We were able to detect these data engineering and data balancing bugs, as well as unnecessary computation in the pipelines.
Description: Master's thesis, Informatics Engineering (Software Engineering), Universidade de Lisboa, Faculdade de Ciências, 2021
URI: http://hdl.handle.net/10451/51973
Degree: Master's thesis in Informatics Engineering (Software Engineering)
Appears in collections: FC-DI - Master Thesis (dissertation)

Files in this record:
File                 Description   Size      Format
TM_João_David.pdf                  1.95 MB   Adobe PDF



All records in the repository are protected by copyright laws, with all rights reserved.