Econometria Vs. Machine Learning: Big Data em Finanças

Baldé, Amadú

Use this identifier to cite this record: http://hdl.handle.net/10451/47669

Title:	Econometria Vs. Machine Learning: Big Data em Finanças (Econometrics vs. Machine Learning: Big Data in Finance)
Author:	Baldé, Amadú
Advisor:	Mendes, Diana E. Aldea
Keywords:	Big Data; Machine Learning; Time series; Stock index; Standard & Poor’s 500; ARIMA/ARMA; Forecasting; Master's theses - 2020
Defense date:	2020
Abstract:	Forecasting stock index prices is one of the most challenging, complex, and fascinating tasks, because the data sets involved, known as time series, exhibit various irregularities (noise, non-stationarity, and non-linearity, among others). Many studies over the years have sought more effective techniques capable of working around these irregularities. With the exponential growth and increasing heterogeneity of data, verifying the assumptions of econometric models becomes ever more difficult. In light of these challenges, the main objective of this dissertation is to compare classical econometric methods with newer machine learning methods, using data from the S&P 500 stock index to forecast the closing prices of the series.
In the first phase, to support a better understanding of the topics addressed, the dissertation provides context grounded in the scientific literature and in a set of concepts considered essential. In the second phase, the empirical study analyzes the descriptive statistics, the graphs, and the model assumptions, and then selects the candidate models. This chapter is divided into two subchapters. In the first, the study is carried out in the statistical package Eviews, applying classical econometric techniques. In the second, the study is carried out in Python, currently considered one of the most popular software tools in the scientific, academic, and business worlds.
In Eviews, once the series has been made stationary, modeling proceeds through the Box-Jenkins methodology, specifically the Autoregressive Integrated Moving Average (ARIMA) model. Once the model has been chosen, the closing prices of the series are forecast. In Python, by contrast, more innovative approaches are explored, one of which is feature engineering, which yields thirty-one (31) new variables. Unlike the classical models, models obtained from machine learning algorithms do not require verification of the usual econometric assumptions, since the machine learns “autonomously” to work around certain irregularities. The algorithms used are Linear Regression (LR), Support Vector Regression (SVR), and Random Forest (RF). Finally, the results obtained throughout the study are critically interpreted and compared, thereby meeting the objective initially set for the dissertation.
Description:	Master's thesis, Financial Mathematics, Universidade de Lisboa, Faculdade de Ciências, 2020
URI:	http://hdl.handle.net/10451/47669
Degree:	Master's in Financial Mathematics
Appears in collections:	FC - Dissertações de Mestrado (Master's Dissertations)

Files in this record:

File	Description	Size	Format
ulfc126220_tm_Amadú_Baldé.pdf		1.77 MB	Adobe PDF
