Processing and clustering of ancient Chinese poems with the objective of finding similar sentences

Soler Arasanz, Gonzalo

89759.pdf (1.709Mb)

Tutor / director: Dai, Liu

Document type: Final degree project

Date: 2013-02-02

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication, or transformation is prohibited without the authorization of the rights holder.

Abstract

The objective of this project is to create a program that processes a set of ancient Chinese poems, reading them from a text file and storing them in data structures, so that they can be used to find sentences similar to a text entered by the user. To achieve this, the poems are broken into sentences, which are clustered (always keeping track of which poem each one belongs to) using a tf-idf score between sentences to establish their similarity. Similar sentences are found by comparing the words they contain with the words of the provided text. The clusters are computed with a modified hierarchical clustering that follows the same principles but limits each cluster to at most four sentences. This way, the user receives a small set of similar sentences rather than a single sentence similar to the input text. Four clusters are returned: those containing the sentences most similar to the input.
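The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis code, and several details are assumptions made only for illustration: each character is treated as one word, the tf-idf weights use a smoothed idf, similarity is cosine similarity, clusters are merged by average linkage, and all function names (`split_poems`, `tfidf_vectors`, `capped_clustering`, `find_similar`) are hypothetical. Only the cap of four sentences per cluster and the four returned clusters come from the abstract. The sample input is two well-known Tang poems.

```python
import math
import re
from collections import Counter
from itertools import combinations

MAX_CLUSTER_SIZE = 4  # cap on sentences per cluster, as stated in the abstract
TOP_CLUSTERS = 4      # number of clusters returned, as stated in the abstract

def split_poems(poems):
    """Break each poem into sentences, keeping track of the source poem."""
    sentences = []
    for title, text in poems.items():
        for part in re.split(r"[，。？！；\s]+", text):
            if part:
                sentences.append((title, part))
    return sentences

def tokenize(text):
    # Assumption: one word per character, a rough fit for classical Chinese.
    return [ch for ch in text if not ch.isspace()]

def tfidf_vectors(texts):
    """Return one sparse tf-idf vector (dict) per text, plus the idf table."""
    docs = [Counter(tokenize(t)) for t in texts]
    df = Counter(term for doc in docs for term in doc)
    # Assumption: smoothed idf; the thesis may use a different variant.
    idf = {t: math.log((1 + len(docs)) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs], idf

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def capped_clustering(vectors, max_size=MAX_CLUSTER_SIZE):
    """Agglomerative clustering that never builds a cluster above max_size."""
    clusters = [[i] for i in range(len(vectors))]
    while True:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            if len(clusters[a]) + len(clusters[b]) > max_size:
                continue  # this merge would exceed the size cap
            # Assumption: average linkage between the two clusters.
            sim = sum(cosine(vectors[i], vectors[j])
                      for i in clusters[a] for j in clusters[b])
            sim /= len(clusters[a]) * len(clusters[b])
            if sim > 0 and (best is None or sim > best[0]):
                best = (sim, a, b)
        if best is None:
            return clusters  # no admissible merge is left
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is unaffected

def find_similar(query, sentences, vectors, idf, clusters, top=TOP_CLUSTERS):
    """Return the clusters that hold the sentences most similar to the query."""
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    cluster_of = {i: c for c, members in enumerate(clusters) for i in members}
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(q, vectors[i]), reverse=True)
    chosen = []
    for i in ranked:
        if cluster_of[i] not in chosen:
            chosen.append(cluster_of[i])
        if len(chosen) == top:
            break
    return [[sentences[i] for i in clusters[c]] for c in chosen]

# Sample input: two well-known poems, keyed by title.
poems = {
    "静夜思": "床前明月光，疑是地上霜。举头望明月，低头思故乡。",
    "登鹳雀楼": "白日依山尽，黄河入海流。欲穷千里目，更上一层楼。",
}
sentences = split_poems(poems)
vectors, idf = tfidf_vectors([text for _, text in sentences])
clusters = capped_clustering(vectors)
results = find_similar("明月", sentences, vectors, idf, clusters)
```

Here `results` is a list of at most four clusters, each a list of `(poem title, sentence)` pairs, ordered by how similar their best-matching sentence is to the query. The all-pairs search in `capped_clustering` is chosen for readability, not speed; a real corpus would need a more efficient merge strategy.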

Subjects: Translators (Computer programs), Chinese poetry--Translations, Traductors (Programes d'ordinador), Poesia xinesa -- Traducció

Degree: ENGINYERIA INFORMÀTICA (Pla 2003)

URI: http://hdl.handle.net/2099.1/19079

Collections

Facultat d'Informàtica de Barcelona - Enginyeria Informàtica (Pla 2003) [1.189]


UPCommons. UPC open knowledge portal
