Master Thesis FZJ-2015-06512

Design and Evaluation of an SVM Framework for Scientific Data Applications



2015

ix, 58 p. = Maastricht University, Master's thesis, 2015

Please use a persistent id in citations:

Abstract: Support vector machines (SVMs) are a popular classification method due to their good accuracy and broad applicability in scientific applications. Their computational complexity is between O(n^2) and O(n^3) for n training samples, so scalability to larger data sets is a problem for SVMs. With the increasing number of large data problems, this disadvantage becomes more and more significant. To overcome these scalability issues, this thesis designs and implements a parallel and scalable framework that realizes the cascade SVM approach, including specific improvements. A fundamental speed-up and increased scalability are gained by splitting the data set into several subsets that can be worked on in parallel. The framework is designed to run in modern High Performance Computing (HPC) environments that provide the necessary massively parallel resources (e.g. large clusters with good node interconnects) to solve large data problems. However, the framework also works on a simple computer for smaller problems if needed. To keep the interface usable for domain scientists who are not technically savvy, Python is used. The standard cascade SVM approach is improved with a standardized file format and parallel I/O, both of which improve I/O performance, which besides computing is often observed to be a bottleneck for large problems. To enable faster training as well as better accuracy, further improvements such as distance filters and cross-feedback options are realized and evaluated. The resulting improved cascade SVM approach and the parallel and scalable framework design are then evaluated on a real-world remote sensing data set and compared to another parallel implementation called pi-SVM. The parallelization strategies of these two implementations differ: the cascade SVM is a data processing approach, whereas pi-SVM follows a primarily algorithm-driven approach.
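
The cascade idea summarized in the abstract (split the data into subsets, train on each subset independently, keep only the support vectors, merge them pairwise and retrain) can be illustrated with a minimal Python sketch. This is not the thesis framework. All names (train_linear_svm, cascade_svm, make_toy_data, predict) are hypothetical. The trainer is a toy linear SVM using plain subgradient descent on the hinge loss, and "support vectors" are approximated as the points whose margin is at most 1 plus a tolerance. A thread pool stands in for real HPC parallelism, and the distance filters, cross-feedback, file format and parallel I/O mentioned in the abstract are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(samples, epochs=500, lam=0.01, lr=0.05, tol=0.1):
    """Toy trainer (assumption, not a production solver): full-batch
    subgradient descent on the L2-regularized hinge loss.
    samples is a list of (features, label) pairs with label in {-1, +1}.
    Returns (w, b, approximate_support_vectors)."""
    dim, n = len(samples[0][0]), len(samples)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for x, y in samples:
            if y * (dot(w, x) + b) < 1:  # point violates the margin
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    # Support-vector proxy: per class, keep points on or inside the margin;
    # if none qualify, keep the single closest point so both classes survive.
    scored = [(y * (dot(w, x) + b), (x, y)) for x, y in samples]
    support = []
    for label in (-1, 1):
        cls = [(m, s) for m, s in scored if s[1] == label]
        if not cls:
            continue
        near = [s for m, s in cls if m <= 1 + tol]
        support += near or [min(cls, key=lambda t: t[0])[1]]
    return w, b, support


def cascade_svm(samples, n_subsets=4):
    """Train subsets independently, then merge support vectors pairwise
    and retrain until a single set (and a single model) remains."""
    size = -(-len(samples) // n_subsets)  # ceiling division
    sets = [samples[i:i + size] for i in range(0, len(samples), size)]
    with ThreadPoolExecutor() as pool:
        while True:
            # Each set in a layer is independent of the others.
            results = list(pool.map(train_linear_svm, sets))
            if len(results) == 1:
                return results[0]
            svs = [r[2] for r in results]
            sets = [sum(svs[i:i + 2], []) for i in range(0, len(svs), 2)]


def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1


def make_toy_data():
    """Two linearly separable 2-D classes, alternating so that every
    contiguous chunk contains both labels."""
    data = []
    for i in range(20):
        p = (1 + (i % 5) * 0.5, 1 + (i // 5) * 0.5)
        data.append((p, 1))
        data.append(((-p[0], -p[1]), -1))
    return data


if __name__ == "__main__":
    w, b, support = cascade_svm(make_toy_data(), n_subsets=4)
    print("weights:", w, "bias:", b)
    print("support vectors kept:", len(support), "of 40 samples")
```

Only the surviving support vectors are passed from one layer to the next, so the later and more expensive training steps see far fewer samples than the full data set. Python threads do not speed up CPU-bound work; they are used here only to show that the subsets of one layer can be processed independently.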


Note: Maastricht University, Master's thesis, 2015

Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 512 - Data-Intensive Science and Federated Computing (POF3-512)

Appears in the scientific report 2015
Database coverage:
OpenAccess

The record appears in these collections:
Document types > Theses > Master Theses
Workflow collections > Public records
Institute Collections > JSC
Publications database
Open Access

 Record created 2015-11-12, last modified 2021-01-29