This paper presents ETSCH1 , a novel paradigm processing large graphs. E T S C H departs from the vertex- based approach of BSP frameworks like PREGEL in two ways: first, the units of computation are not the vertices, but rather a collection of subgraphs, obtained by partitioning the input graph; second, the subgraphs are obtained through an edge-partitioning algorithm, in which edges, rather than vertices, are subdivided into disjoint subsets Global computations over the graph are then easily expressed using classical centralized algorithms executed on each of the partitions, with the only additional burden of specifying simple reconciliation procedures when vertices are replicated in multiple computing nodes. The ETSCH paradigm has been implemented both on top of existing frameworks like HADOOP and SPARK, and as a stand-alone service based on AKKA, a toolkit for building distributed message-driven appli- cations. When considering problems like single-source shortest path and PageRank, our experiments show that solutions based on ETSCH/HADOOP and ETSCH/SPARK already outperform the standard solutions to the same problems in HADOOP and SPARK, respectively. But it is our AKKA implementation that really shines: the execution time on graphs with millions of edges falls down from from thousands of seconds (ETSCH/HADOOP) to tens of seconds (ETSCH/SPARK) to seconds (ETSCH/AKKA), while easily scaling to graphs with billions of edges. ETSCH/AKKA is also faster than other partition-centric frameworks like BLOGEL and GPS.

ETSCH: Partition-centric Graph Processing

Montresor, Alberto;Centellegher, Simone
2016-01-01

Abstract

This paper presents ETSCH1 , a novel paradigm processing large graphs. E T S C H departs from the vertex- based approach of BSP frameworks like PREGEL in two ways: first, the units of computation are not the vertices, but rather a collection of subgraphs, obtained by partitioning the input graph; second, the subgraphs are obtained through an edge-partitioning algorithm, in which edges, rather than vertices, are subdivided into disjoint subsets Global computations over the graph are then easily expressed using classical centralized algorithms executed on each of the partitions, with the only additional burden of specifying simple reconciliation procedures when vertices are replicated in multiple computing nodes. The ETSCH paradigm has been implemented both on top of existing frameworks like HADOOP and SPARK, and as a stand-alone service based on AKKA, a toolkit for building distributed message-driven appli- cations. When considering problems like single-source shortest path and PageRank, our experiments show that solutions based on ETSCH/HADOOP and ETSCH/SPARK already outperform the standard solutions to the same problems in HADOOP and SPARK, respectively. But it is our AKKA implementation that really shines: the execution time on graphs with millions of edges falls down from from thousands of seconds (ETSCH/HADOOP) to tens of seconds (ETSCH/SPARK) to seconds (ETSCH/AKKA), while easily scaling to graphs with billions of edges. ETSCH/AKKA is also faster than other partition-centric frameworks like BLOGEL and GPS.
2016
Proceedings. of the 2nd International Conference on Cloud and Big Data Computing
Stati Uniti
IEEE
Guerrieri, Alessio; Montresor, Alberto; Centellegher, Simone
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/164964
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? 1
social impact