Divide & conquer: I/O efficient depth-first search

Zhang, Z; Yu, JX; Qin, L; Shang, Z

Publication Type: Conference Proceeding
Citation: Proceedings of the ACM SIGMOD International Conference on Management of Data, 2015, 2015-May, pp. 445-458
Issue Date: 2015-05-27

Closed Access

	Filename	Description	Size	Format
	O Efficient Depth-First Search.pdf	Published version	629.76 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, Z	en_US
dc.contributor.author	Yu, JX https://orcid.org/0000-0002-9738-827X	en_US
dc.contributor.author	Qin, L https://orcid.org/0000-0001-6068-5062	en_US
dc.contributor.author	Shang, Z	en_US
dc.date.issued	2015-05-27	en_US
dc.identifier.citation	Proceedings of the ACM SIGMOD International Conference on Management of Data, 2015, 2015-May pp. 445 - 458	en_US
dc.identifier.isbn	9781450327589	en_US
dc.identifier.issn	0730-8078	en_US
dc.identifier.uri	http://hdl.handle.net/10453/41388
dc.description.abstract	Copyright © 2015 ACM. Depth-First Search (DFS), which traverses a graph in the depth-first order, is one of the fundamental graph operations, and the result of DFS over all nodes in G is a spanning tree known as a DFS-Tree. There are many graph algorithms that need DFS, such as connected component computation, topological sort, community detection, Eulerian path computation, graph bipartiteness testing, planar graph testing, etc., because the in-memory DFS algorithm shows it can be done in linear time w.r.t. the size of G. However, given the fact that real-world graphs grow rapidly in the big data era, the in-memory DFS algorithm cannot be used to handle a large graph that cannot be entirely held in main memory. In this paper, we focus on I/O efficiency and study semi-external algorithms to DFS a graph G which is on disk. Here, like the existing semi-external algorithms, we assume that a spanning tree of G can be held in main memory and the remaining edges of G are kept on disk, and compute the DFS-Tree in main memory with which DFS can be identified. We propose novel divide & conquer algorithms to DFS over a graph G on disk. In brief, we divide a graph into several subgraphs, compute the DFS-Tree for each subgraph independently, and then merge them together to compute the DFS-Tree for the whole graph. With the global DFS-Tree computed, we identify DFS. We discuss the valid division that can lead to the correct DFS, and the challenges to do so. We propose two division algorithms, named Divide-Star and Divide-TD, and a merge algorithm. We conduct extensive experimental studies using four real massive datasets and several synthetic datasets to confirm the I/O efficiency of our approach.	en_US
dc.relation.ispartof	Proceedings of the ACM SIGMOD International Conference on Management of Data	en_US
dc.relation.isbasedon	10.1145/2723372.2723740	en_US
dc.title	Divide & conquer: I/O efficient depth-first search	en_US
dc.type	Conference Proceeding
utslib.citation.volume	2015-May	en_US
utslib.for	080101 Adaptive Agents and Intelligent Robotics	en_US
utslib.for	080109 Pattern Recognition and Data Mining	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US
pubs.volume	2015-May	en_US

Abstract:

Copyright © 2015 ACM. Depth-First Search (DFS), which traverses a graph in the depth-first order, is one of the fundamental graph operations, and the result of DFS over all nodes in G is a spanning tree known as a DFS-Tree. There are many graph algorithms that need DFS, such as connected component computation, topological sort, community detection, Eulerian path computation, graph bipartiteness testing, planar graph testing, etc., because the in-memory DFS algorithm shows it can be done in linear time w.r.t. the size of G. However, given the fact that real-world graphs grow rapidly in the big data era, the in-memory DFS algorithm cannot be used to handle a large graph that cannot be entirely held in main memory. In this paper, we focus on I/O efficiency and study semi-external algorithms to DFS a graph G which is on disk. Here, like the existing semi-external algorithms, we assume that a spanning tree of G can be held in main memory and the remaining edges of G are kept on disk, and compute the DFS-Tree in main memory with which DFS can be identified. We propose novel divide & conquer algorithms to DFS over a graph G on disk. In brief, we divide a graph into several subgraphs, compute the DFS-Tree for each subgraph independently, and then merge them together to compute the DFS-Tree for the whole graph. With the global DFS-Tree computed, we identify DFS. We discuss the valid division that can lead to the correct DFS, and the challenges to do so. We propose two division algorithms, named Divide-Star and Divide-TD, and a merge algorithm. We conduct extensive experimental studies using four real massive datasets and several synthetic datasets to confirm the I/O efficiency of our approach.
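
For orientation, the in-memory baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration of a linear-time DFS that records a DFS-Tree (a forest when the graph is not reachable from one root); it is not the paper's Divide-Star, Divide-TD, or merge algorithm, none of which are described in this record. The function name `dfs_tree`, the edge-list input format, and the choice of a directed graph are assumptions made for the example.

```python
from collections import defaultdict


def dfs_tree(nodes, edges):
    """Build a DFS-Tree (forest) for a directed graph held fully in memory.

    nodes: iterable of node ids; edges: iterable of (u, v) pairs.
    Returns (parent, order): parent maps each node to its tree parent
    (None for roots), and order lists nodes in visiting order.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)

    parent = {}
    order = []
    for root in nodes:
        if root in parent:          # already reached from an earlier root
            continue
        parent[root] = None
        order.append(root)
        # Explicit stack of (node, neighbour iterator) avoids recursion limits.
        stack = [(root, iter(adj[root]))]
        while stack:
            u, neighbours = stack[-1]
            for v in neighbours:
                if v not in parent:  # tree edge: descend into v
                    parent[v] = u
                    order.append(v)
                    stack.append((v, iter(adj[v])))
                    break
            else:
                stack.pop()          # all neighbours of u handled: backtrack
    return parent, order


if __name__ == "__main__":
    parent, order = dfs_tree([1, 2, 3, 4, 5],
                             [(1, 2), (1, 3), (2, 4), (3, 4), (4, 1)])
    print(order)
    print(parent)
```

Each node and each edge is touched a constant number of times, which is the linear-time behaviour the abstract refers to. The sketch needs the whole adjacency structure in memory, which is the limitation that motivates the semi-external setting.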

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/41388
