TY - GEN
T1 - Design patterns for efficient graph algorithms in MapReduce
AU - Lin, Jimmy
AU - Schatz, Michael
PY - 2010
Y1 - 2010
N2 - Graphs are analyzed in many important contexts, including ranking search results based on the hyperlink structure of the world wide web, module detection of proteinprotein interaction networks, and privacy analysis of social networks. Many graphs of interest are difficult to analyze because of their large size, often spanning millions of vertices and billions of edges. As such, researchers have increasingly turned to distributed solutions. In particular, MapReduce has emerged as an enabling technology for large-scale graph processing. However, existing best practices for MapReduce graph algorithms have significant shortcomings that limit performance, especially with respect to partitioning, serializing, and distributing the graph. In this paper, we present three design patterns that address these issues and can be used to accelerate a large class of graph algorithms based on message passing, exemplified by PageRank. Experiments show that the application of our design patterns reduces the running time of PageRank on a web graph with 1.4 billion edges by 69%.
AB - Graphs are analyzed in many important contexts, including ranking search results based on the hyperlink structure of the world wide web, module detection of proteinprotein interaction networks, and privacy analysis of social networks. Many graphs of interest are difficult to analyze because of their large size, often spanning millions of vertices and billions of edges. As such, researchers have increasingly turned to distributed solutions. In particular, MapReduce has emerged as an enabling technology for large-scale graph processing. However, existing best practices for MapReduce graph algorithms have significant shortcomings that limit performance, especially with respect to partitioning, serializing, and distributing the graph. In this paper, we present three design patterns that address these issues and can be used to accelerate a large class of graph algorithms based on message passing, exemplified by PageRank. Experiments show that the application of our design patterns reduces the running time of PageRank on a web graph with 1.4 billion edges by 69%.
UR - http://www.scopus.com/inward/record.url?scp=77956237139&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77956237139&partnerID=8YFLogxK
U2 - 10.1145/1830252.1830263
DO - 10.1145/1830252.1830263
M3 - Conference contribution
AN - SCOPUS:77956237139
SN - 9781450302142
T3 - Proceedings of the 8th Workshop on Mining and Learning with Graphs, MLG'10
SP - 78
EP - 85
BT - Proceedings of the 8th Workshop on Mining and Learning with Graphs, MLG'10
T2 - 8th Workshop on Mining and Learning with Graphs, MLG'10
Y2 - 24 July 2010 through 25 July 2010
ER -