Abstract
Recent advances in machine learning algorithms have transformed the data analytics domain and provided innovative solutions to inherently difficult problems. However, training models at scale over large data sets remains a daunting challenge. One such problem is the detection of overlapping communities within graphs. For example, a social network can be modeled as a graph where the vertices and edges represent individuals and their relationships, respectively. As opposed to the problem of graph partitioning or clustering, an individual can be part of multiple communities, which significantly increases the problem complexity. In this paper, we present and evaluate an efficient parallel and distributed implementation of a Stochastic Gradient Markov Chain Monte Carlo algorithm that solves the overlapping community detection problem. We show that the algorithm can scale and process graphs consisting of billions of edges and tens of millions of vertices on a compute cluster of 65 nodes. To the best of our knowledge, this is the first time that overlapping communities have been inferred for problems of such a large scale.
Original language | English |
---|---|
Title of host publication | Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016 |
Publisher | ACM, IEEE Computer Society |
Pages | 1463-1472 |
Number of pages | 10 |
Volume | 2016-August |
ISBN (Electronic) | 9781509021406 |
Publication status | Published - 2 Aug 2016 |
Event | 30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016 - Chicago, United States; Duration: 23 May 2016 → 27 May 2016 |
Conference
Conference | 30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016 |
---|---|
Country/Territory | United States |
City | Chicago |
Period | 23/05/16 → 27/05/16 |
Keywords
- Distributed computing
- High performance computing
- Machine learning
- Parallel programming
- Performance analysis