Scalable overlapping community detection

Ismail Elhelw, Rutger Hofman, Wenzhe Li, Sungjin Ahn, Max Welling, Henri Bal

Research output: Chapter in Book / Report / Conference proceedingConference contributionAcademicpeer-review

Abstract

Recent advancements in machine learning algorithms have transformed the data analytics domain and provided innovative solutions to inherently difficult problems. However, training models at scale over large data sets remains a daunting challenge. One such problem is the detection of overlapping communities within graphs. For example, a social network can be modeled as a graph where the vertices and edges represent individuals and their relationships. As opposed to the problem of graph partitioning or clustering, an individual can be part of multiple communities which significantly increases the problem complexity. In this paper, we present and evaluate an efficient parallel and distributed implementation of a Stochastic Gradient Markov Chain Monte Carlo algorithm that solves the overlapping community detection problem. We show that the algorithm can scale and process graphs consisting of billions of edges and tens of millions of vertices on a compute cluster of 65 nodes. To the best of our knowledge, this is the first time that the problem of deducing overlapping communities has been learned for problems of such a large scale.

Original languageEnglish
Title of host publicationProceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016
PublisherACM, IEEE Computer Society
Pages1463-1472
Number of pages10
Volume2016-August
ISBN (Electronic)9781509021406
DOIs
Publication statusPublished - 2 Aug 2016
Event30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016 - Chicago, United States
Duration: 23 May 201627 May 2016

Conference

Conference30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016
CountryUnited States
CityChicago
Period23/05/1627/05/16

Fingerprint

Markov processes
Learning algorithms
Learning systems

Keywords

  • Distributed computing
  • High performance computing
  • Machine learning
  • Parallel programming
  • Performance analysis

Cite this

Elhelw, I., Hofman, R., Li, W., Ahn, S., Welling, M., & Bal, H. (2016). Scalable overlapping community detection. In Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016 (Vol. 2016-August, pp. 1463-1472). [7530037] ACM, IEEE Computer Society. https://doi.org/10.1109/IPDPSW.2016.165
Elhelw, Ismail ; Hofman, Rutger ; Li, Wenzhe ; Ahn, Sungjin ; Welling, Max ; Bal, Henri. / Scalable overlapping community detection. Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016. Vol. 2016-August ACM, IEEE Computer Society, 2016. pp. 1463-1472
@inproceedings{9b648c610107412294958d6f502d1c1c,
title = "Scalable overlapping community detection",
abstract = "Recent advancements in machine learning algorithms have transformed the data analytics domain and provided innovative solutions to inherently difficult problems. However, training models at scale over large data sets remains a daunting challenge. One such problem is the detection of overlapping communities within graphs. For example, a social network can be modeled as a graph where the vertices and edges represent individuals and their relationships. As opposed to the problem of graph partitioning or clustering, an individual can be part of multiple communities which significantly increases the problem complexity. In this paper, we present and evaluate an efficient parallel and distributed implementation of a Stochastic Gradient Markov Chain Monte Carlo algorithm that solves the overlapping community detection problem. We show that the algorithm can scale and process graphs consisting of billions of edges and tens of millions of vertices on a compute cluster of 65 nodes. To the best of our knowledge, this is the first time that the problem of deducing overlapping communities has been learned for problems of such a large scale.",
keywords = "Distributed computing, High performance computing, Machine learning, Parallel programming, Performance analysis",
author = "Ismail Elhelw and Rutger Hofman and Wenzhe Li and Sungjin Ahn and Max Welling and Henri Bal",
year = "2016",
month = "8",
day = "2",
doi = "10.1109/IPDPSW.2016.165",
language = "English",
volume = "2016-August",
pages = "1463--1472",
booktitle = "Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016",
publisher = "ACM, IEEE Computer Society",

}

Elhelw, I, Hofman, R, Li, W, Ahn, S, Welling, M & Bal, H 2016, Scalable overlapping community detection. in Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016. vol. 2016-August, 7530037, ACM, IEEE Computer Society, pp. 1463-1472, 30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016, Chicago, United States, 23/05/16. https://doi.org/10.1109/IPDPSW.2016.165

Scalable overlapping community detection. / Elhelw, Ismail; Hofman, Rutger; Li, Wenzhe; Ahn, Sungjin; Welling, Max; Bal, Henri.

Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016. Vol. 2016-August ACM, IEEE Computer Society, 2016. p. 1463-1472 7530037.

Research output: Chapter in Book / Report / Conference proceedingConference contributionAcademicpeer-review

TY - GEN

T1 - Scalable overlapping community detection

AU - Elhelw, Ismail

AU - Hofman, Rutger

AU - Li, Wenzhe

AU - Ahn, Sungjin

AU - Welling, Max

AU - Bal, Henri

PY - 2016/8/2

Y1 - 2016/8/2

N2 - Recent advancements in machine learning algorithms have transformed the data analytics domain and provided innovative solutions to inherently difficult problems. However, training models at scale over large data sets remains a daunting challenge. One such problem is the detection of overlapping communities within graphs. For example, a social network can be modeled as a graph where the vertices and edges represent individuals and their relationships. As opposed to the problem of graph partitioning or clustering, an individual can be part of multiple communities which significantly increases the problem complexity. In this paper, we present and evaluate an efficient parallel and distributed implementation of a Stochastic Gradient Markov Chain Monte Carlo algorithm that solves the overlapping community detection problem. We show that the algorithm can scale and process graphs consisting of billions of edges and tens of millions of vertices on a compute cluster of 65 nodes. To the best of our knowledge, this is the first time that the problem of deducing overlapping communities has been learned for problems of such a large scale.

AB - Recent advancements in machine learning algorithms have transformed the data analytics domain and provided innovative solutions to inherently difficult problems. However, training models at scale over large data sets remains a daunting challenge. One such problem is the detection of overlapping communities within graphs. For example, a social network can be modeled as a graph where the vertices and edges represent individuals and their relationships. As opposed to the problem of graph partitioning or clustering, an individual can be part of multiple communities which significantly increases the problem complexity. In this paper, we present and evaluate an efficient parallel and distributed implementation of a Stochastic Gradient Markov Chain Monte Carlo algorithm that solves the overlapping community detection problem. We show that the algorithm can scale and process graphs consisting of billions of edges and tens of millions of vertices on a compute cluster of 65 nodes. To the best of our knowledge, this is the first time that the problem of deducing overlapping communities has been learned for problems of such a large scale.

KW - Distributed computing

KW - High performance computing

KW - Machine learning

KW - Parallel programming

KW - Performance analysis

UR - http://www.scopus.com/inward/record.url?scp=84991688473&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84991688473&partnerID=8YFLogxK

U2 - 10.1109/IPDPSW.2016.165

DO - 10.1109/IPDPSW.2016.165

M3 - Conference contribution

VL - 2016-August

SP - 1463

EP - 1472

BT - Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016

PB - ACM, IEEE Computer Society

ER -

Elhelw I, Hofman R, Li W, Ahn S, Welling M, Bal H. Scalable overlapping community detection. In Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016. Vol. 2016-August. ACM, IEEE Computer Society. 2016. p. 1463-1472. 7530037 https://doi.org/10.1109/IPDPSW.2016.165