Scalable overlapping community detection

Ismail Elhelw, Rutger Hofman, Wenzhe Li, Sungjin Ahn, Max Welling, Henri Bal

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Recent advances in machine learning algorithms have transformed the data analytics domain and provided innovative solutions to inherently difficult problems. However, training models at scale over large data sets remains a daunting challenge. One such problem is the detection of overlapping communities within graphs. For example, a social network can be modeled as a graph in which the vertices represent individuals and the edges represent their relationships. Unlike in graph partitioning or clustering, an individual can be part of multiple communities, which significantly increases the problem complexity. In this paper, we present and evaluate an efficient parallel and distributed implementation of a Stochastic Gradient Markov Chain Monte Carlo algorithm that solves the overlapping community detection problem. We show that the algorithm can scale and process graphs consisting of billions of edges and tens of millions of vertices on a compute cluster of 65 nodes. To the best of our knowledge, this is the first time that overlapping communities have been learned for graphs of such a large scale.

Original language: English
Title of host publication: Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016
Publisher: ACM, IEEE Computer Society
Pages: 1463-1472
Number of pages: 10
Volume: 2016-August
ISBN (Electronic): 9781509021406
Publication status: Published - 2 Aug 2016
Event: 30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016 - Chicago, United States
Duration: 23 May 2016 to 27 May 2016

Conference

Conference: 30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016
Country/Territory: United States
City: Chicago
Period: 23/05/16 to 27/05/16

Keywords

  • Distributed computing
  • High performance computing
  • Machine learning
  • Parallel programming
  • Performance analysis
