Abstract
This paper studies k-plexes, a well known pseudo-clique model for network communities. In a k-plex, each node can miss at most k - 1 links. Our goal is to detect large communities in today's real-world graphs which can have hundreds of millions of edges. While many have tried, this task has been elusive so far due to its computationally challenging nature: k-plexes and other pseudo-cliques are harder to find and more numerous than cliques, a well known hard problem. We present d2k, which is the first algorithm able to find large k-plexes of very large graphs in just a few minutes. The good performance of our algorithm follows from a combination of graph-theoretical concepts, careful algorithm engineering and a high-performance implementation. In particular, we exploit the low degeneracy of real-world graphs, and the fact that large enough k-plexes have diameter 2. We validate a sequential and a parallel/distributed implementation of d2k on real graphs with up to half a billion edges.
Original language | English |
---|---|
Title of host publication | KDD 2018 - Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
Publisher | Association for Computing Machinery |
Pages | 1272-1281 |
ISBN (Print) | 9781450355520 |
DOIs | |
Publication status | Published - 19 Jul 2018 |
Externally published | Yes |
Event | 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2018 - London, United Kingdom Duration: 19 Aug 2018 → 23 Aug 2018 |
Conference
Conference | 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2018 |
---|---|
Country/Territory | United Kingdom |
City | London |
Period | 19/08/18 → 23/08/18 |