Model Selection Strategies for Determining the Optimal Number of Overlapping Clusters in Additive Overlapping Partitional Clustering

Julian Rossbroich, Jeffrey Durieux, Tom F. Wilderjans*

*Corresponding author for this work

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

In various scientific fields, researchers use partitioning methods (e.g., K-means) to disclose the structural mechanisms underlying object-by-variable data. In some instances, however, grouping objects into clusters that are allowed to overlap (i.e., assigning objects to multiple clusters) may better represent the underlying clustering structure. To obtain an overlapping object clustering from object-by-variable data, Mirkin’s ADditive PROfile CLUStering (ADPROCLUS) model may be used. A major challenge when performing ADPROCLUS is determining the optimal number of overlapping clusters underlying the data, which is a model selection problem. Up to now, however, this problem has not been systematically investigated, and almost no guidelines on appropriate model selection strategies for ADPROCLUS can be found in the literature. Therefore, in this paper, several existing model selection strategies for K-means (among others, CHull; the Caliński-Harabasz, Krzanowski-Lai, Average Silhouette Width, and Dunn indices; and information-theoretic measures such as AIC and BIC) and two cross-validation-based strategies are tailored to an ADPROCLUS context and compared in an extensive simulation study. The results demonstrate that CHull outperforms all other model selection strategies, especially when the negative log-likelihood, which is associated with a minimal stochastic extension of ADPROCLUS, is used as the (mis)fit measure. The analysis of a post hoc AIC-based model selection strategy revealed that better performance may be obtained when a different, more appropriate definition of model complexity for ADPROCLUS is used.
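To make the additive overlapping structure concrete, the following minimal sketch assumes the usual formulation of ADPROCLUS, in which the data matrix is approximated by the product of a binary membership matrix and a cluster profile matrix, so that an object belonging to several clusters is modeled as the sum of their profiles. The abstract itself does not state the model equation; the function names, the toy data, and the use of a plain least-squares fit for fixed memberships are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_profiles(X, A):
    """Least-squares cluster profiles P for a FIXED binary membership matrix A.

    Assumed model: X ~ A @ P, with A of shape (objects, clusters)
    and P of shape (clusters, variables).
    """
    P, *_ = np.linalg.lstsq(A, X, rcond=None)
    return P

def sum_squared_residuals(X, A, P):
    """Misfit of the assumed model: sum of squared residuals of X - A @ P."""
    return float(np.sum((X - A @ P) ** 2))

# Toy data (hypothetical): 5 objects, 2 overlapping clusters, 3 variables.
# Objects 3 and 4 belong to both clusters; object 5 belongs to none.
A = np.array([[1, 0],
              [0, 1],
              [1, 1],
              [1, 1],
              [0, 0]], dtype=float)
P_true = np.array([[1.0, 0.0, 2.0],
                   [0.0, 3.0, 1.0]])
X = A @ P_true  # noise-free data generated from the assumed model

P_hat = fit_profiles(X, A)
loss = sum_squared_residuals(X, A, P_hat)
```

In practice the memberships are unknown and must be estimated together with the profiles; the model selection problem studied in the paper concerns how many columns the membership matrix should have. A misfit value such as the one computed here is the kind of quantity that strategies like CHull trade off against model complexity.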

Original language: English
Pages (from-to): 264-301
Number of pages: 38
Journal: Journal of Classification
Volume: 39
Issue number: 2
Early online date: 17 Jan 2022
Publication status: Published - Jul 2022

Bibliographical note

Publisher Copyright:
© 2022, The Author(s).

Keywords

  • Additive profile clustering
  • ADPROCLUS
  • AIC
  • BIC
  • Choosing the number of clusters
  • CHull
  • Model selection
  • Overlapping clustering
