Scalable hierarchical clustering: Twister tries with a posteriori trie elimination

Michael Cochez, Ferrante Neri

Research output: Chapter in Book / Report / Conference proceedingConference contributionAcademicpeer-review

Abstract

Exact methods for Agglomerative Hierarchical Clustering (AHC) with average linkage do not scale well when the number of items to be clustered is large. The best known algorithms are characterized by quadratic complexity. This is a generally accepted fact and cannot be improved without using specifics of certain metric spaces. Twister tries is an algorithm that produces a dendrogram (i.e., Outcome of a hierarchical clustering) which resembles the one produced by AHC, while only needing linear space and time. However, twister tries are sensitive to rare, but still possible, hash evaluations. These might have a disastrous effect on the final outcome. We propose the use of a metaheuristic algorithm to overcome this sensitivity and show how approximate computations of dendrogram quality can help to evaluate the heuristic within reasonable time. The proposed metaheuristic is based on an evolutionary framework and integrates a surrogate model of the fitness within it to enhance the algorithmic performance in terms of computational time.

Original languageEnglish
Title of host publicationProceedings - 2015 IEEE Symposium Series on Computational Intelligence, SSCI 2015
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages756-763
Number of pages8
ISBN (Electronic)9781479975600
DOIs
Publication statusPublished - 1 Jan 2015
Externally publishedYes
EventIEEE Symposium Series on Computational Intelligence, SSCI 2015 - Cape Town, South Africa
Duration: 8 Dec 201510 Dec 2015

Publication series

NameProceedings - 2015 IEEE Symposium Series on Computational Intelligence, SSCI 2015

Conference

ConferenceIEEE Symposium Series on Computational Intelligence, SSCI 2015
CountrySouth Africa
CityCape Town
Period8/12/1510/12/15

Cite this