Fast Hierarchical Clustering on Structured Data

Betreuer/in:            Matthias Kaul           
Dekanat/Institut:   E-11, Institute for Algorithms and Complexity           

E-Mail:   algo@tuhh.de

Background

Clustering algorithms are an important tool in data analysis and machine learning. They allow us to categorize large data sets automatically, giving insight into underlying structures which are otherwise inaccessible. However, these Clustering problems are usually difficult to solve, so sophisticated algorithmic techniques are needed to find solutions. A recently proposed hierarchical clustering scheme promises good results if a tight approximation to Uniform Sparsest Cut problems can be found quickly [1]. While this is a similarly difficult task in general, at least on certain types of Data such approximations can be computed efficiently [2]. This implies, at least in theory, the existence of good clustering algorithms for such data.

Goals

In this thesis we aim to improve our understanding of the relationship between Clustering and Sparsest Cut problems as well as assess the viability of Sparsest-Cut-Based Hierarchical Clustering for practical applications. You should therefore have:

  • An interest in graph theory and graph algorithms
  • Experience programming in C++ or Python
  • A willingness to engage with mathematical aspects of Computer Science

 

Literature

[1] Sanjoy Dasgupta. 2016. A cost function for similarity-based hierarchical clustering. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing (STOC ’16). Association for Computing Machinery, New York, NY, USA, 118–127. DOI:https://doi.org/10.1145/2897518.2897527

[2] Bonsma et al., 2021. The complexity of finding uniform sparsest cuts in various graph classes. Journal of Discrete Algorithms, Volume 14. https://doi.org/10.1016/j.jda.2011.12.008.

Schreibe einen Kommentar

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert.