A Rust implementation of "Data Bubbles: Quality Preserving Performance Boosting for Hierarchical Clustering" (paper: https://www.dbs.ifi.lmu.de/Publikationen/Papers/DataBubbles-cameraReady.pdf), plus several molecular similarity metrics.
I did a significant amount of work on this project before Summer of Making, so only 2 hours of it were logged during the summer.
Today I ran a test clustering on 1,000 molecules from the PubChem database, which exposed several bugs:
- Incorrect bit-similarity calculations when computing distances
- Overflow errors and array initialization errors when computing data bubbles
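To illustrate the kind of bit-similarity distance involved, here is a minimal sketch of the Tanimoto (Jaccard) coefficient on bit fingerprints. It assumes fingerprints are packed into `u64` words; the function names are hypothetical, and the project's actual metrics and representation may differ.

```rust
/// Tanimoto similarity: |A AND B| / |A OR B| over the set bits.
/// Assumes both fingerprints are packed into the same number of u64 words.
/// Returns 1.0 when neither fingerprint has any bit set (a convention chosen here).
fn tanimoto_similarity(a: &[u64], b: &[u64]) -> f64 {
    assert_eq!(a.len(), b.len(), "fingerprints must have the same length");
    let mut intersection_bits: u64 = 0;
    let mut union_bits: u64 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        // count_ones returns u32; widen to u64 so long fingerprints cannot overflow the counters.
        intersection_bits += u64::from((x & y).count_ones());
        union_bits += u64::from((x | y).count_ones());
    }
    if union_bits == 0 {
        return 1.0;
    }
    intersection_bits as f64 / union_bits as f64
}

/// Distance derived from the similarity: 0.0 for identical fingerprints, 1.0 for disjoint ones.
fn tanimoto_distance(a: &[u64], b: &[u64]) -> f64 {
    1.0 - tanimoto_similarity(a, b)
}
```

For example, the fingerprints `0b1011` and `0b0110` share one set bit out of four set in total, so the similarity is 0.25 and the distance is 0.75.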
I fixed these today, and will attempt to cluster a larger set of molecules next.
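For context, the paper derives each data bubble from the sufficient statistics (n, LS, SS) of the points it summarizes. The sketch below follows those formulas under the assumption that points are `f64` vectors; the type and method names are hypothetical and this is not the project's code. Accumulating the sums in `f64` is one way to avoid integer overflow, though not necessarily the fix used here.

```rust
/// Sufficient statistics (n, LS, SS) for a set of d-dimensional points.
/// Hypothetical type for illustration only.
struct SufficientStats {
    n: u64,
    linear_sum: Vec<f64>, // LS: per-dimension sum of coordinates
    square_sum: f64,      // SS: sum of squared norms of the points
}

impl SufficientStats {
    fn new(dim: usize) -> Self {
        // Size the vector up front so every dimension starts at zero.
        SufficientStats { n: 0, linear_sum: vec![0.0; dim], square_sum: 0.0 }
    }

    fn add(&mut self, point: &[f64]) {
        assert_eq!(point.len(), self.linear_sum.len(), "dimension mismatch");
        self.n += 1;
        for (sum, x) in self.linear_sum.iter_mut().zip(point) {
            *sum += x;
            self.square_sum += x * x;
        }
    }

    /// rep = LS / n (the mean of the points).
    fn representative(&self) -> Vec<f64> {
        assert!(self.n > 0, "no points added");
        self.linear_sum.iter().map(|s| s / self.n as f64).collect()
    }

    /// extent = sqrt((2 * n * SS - 2 * |LS|^2) / (n * (n - 1))).
    fn extent(&self) -> f64 {
        if self.n < 2 {
            return 0.0;
        }
        let n = self.n as f64;
        let ls_sq: f64 = self.linear_sum.iter().map(|s| s * s).sum();
        let value = (2.0 * n * self.square_sum - 2.0 * ls_sq) / (n * (n - 1.0));
        // Clamp tiny negative values caused by floating-point cancellation.
        value.max(0.0).sqrt()
    }

    /// nnDist(k) = (k / n)^(1 / d) * extent, the expected k-nearest-neighbor distance.
    fn nn_dist(&self, k: u64) -> f64 {
        let d = self.linear_sum.len() as f64;
        (k as f64 / self.n as f64).powf(1.0 / d) * self.extent()
    }
}
```

With the two points (0, 0) and (2, 0), the representative is (1, 0) and the extent is 2, which matches the single pairwise distance.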
The clusters are saved in tests/clusters.json; the code that generates them is in tests/genClusters.rs.
I'm also planning on figuring out ways of visualizing these high-dimensional clusters.