Data Bubbles / simMetrics

1 devlog
1h 56m
Created by Arvind

A Rust implementation of "Data Bubbles: Quality Preserving Performance Boosting for Hierarchical Clustering" (paper: https://www.dbs.ifi.lmu.de/Publikationen/Papers/DataBubbles-cameraReady.pdf), along with several molecular similarity metrics.

Timeline

Ship 1

Arvind

about 1 month ago

Arvind Covers 1 devlog and 1h 56m

I did a significant amount of work on this project before Summer of Making, so only about 2 hours are logged from the summer itself.

Today, I ran a test clustering on 1000 molecules from the PubChem database, which exposed several bugs:
- Incorrect bit similarity calculations in the distance computation
- Overflow errors and array initialization errors when computing data bubbles
I fixed these today, and will attempt to cluster larger sets of molecules next.
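
For reference, here is a minimal sketch of one common bit similarity used as a distance for molecular fingerprints: Tanimoto (Jaccard) similarity, with distance taken as 1 minus similarity. This is an illustration only and not the code from this repo. It assumes fingerprints are packed into u64 words, and the function names tanimoto_similarity and tanimoto_distance are hypothetical. Bit counts are accumulated in u64 so the sums cannot overflow for any realistic fingerprint length.

```rust
/// Tanimoto (Jaccard) similarity between two bit fingerprints.
/// Hypothetical helper for illustration; assumes both fingerprints are
/// packed into the same number of u64 words.
fn tanimoto_similarity(a: &[u64], b: &[u64]) -> f64 {
    assert_eq!(a.len(), b.len(), "fingerprints must have the same length");

    let mut both: u64 = 0; // bits set in both fingerprints
    let mut either: u64 = 0; // bits set in at least one fingerprint
    for (x, y) in a.iter().zip(b.iter()) {
        // count_ones returns u32; widen to u64 before accumulating.
        both += u64::from((x & y).count_ones());
        either += u64::from((x | y).count_ones());
    }

    if either == 0 {
        // Two all-zero fingerprints: treated as identical here by convention.
        return 1.0;
    }
    both as f64 / either as f64
}

/// Distance derived from the similarity above; always in [0, 1].
fn tanimoto_distance(a: &[u64], b: &[u64]) -> f64 {
    1.0 - tanimoto_similarity(a, b)
}
```

For example, the fingerprints 0b1011 and 0b0011 share 2 set bits out of 3 set in either one, so the similarity is 2/3 and the distance is 1/3.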
The resulting clusters are saved under tests/clusters.json, and the code that generates them is in tests/genClusters.rs.
I'm also planning to look into ways of visualizing these high-dimensional clusters.
