Abstract
We consider the problem of encoding a set of vectors into a minimal number of bits while preserving information on their Euclidean geometry. We show that this task can be accomplished by applying a Johnson-Lindenstrauss embedding and subsequently binarizing each vector by comparing each entry of the vector to a uniformly random threshold. Using this simple construction we produce two encodings of a dataset such that one can query Euclidean information for a pair of points using a small number of bit operations up to a desired additive error - Euclidean distances in the first case and inner products and squared Euclidean distances in the second. In the latter case, each point is encoded in near-linear time. The number of bits required for these encodings is quantified in terms of two natural complexity parameters of the dataset - its covering numbers and localized Gaussian complexity - and shown to be near-optimal.
| Original language | English |
|---|---|
| Publisher | arXiv |
| Pages | 1-27 |
| Number of pages | 27 |
| DOIs | |
| Publication status | Published - 17 Sept 2021 |
Keywords
- cs.IT
- cs.DS
- math.IT
- math.MG
Fingerprint
Dive into the research topics of 'Binarized Johnson-Lindenstrauss embeddings'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver