A problem so hard even Google relies on Random Chance
Breaking Taps
401K subscribers
1,177,669 views

 Published On Jun 28, 2023

Head to https://brilliant.org/BreakingTaps/ to get a 30-day free trial. The first 200 people will get 20% off their annual subscription.

Watch this video ad free on Nebula: https://nebula.tv/videos/breakingtaps...

-------------------------------------------------------

Today we're looking at HyperLogLog, an algorithm that leverages random chance to count the number of distinct items in a dataset. It does this by hashing each item, tracking the longest run of zeros in the resulting binary sequence, and using that as an estimate of cardinality.
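For anyone who wants to poke at the idea, here is a minimal Python sketch of the approach described above. It is a simplified illustration, not the code from the video or from Google's implementation. The function name `hll_estimate`, the choice of SHA-1 as the hash, and the default of 2^12 registers are all assumptions made for the example; the constants follow the Flajolet et al. paper listed below.

```python
import hashlib
import math


def hll_estimate(items, p=12):
    """Estimate the number of distinct items with 2**p registers.

    Simplified HyperLogLog sketch (hypothetical helper, assumes p >= 7).
    """
    m = 1 << p
    registers = [0] * m
    for item in items:
        # Assumption: take the first 64 bits of SHA-1 as the item's hash.
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - p)                # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)   # remaining 64 - p bits
        # Rank = number of leading zeros in the remaining bits, plus one.
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    # Bias-correction constant from the paper, valid for m >= 128.
    alpha = 0.7213 / (1 + 1.079 / m)
    # Harmonic mean of 2**register across all registers.
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction: fall back to linear counting when many
    # registers are still empty.
    empty = registers.count(0)
    if raw <= 2.5 * m and empty:
        return m * math.log(m / empty)
    return raw
```

Calling `hll_estimate(i % 100_000 for i in range(300_000))` feeds in 300,000 values with only 100,000 distinct ones, and the result should land close to 100,000 while only ever storing 4,096 small integers.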

HLL is a probabilistic algorithm, meaning it produces an estimate rather than the true answer. But thanks to some clever tricks it is usually within 2% of the correct value, and it gets there both quickly and in a memory-efficient manner. A 512 KB data structure can accurately process trillions of items and terabytes of data, which is pretty impressive!
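To put a rough number on the accuracy, the Flajolet et al. paper gives the typical relative error as about 1.04 / sqrt(m), where m is the number of registers. The short sketch below just evaluates that formula; the function names are made up for illustration, and rounding up to a power of two is an assumption based on how registers are usually indexed by hash bits.

```python
import math


def hll_standard_error(num_registers):
    """Approximate relative standard error for a given register count."""
    return 1.04 / math.sqrt(num_registers)


def registers_for_error(target_error):
    """Smallest power-of-two register count that meets the target error."""
    needed = (1.04 / target_error) ** 2
    return 1 << math.ceil(math.log2(needed))
```

Under this formula, a 2% target works out to 4,096 registers, which gives a standard error of roughly 1.6%.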

When I made this video, I didn't realize that another #SoME3 was in progress. But a bunch of viewers suggested I enter the video, so I guess this will be part of the event!

----------------------

🔬Patreon if that's your jam:   / breakingtaps  

📢Twitter:   / breakingtaps  
📷Instagram:   / breakingtaps  
💻Discord:   / discord  

----------------------

Journal papers:

Flajolet, Philippe, et al. "HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm." _Discrete Mathematics and Theoretical Computer Science_, 2007. https://algo.inria.fr/flajolet/Public...

Heule, Stefan, Marc Nunkesser, and Alexander Hall. "HyperLogLog in practice: Algorithmic engineering of a state of the art cardinality estimation algorithm." _Proceedings of the 16th International Conference on Extending Database Technology_. 2013. https://static.googleusercontent.com/...

Earlier work:

Durand, Marianne, and Philippe Flajolet. "Loglog counting of large cardinalities." _Algorithms-ESA 2003: 11th Annual European Symposium, Budapest, Hungary, September 16-19, 2003. Proceedings 11_. Springer Berlin Heidelberg, 2003.

Flajolet, Philippe, and G. Nigel Martin. "Probabilistic counting algorithms for data base applications." _Journal of Computer and System Sciences_ 31.2 (1985): 182-209.

Articles:

- https://towardsdatascience.com/hyperl...
- https://engineering.fb.com/2018/12/13...
