Bloom Filter
Bloom Filter are memory efficient data strucutre that return "Firm NO" if data doesn't exist or a "may-be yes" if it may exist.
To create a bloom filter,
- create a bit array
- loop over the data
- hash the data to get an index
- set the bit to 1 at that index
Now if we want to check if a certain data exists or not hash the data to get a index and check the bit at that index. This may return a false positive, but never a false negative.
See also:
- How bloom filters made SQLite 10x faster: news.ycombinator, avi.im