Optimized joins & filtering with Bloom filter predicate in Kudu
Cloudera
JANUARY 15, 2021
A Bloom filter is a space-efficient probabilistic data structure used to test set membership. It can return false-positive matches but never false negatives.

Step 3 is the heaviest step, since it involves reading the entire big table and can incur heavy network I/O if the worker and the nodes hosting the big table are not on the same server.
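The following is a minimal sketch in Python of how a Bloom filter can cut the cost of that big-table read in a join. It assumes in-memory lists of dict rows. `BloomFilter` and `bloom_filtered_join` are hypothetical names for illustration, not Kudu or Impala APIs. The idea is to build a filter over the small table's join keys and then apply it as a predicate to the big table's rows. Rows that definitely cannot match are discarded before the join, and the exact hash join afterwards removes any false positives.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: a bit array of m bits probed by k hash functions."""

    def __init__(self, expected_items: int, fp_rate: float) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n = max(1, expected_items)
        self.num_bits = max(1, math.ceil(-n * math.log(fp_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(repr(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def bloom_filtered_join(small_rows, big_rows, key):
    """Hypothetical helper: join two row lists, pre-filtering big_rows with a Bloom filter."""
    bf = BloomFilter(expected_items=len(small_rows), fp_rate=0.01)
    small_by_key = {}
    for row in small_rows:
        bf.add(row[key])
        small_by_key.setdefault(row[key], []).append(row)
    # Assumption: in a real engine this predicate would be applied while scanning
    # the big table, so rejected rows are never shipped over the network.
    candidates = [r for r in big_rows if bf.might_contain(r[key])]
    # Exact hash join on the surviving rows removes the remaining false positives.
    joined = [{**s, **b} for b in candidates for s in small_by_key.get(b[key], [])]
    return joined, len(candidates)


small = [{"id": i, "name": f"dim-{i}"} for i in range(10)]
big = [{"id": i % 1000, "val": i} for i in range(10_000)]
joined, scanned = bloom_filtered_join(small, big, "id")
print(len(joined), scanned)  # 100 joined rows; scanned is 100 plus any false positives
```

Because a Bloom filter never produces false negatives, the pre-filter cannot drop a row that belongs in the join result. It only reduces how many big-table rows reach the join, at the cost of a small, tunable false-positive rate.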