felix.util
Class BloomFilter<E>

java.lang.Object
  extended by felix.util.BloomFilter<E>
Type Parameters:
E - Object type that is to be inserted into the Bloom filter, e.g. String or Integer.
All Implemented Interfaces:
java.io.Serializable

public class BloomFilter<E>
extends java.lang.Object
implements java.io.Serializable

Implementation of a Bloom-filter, as described here: http://en.wikipedia.org/wiki/Bloom_filter Inspired by the SimpleBloomFilter-class written by Ian Clarke. This implementation provides a more evenly distributed Hash-function by using a proper digest instead of the Java RNG. Many of the changes were proposed in comments in his blog: http://blog.locut.us/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/

Author:
Magnus Skjegstad
See Also:
Serialized Form

Constructor Summary
BloomFilter(double falsePositiveProbability, int expectedNumberOfElements)
          Constructs an empty Bloom filter with a given false positive probability.
BloomFilter(double c, int n, int k)
          Constructs an empty Bloom filter.
BloomFilter(int bitSetSize, int expectedNumberOElements)
          Constructs an empty Bloom filter.
BloomFilter(int bitSetSize, int expectedNumberOfFilterElements, int actualNumberOfFilterElements, java.util.BitSet filterData)
          Construct a new Bloom filter based on existing Bloom filter data.
 
Method Summary
 void add(E element)
          Adds an object to the Bloom filter.
 void addAll(java.util.Collection<? extends E> c)
          Adds all elements from a Collection to the Bloom filter.
 void clear()
          Sets all bits to false in the Bloom filter.
 boolean contains(E element)
          Returns true if the element could have been inserted into the Bloom filter.
 boolean containsAll(java.util.Collection<? extends E> c)
          Returns true if all the elements of a Collection could have been inserted into the Bloom filter.
 int count()
          Returns the number of elements added to the Bloom filter after it was constructed or after clear() was called.
static long createHash(byte[] data)
          Generates a digest based on the contents of an array of bytes.
static long createHash(java.lang.String val)
          Generates a digest based on the contents of a String.
static long createHash(java.lang.String val, java.nio.charset.Charset charset)
          Generates a digest based on the contents of a String.
 boolean equals(java.lang.Object obj)
          Compares the contents of two instances to see if they are equal.
 double expectedFalsePositiveProbability()
          Calculates the expected probability of false positives based on the number of expected filter elements and the size of the Bloom filter.
 boolean getBit(int bit)
          Read a single bit from the Bloom filter.
 java.util.BitSet getBitSet()
          Return the bit set used to store the Bloom filter.
 double getBitsPerElement()
          Get actual number of bits per element based on the number of elements that have currently been inserted and the length of the Bloom filter.
 double getExpectedBitsPerElement()
          Get expected number of bits per element when the Bloom filter is full.
 int getExpectedNumberOfElements()
          Returns the expected number of elements to be inserted into the filter.
 double getFalsePositiveProbability()
          Get the current probability of a false positive.
 double getFalsePositiveProbability(double numberOfElements)
          Calculate the probability of a false positive given the specified number of inserted elements.
 int getK()
          Returns the value chosen for K.

K is the optimal number of hash functions based on the size of the Bloom filter and the expected number of inserted elements.
 int hashCode()
          Calculates a hash code for this class.
 void setBit(int bit, boolean value)
          Set a single bit in the Bloom filter.
 int size()
          Returns the number of bits in the Bloom filter.
 
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BloomFilter

public BloomFilter(double c,
                   int n,
                   int k)
Constructs an empty Bloom filter. The total length of the Bloom filter will be c*n.

Parameters:
c - is the number of bits used per element.
n - is the expected number of elements the filter will contain.
k - is the number of hash functions used.

BloomFilter

public BloomFilter(int bitSetSize,
                   int expectedNumberOElements)
Constructs an empty Bloom filter. The optimal number of hash functions (k) is estimated from the total size of the Bloom and the number of expected elements.

Parameters:
bitSetSize - defines how many bits should be used in total for the filter.
expectedNumberOElements - defines the maximum number of elements the filter is expected to contain.

BloomFilter

public BloomFilter(double falsePositiveProbability,
                   int expectedNumberOfElements)
Constructs an empty Bloom filter with a given false positive probability. The number of bits per element and the number of hash functions is estimated to match the false positive probability.

Parameters:
falsePositiveProbability - is the desired false positive probability.
expectedNumberOfElements - is the expected number of elements in the Bloom filter.

BloomFilter

public BloomFilter(int bitSetSize,
                   int expectedNumberOfFilterElements,
                   int actualNumberOfFilterElements,
                   java.util.BitSet filterData)
Construct a new Bloom filter based on existing Bloom filter data.

Parameters:
bitSetSize - defines how many bits should be used for the filter.
expectedNumberOfFilterElements - defines the maximum number of elements the filter is expected to contain.
actualNumberOfFilterElements - specifies how many elements have been inserted into the filterData BitSet.
filterData - a BitSet representing an existing Bloom filter.
Method Detail

createHash

public static long createHash(java.lang.String val,
                              java.nio.charset.Charset charset)
Generates a digest based on the contents of a String.

Parameters:
val - specifies the input data.
charset - specifies the encoding of the input data.
Returns:
digest as long.

createHash

public static long createHash(java.lang.String val)
Generates a digest based on the contents of a String.

Parameters:
val - specifies the input data. The encoding is expected to be UTF-8.
Returns:
digest as long.

createHash

public static long createHash(byte[] data)
Generates a digest based on the contents of an array of bytes.

Parameters:
data - specifies input data.
Returns:
digest as long.

equals

public boolean equals(java.lang.Object obj)
Compares the contents of two instances to see if they are equal.

Overrides:
equals in class java.lang.Object
Parameters:
obj - is the object to compare to.
Returns:
True if the contents of the objects are equal.

hashCode

public int hashCode()
Calculates a hash code for this class.

Overrides:
hashCode in class java.lang.Object
Returns:
hash code representing the contents of an instance of this class.

expectedFalsePositiveProbability

public double expectedFalsePositiveProbability()
Calculates the expected probability of false positives based on the number of expected filter elements and the size of the Bloom filter.

The value returned by this method is the expected rate of false positives, assuming the number of inserted elements equals the number of expected elements. If the number of elements in the Bloom filter is less than the expected value, the true probability of false positives will be lower.

Returns:
expected probability of false positives.

getFalsePositiveProbability

public double getFalsePositiveProbability(double numberOfElements)
Calculate the probability of a false positive given the specified number of inserted elements.

Parameters:
numberOfElements - number of inserted elements.
Returns:
probability of a false positive.

getFalsePositiveProbability

public double getFalsePositiveProbability()
Get the current probability of a false positive. The probability is calculated from the size of the Bloom filter and the current number of elements added to it.

Returns:
probability of false positives.

getK

public int getK()
Returns the value chosen for K.

K is the optimal number of hash functions based on the size of the Bloom filter and the expected number of inserted elements.

Returns:
optimal k.

clear

public void clear()
Sets all bits to false in the Bloom filter.


add

public void add(E element)
Adds an object to the Bloom filter. The output from the object's toString() method is used as input to the hash functions.

Parameters:
element - is an element to register in the Bloom filter.

addAll

public void addAll(java.util.Collection<? extends E> c)
Adds all elements from a Collection to the Bloom filter.

Parameters:
c - Collection of elements.

contains

public boolean contains(E element)
Returns true if the element could have been inserted into the Bloom filter. Use getFalsePositiveProbability() to calculate the probability of this being correct.

Parameters:
element - element to check.
Returns:
true if the element could have been inserted into the Bloom filter.

containsAll

public boolean containsAll(java.util.Collection<? extends E> c)
Returns true if all the elements of a Collection could have been inserted into the Bloom filter. Use getFalsePositiveProbability() to calculate the probability of this being correct.

Parameters:
c - elements to check.
Returns:
true if all the elements in c could have been inserted into the Bloom filter.

getBit

public boolean getBit(int bit)
Read a single bit from the Bloom filter.

Parameters:
bit - the bit to read.
Returns:
true if the bit is set, false if it is not.

setBit

public void setBit(int bit,
                   boolean value)
Set a single bit in the Bloom filter.

Parameters:
bit - is the bit to set.
value - If true, the bit is set. If false, the bit is cleared.

getBitSet

public java.util.BitSet getBitSet()
Return the bit set used to store the Bloom filter.

Returns:
bit set representing the Bloom filter.

size

public int size()
Returns the number of bits in the Bloom filter. Use count() to retrieve the number of inserted elements.

Returns:
the size of the bitset used by the Bloom filter.

count

public int count()
Returns the number of elements added to the Bloom filter after it was constructed or after clear() was called.

Returns:
number of elements added to the Bloom filter.

getExpectedNumberOfElements

public int getExpectedNumberOfElements()
Returns the expected number of elements to be inserted into the filter. This value is the same value as the one passed to the constructor.

Returns:
expected number of elements.

getExpectedBitsPerElement

public double getExpectedBitsPerElement()
Get expected number of bits per element when the Bloom filter is full. This value is set by the constructor when the Bloom filter is created. See also getBitsPerElement().

Returns:
expected number of bits per element.

getBitsPerElement

public double getBitsPerElement()
Get actual number of bits per element based on the number of elements that have currently been inserted and the length of the Bloom filter. See also getExpectedBitsPerElement().

Returns:
number of bits per element.