Question

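I have a Cassandra server that is queried by another service, and I need to reduce the number of queries.

My first idea was to build a bloom filter of the entire database every few minutes and send it to the service. But since the database holds a few hundred GB (and is expected to grow to a few TB), overloading the database every few minutes does not seem like a good idea.

After looking for a better solution for a while, I remembered that Cassandra maintains its own bloom filters.

Is it possible to copy the *-Filter.db files and use them in my code instead of building my own bloom filter?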

Answer 1

I created a table named test:

CREATE TABLE test (
   a int PRIMARY KEY,
   b int
);

Inserted one row:

INSERT INTO test(a,b) VALUES(1, 10);

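After the data has been flushed to disk, we can use the *-Filter.db file. In my case it was la-2-big-Filter.db. Here is sample code that checks whether a partition key exists: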

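import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import org.apache.cassandra.db.DecoratedKey;
import org.apache.cassandra.db.marshal.Int32Type;
import org.apache.cassandra.dht.Murmur3Partitioner;
import org.apache.cassandra.utils.FilterFactory;
import org.apache.cassandra.utils.IFilter;

Murmur3Partitioner partitioner = new Murmur3Partitioner();

// Deserialize the SSTable's bloom filter; try-with-resources closes both the stream and the filter.
try (DataInputStream in = new DataInputStream(new FileInputStream(new File("la-2-big-Filter.db"))); IFilter filter = FilterFactory.deserialize(in, true)) {
    for (int i = 1; i <= 10; i++) {
        DecoratedKey decoratedKey = partitioner.decorateKey(Int32Type.instance.decompose(i));
        // true means "possibly present" (false positives can occur); false means definitely absent.
        if (filter.isPresent(decoratedKey)) {
            System.out.println(i + " is present ");
        } else {
            System.out.println(i + " is not present ");
        }
    }
}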

Output:

1 is present 
2 is not present 
3 is not present 
4 is not present 
5 is not present 
6 is not present 
7 is not present 
8 is not present 
9 is not present 
10 is not present
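
One thing to keep in mind when applying this to a whole table: Cassandra writes one Filter.db per SSTable, so a key may exist if any of the filters reports it, and data that has not been flushed yet is not covered by any file. The following is only a minimal sketch of that "any filter says maybe" logic. `BloomSketch` and `mightContainAny` are hypothetical names, and the tiny filter here is a stand-in with its own simple hashing; it is not Cassandra's implementation and cannot read Filter.db files.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.List;

/** Illustrative Bloom filter (hypothetical; NOT Cassandra's implementation or file format). */
class BloomSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Seeded FNV-1a style hash; chosen only to keep the sketch dependency-free.
    private static int hash(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }

    // Double hashing: the i-th bit position is derived from two base hashes.
    private int index(byte[] key, int i) {
        int h1 = hash(key, 0);
        int h2 = hash(key, 0x9747b28c);
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(k, i));
        }
    }

    /** false = definitely never added; true = possibly added (false positives can occur). */
    boolean mightContain(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(k, i))) {
                return false;
            }
        }
        return true;
    }

    /** A key may exist in the table if at least one per-SSTable filter reports it. */
    static boolean mightContainAny(List<BloomSketch> filters, String key) {
        for (BloomSketch f : filters) {
            if (f.mightContain(key)) {
                return true;
            }
        }
        return false;
    }
}
```

With the real files, the same loop would run over one deserialized `IFilter` per *-Filter.db file instead of `BloomSketch` instances. Only a "not present" from every filter lets the service skip the query safely, and only for data that had already been flushed when the files were copied.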

Extracting Cassandra's bloom filter
