Question

Is the IN operator on a clustering key more efficient than multiple equality queries?

select * from table where primaryKey1 = 1 and primaryKey2 = 2 and clusterKey in (1,2)

versus two queries:

select * from table where primaryKey1 = 1 and primaryKey2 = 2 and clusterKey = 1 

select * from table where primaryKey1 = 1 and primaryKey2 = 2 and clusterKey = 2

Answer 1

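It depends on how many values you put in the IN clause. In terms of efficiency, both approaches take roughly the same time to return the result set. But if the number of values grows, it is good practice to use multiple queries rather than the IN operator.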

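This follows from Cassandra's distributed node architecture: when you use a single query with the IN keyword, the request and response are handled by just one node. In addition, if anything goes wrong, Cassandra re-runs the entire query from scratch and keeps none of the results it had already retrieved.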

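You can use async queries in Java or Python to issue the individual queries together. This lets the requests and responses go through multiple nodes, and if a single query fails, only that particular query needs to be retried.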
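To make the "retry only the query that failed" point concrete, here is a minimal sketch using only the JDK's CompletableFuture. The names PerQueryRetry, withRetry and maxAttempts are made up for illustration, and the Supplier stands in for whatever async call your driver exposes; this is not the driver's own retry mechanism.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper, not part of any Cassandra driver.
final class PerQueryRetry {

    // Runs one async query, re-issuing it on failure, at most maxAttempts times in total.
    // Only this query is repeated; any other in-flight queries are left alone.
    static <T> CompletableFuture<T> withRetry(Supplier<CompletableFuture<T>> query, int maxAttempts) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(query, maxAttempts, result);
        return result;
    }

    private static <T> void attempt(Supplier<CompletableFuture<T>> query,
                                    int attemptsLeft,
                                    CompletableFuture<T> result) {
        query.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);                  // success: hand the value back
            } else if (attemptsLeft > 1) {
                attempt(query, attemptsLeft - 1, result); // failure: try this query again
            } else {
                result.completeExceptionally(error);     // out of attempts: report the last error
            }
        });
    }

    public static void main(String[] args) {
        // Stand-in for a real query: fails on the first call, succeeds afterwards.
        AtomicInteger calls = new AtomicInteger();
        Supplier<CompletableFuture<String>> flaky = () -> {
            CompletableFuture<String> f = new CompletableFuture<>();
            if (calls.incrementAndGet() == 1) {
                f.completeExceptionally(new RuntimeException("simulated timeout"));
            } else {
                f.complete("row");
            }
            return f;
        };
        System.out.println(withRetry(flaky, 3).join() + " after " + calls.get() + " calls");
    }
}
```

With one such future per key, a failure costs one extra request for that key instead of a re-run of the whole IN query.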

For example:

CREATE TABLE IF NOT EXISTS users (id uuid PRIMARY KEY, name text);

SELECT * FROM users WHERE id IN (
    e6af74a8-4711-4609-a94f-2cbfab9695e5,
    281336f4-2a52-4535-847c-11a4d3682ec1,
    c32b8d37-89bd-4dfe-a7d5-5f0258692d05
);

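This is not necessarily optimal: the query is sent to a coordinator node, which then has to query the replicas for each partition key. Given a smart token-aware driver, it is more efficient to send one query per partition key (SELECT * FROM users WHERE id = ?), since each one goes straight to the right replica. All that is left is to collate the results on the client side.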

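// ResultSets.queryAllAsList is not a driver method; it appears to be the
// helper defined in the DataStax async-queries post linked below.
Future<List<ResultSet>> future = ResultSets.queryAllAsList(session,
    "SELECT * FROM users WHERE id = ?",
    UUID.fromString("e6af74a8-4711-4609-a94f-2cbfab9695e5"),
    UUID.fromString("281336f4-2a52-4535-847c-11a4d3682ec1"));
for (ResultSet rs : future.get()) {
    // process the result set
}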
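In case it helps to see what a helper of that kind does, here is a rough JDK-only sketch of the fan-out-and-collate step. FanOut, queryAll and queryOne are names invented for this illustration; with a real driver, queryOne would wrap its async execute call, adapted to a CompletableFuture. Unlike some helpers, this version fails as a whole if any single query fails.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical helper, not part of any Cassandra driver.
final class FanOut {

    // Issues one async query per key and collects the results in key order.
    static <K, R> CompletableFuture<List<R>> queryAll(List<K> keys,
                                                      Function<K, CompletableFuture<R>> queryOne) {
        // Start every query before waiting on any of them.
        List<CompletableFuture<R>> pending = new ArrayList<>();
        for (K key : keys) {
            pending.add(queryOne.apply(key));
        }
        // allOf completes once every query has completed (or fails if any one failed).
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> {
                    List<R> results = new ArrayList<>();
                    for (CompletableFuture<R> f : pending) {
                        results.add(f.join()); // already complete, so this does not block
                    }
                    return results;
                });
    }

    public static void main(String[] args) {
        // Stand-in query: pretends each id resolves to a single row.
        List<String> ids = Arrays.asList("id-a", "id-b", "id-c");
        List<String> rows = FanOut.<String, String>queryAll(
                ids, id -> CompletableFuture.supplyAsync(() -> "row for " + id)).join();
        for (String row : rows) {
            System.out.println(row);
        }
    }
}
```

Results come back in the same order as the keys, regardless of which query finishes first, which keeps the client-side collation simple.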

For more explanation, see the following links: https://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/

https://www.datastax.com/dev/blog/java-driver-async-queries
