Question

I currently have a table similar to the following:

CREATE TABLE locations (
   id bigint,
   data_source text,
   timestamp timestamp,
   latitude double,
   longitude double,
   PRIMARY KEY ((id, data_source), timestamp)
) WITH CLUSTERING ORDER BY (timestamp ASC)

I'm trying to fetch the last known location of an entity, so my query looks like this:

SELECT * FROM locations WHERE id = {} AND data_source = {} ORDER BY timestamp DESC LIMIT 1

Intuitively, I would expect this query to be equivalent to one without the ORDER BY clause, but I'm not sure that's correct. Can I assume it is?

The documentation at https://docs.datastax.com/en/cql/3.1/cql/cql_reference/refClstrOrdr.html seems to say otherwise: "You can order query results to make use of the on-disk sorting of columns. You can order results in ascending or descending order. The ascending order will be more efficient than descending. If you need results in descending order, you can specify a clustering order to store columns on disk in the reverse order of the default. Descending queries will then be faster than ascending ones."

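The table is expected to grow rapidly over the next few months. Do I need to create a new table with the clustering order reversed to avoid performance problems down the road?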

Thanks.

Answer 1

First, I assume you are supplying values for both id and data_source, because you cannot use ORDER BY without restricting the partition key.

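Now, if the partition holds more than one timestamp, your query returns a different result from the same query without ORDER BY. Looking at your table definition, the default order for timestamp is ASC, so if you run the same query without ORDER BY you get the row with the lowest timestamp. With ORDER BY timestamp DESC you get the row with the highest timestamp.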
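To make the difference concrete, here is a small sketch against the locations table from the question. The id, data_source, and coordinate values are made up purely for illustration.

```cql
-- Hypothetical sample rows in a single partition (id = 1, data_source = 'gps').
INSERT INTO locations (id, data_source, timestamp, latitude, longitude)
VALUES (1, 'gps', '2024-01-01 00:00:00+0000', 10.0, 20.0);
INSERT INTO locations (id, data_source, timestamp, latitude, longitude)
VALUES (1, 'gps', '2024-01-02 00:00:00+0000', 11.0, 21.0);
INSERT INTO locations (id, data_source, timestamp, latitude, longitude)
VALUES (1, 'gps', '2024-01-03 00:00:00+0000', 12.0, 22.0);

-- No ORDER BY: rows come back in the table's clustering order (timestamp ASC),
-- so LIMIT 1 returns the oldest row, the one from 2024-01-01.
SELECT * FROM locations WHERE id = 1 AND data_source = 'gps' LIMIT 1;

-- ORDER BY timestamp DESC: the partition is read in reverse,
-- so LIMIT 1 returns the newest row, the one from 2024-01-03.
SELECT * FROM locations WHERE id = 1 AND data_source = 'gps'
ORDER BY timestamp DESC LIMIT 1;
```

So the two queries are not interchangeable on this table: dropping ORDER BY changes which row you get back, not just how it is read.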

Answer 2

You get the best performance when the query's ORDER BY matches the table's CLUSTERING ORDER BY.

So if your query pattern is to access the highest timestamp, you should definitely store the data in a table with CLUSTERING ORDER BY (timestamp DESC).

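When the query's sort order does not match the on-disk order, the data is fetched from disk less efficiently and sorted in memory, and your query will be much slower (which is why this is considered an anti-pattern).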
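As a rough sketch of that suggestion, the table could be declared with the clustering order reversed. The table name locations_by_latest and the literal values in the query are assumptions for illustration; the columns are copied from the question.

```cql
-- Same columns as the original table, but rows within each partition
-- are stored newest-first.
CREATE TABLE locations_by_latest (
   id bigint,
   data_source text,
   timestamp timestamp,
   latitude double,
   longitude double,
   PRIMARY KEY ((id, data_source), timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);

-- With DESC clustering, the first row of the partition is the latest one,
-- so no ORDER BY clause is needed to fetch the last location.
SELECT * FROM locations_by_latest
WHERE id = 1 AND data_source = 'gps'
LIMIT 1;
```

The clustering order of an existing table cannot be changed in place, so this approach means creating the new table and copying or dual-writing the data.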

Cassandra query ORDER BY on a clustering key
