Question

I have a table like this.

> CREATE TABLE docyard.documents (
>     document_id text,
>     namespace text,
>     version_id text,
>     created_at timestamp,
>     path text,
>     attributes map<text, text>,
>     PRIMARY KEY (document_id, namespace, version_id, created_at)
> ) WITH CLUSTERING ORDER BY (namespace ASC, version_id ASC, created_at ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';

I would like to be able to run range queries under the following conditions:

select * from documents where namespace = 'something' and created_at> 'some-value' order by created_at allow filtering;

select * from documents where namespace = 'something' and path = 'something' and created_at> 'some-value' order by created_at allow filtering;

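I cannot get these queries to work in any way. I have tried secondary indexes. Can anyone help?

Whenever I try to get it working, I keep getting one error or another.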

Answer 1

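First of all, do not use secondary indexes or ALLOW FILTERING. With time series data, the cost of both grows significantly over time as the data accumulates.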

To satisfy your first query, you will need to restructure your PRIMARY KEY and CLUSTERING ORDER like this:

PRIMARY KEY (namespace, created_at, document_id) ) 
WITH CLUSTERING ORDER BY (created_at DESC, document_id ASC);

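This will allow the following:

Partitioning by namespace.
Sorting by created_at in DESCending order (latest rows are read first).
Uniqueness through document_id.
You will not need ALLOW FILTERING or ORDER BY in your query, because the necessary keys are provided and the results are already sorted by your CLUSTERING ORDER.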
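
As a rough sketch of how the restructured key could be used, here is a full table definition plus the first query rewritten against it. The table name `documents_by_namespace` and the timestamp literal are assumptions made for illustration; only the PRIMARY KEY and CLUSTERING ORDER come from the answer above.

```cql
-- Hypothetical table name; the key layout is the one suggested above.
CREATE TABLE docyard.documents_by_namespace (
    namespace text,
    created_at timestamp,
    document_id text,
    version_id text,
    path text,
    attributes map<text, text>,
    PRIMARY KEY (namespace, created_at, document_id)
) WITH CLUSTERING ORDER BY (created_at DESC, document_id ASC);

-- The partition key (namespace) is restricted by equality and the first
-- clustering column (created_at) by a range, so neither ALLOW FILTERING
-- nor ORDER BY is needed. Rows come back newest first.
SELECT * FROM docyard.documents_by_namespace
WHERE namespace = 'something'
  AND created_at > '2016-01-01 00:00:00+0000';
```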

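For the second query, I would create an additional query table. This is because in Cassandra you need to model your tables to fit your queries. You may end up with several query tables for the same data, and that is fine.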

CREATE TABLE docyardbypath.documents (
  document_id text,
  namespace text,
  version_id text,
  created_at timestamp,
  path text,
  attributes map<text, text>,
  PRIMARY KEY ((namespace, path), created_at, document_id) ) 
  WITH CLUSTERING ORDER BY (created_at DESC, document_id ASC);

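This will:

Partition by namespace and path.
Allow rows within each unique combination of namespace and path to be sorted according to your clustering order.
Again, you will not need ALLOW FILTERING or ORDER BY in your query.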
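
For illustration, the second query from the question might look like this against the table defined above. The timestamp literal is a placeholder, not a value from the question.

```cql
-- Both columns of the composite partition key (namespace, path) are
-- restricted by equality; created_at is the first clustering column,
-- so a range restriction on it is allowed without ALLOW FILTERING.
SELECT * FROM docyardbypath.documents
WHERE namespace = 'something'
  AND path = 'something'
  AND created_at > '2016-01-01 00:00:00+0000';
```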

Answer 2

I think you need to learn how data modeling works in Cassandra.

The first query could look like this:

select * from documents where namespace = 'something' and created_at > 'some_formatted_date'  and document_id='someid' and version_id='some_version' order by namespace, version_id, created_at allow filtering;

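When you query a Cassandra table, you must:

Restrict the partition key (document_id in this table) in the select.
Make ORDER BY follow the clustering order.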

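Fixing the second query is fairly straightforward. What are you trying to do? Cassandra is optimized for write performance. You may want to write this data to multiple tables, one for each set of queries you plan to run.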
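
One possible way to keep several query tables in step is to write each document to all of them in a single logged batch. This is only a sketch: `docyard.documents_by_namespace` is a hypothetical table keyed by (namespace, created_at, document_id), `docyardbypath.documents` is the by-path table from Answer 1, and every value is made up.

```cql
-- A logged batch ensures that both inserts are eventually applied together.
-- It spans two partitions, so it costs more than two independent writes.
BEGIN BATCH
    INSERT INTO docyard.documents_by_namespace
        (namespace, created_at, document_id, version_id, path, attributes)
    VALUES ('something', '2016-01-01 00:00:00+0000', 'doc-1', 'v1',
            '/some/path', {'owner': 'alice'});

    INSERT INTO docyardbypath.documents
        (namespace, path, created_at, document_id, version_id, attributes)
    VALUES ('something', '/some/path', '2016-01-01 00:00:00+0000', 'doc-1',
            'v1', {'owner': 'alice'});
APPLY BATCH;
```

Whether a batch or two separate inserts fits better depends on how strictly the tables must agree and on the write volume.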

Cassandra time series modeling
