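I am building a database to store the states of objects. A state records, for example, how an object's color changes over time.
I want to query all objects that had a particular state within a given time range, for example all objects that were green at least once between 1 pm and 2 pm on a given day.
My idea is a table like this: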
CREATE TABLE states (
type text,
value text,
name text,
timestamp timeuuid,
primary key ((type, value), timestamp, name)
) WITH CLUSTERING ORDER BY (timestamp DESC);
Given some test data:
// A, becomes green, turns red and back to green
insert into states(type, value, name, timestamp) values ('color', 'red', 'A', minTimeuuid('2016-07-07T12:00:00+0000'));
insert into states(type, value, name, timestamp) values ('color', 'green', 'A', minTimeuuid('2016-07-07T13:35:00+0000'));
insert into states(type, value, name, timestamp) values ('color', 'red', 'A', minTimeuuid('2016-07-07T13:42:00+0000'));
insert into states(type, value, name, timestamp) values ('color', 'green', 'A', minTimeuuid('2016-07-07T13:45:00+0000'));
// B stays red
insert into states(type, value, name, timestamp) values ('color', 'red', 'B', minTimeuuid('2016-07-07T01:00:00+0000'));
// C stays green
insert into states(type, value, name, timestamp) values ('color', 'green', 'C', minTimeuuid('2016-07-07T11:27:00+0000'));
// D becomes red
insert into states(type, value, name, timestamp) values ('color', 'green', 'D', minTimeuuid('2016-07-07T13:00:00+0000'));
insert into states(type, value, name, timestamp) values ('color', 'red', 'D', minTimeuuid('2016-07-07T13:27:00+0000'));
 type  | value | system.dateof(timestamp) | name
-------+-------+--------------------------+------
 color | green | 2016-07-07 13:45:00+0000 | A
 color | green | 2016-07-07 13:35:00+0000 | A
 color | green | 2016-07-07 13:00:00+0000 | D
 color | green | 2016-07-07 11:27:00+0000 | C
 color | red   | 2016-07-07 13:42:00+0000 | A
 color | red   | 2016-07-07 13:27:00+0000 | D
 color | red   | 2016-07-07 12:00:00+0000 | A
 color | red   | 2016-07-07 01:00:00+0000 | B
What I want to get is A, C and D, but not B, because B was never green within the time range.
The simple range query:
select name from states where type = 'color' and value = 'green' and timestamp >= minTimeuuid('2016-07-07T13:00:00+0000') and timestamp < minTimeuuid('2016-07-07T14:00:00+0000');
As a result I get A, A, D. I cannot use DISTINCT here, because "SELECT DISTINCT queries must only request partition key columns and/or static columns (not name)".
But I can live with the duplicates, since they are easy to handle on the application side.
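As a small illustration of the application-side handling mentioned above, duplicates can be dropped while keeping the order in which names were returned. The `rows` list is a hypothetical stand-in for the names fetched by the driver.

```python
# Names as returned by the range query above, duplicates included.
rows = ["A", "A", "D"]

# dict.fromkeys keeps the first occurrence of each name and preserves order.
unique_names = list(dict.fromkeys(rows))
print(unique_names)  # ['A', 'D']
```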
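The main problem with this query is that it cannot detect C, because C's color was already green before the time range started and did not change inside it.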
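To make the gap concrete, here is a minimal Python sketch of the semantics the query would need: a state holds from its timestamp until the object's next change, so an object matches when any such span overlaps the requested window. This is plain application-side logic over the test data above, not a Cassandra feature; the function name `objects_in_state` and the tuple layout are hypothetical.

```python
from datetime import datetime

# Test data from above as (name, value, changed_at) tuples.
EVENTS = [
    ("A", "red",   datetime(2016, 7, 7, 12, 0)),
    ("A", "green", datetime(2016, 7, 7, 13, 35)),
    ("A", "red",   datetime(2016, 7, 7, 13, 42)),
    ("A", "green", datetime(2016, 7, 7, 13, 45)),
    ("B", "red",   datetime(2016, 7, 7, 1, 0)),
    ("C", "green", datetime(2016, 7, 7, 11, 27)),
    ("D", "green", datetime(2016, 7, 7, 13, 0)),
    ("D", "red",   datetime(2016, 7, 7, 13, 27)),
]


def objects_in_state(events, value, start, end):
    """Return names that had `value` at some point in [start, end)."""
    by_name = {}
    for name, val, ts in sorted(events, key=lambda e: e[2]):
        by_name.setdefault(name, []).append((ts, val))

    matches = set()
    for name, changes in by_name.items():
        for i, (ts, val) in enumerate(changes):
            # A state lasts until the next change, or indefinitely if it is the last one.
            next_ts = changes[i + 1][0] if i + 1 < len(changes) else datetime.max
            if val == value and ts < end and next_ts > start:
                matches.add(name)
    return matches


result = objects_in_state(
    EVENTS, "green",
    datetime(2016, 7, 7, 13, 0), datetime(2016, 7, 7, 14, 0),
)
print(sorted(result))  # ['A', 'C', 'D']
```

C is found here only because the row written at 11:27, before the window, is taken into account; a pure range restriction on the timestamp column never sees that row.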
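Update
I can modify the database at will, but I cannot control when the connected devices send updates. They just send data as their state changes, and the middleware has to be stateless.
The time range is defined by the user at query time; I cannot (and do not want to) fix it to predefined ranges.
Is there a well-known pattern for this?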
Answer 0 (score: 4)
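I think what you want can be achieved with a user-defined function (UDF) and a user-defined aggregate (UDA) that accept literal values (support for this is the subject of JIRA CASSANDRA-10783). Let me explain how it can be done:
1. Create a UDF is_in_interval() to serve as the accumulatorFunction (see the code example below).
2. Create a UDA matching_objects_in_interval() (see the code example below).
Example implementation (it will not compile without the patch from CASSANDRA-10783):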
CREATE FUNCTION is_in_interval(state set<text>, name text, timestamp timeuuid,
                               min_date timeuuid, max_date timeuuid)
RETURNS NULL ON NULL INPUT
RETURNS set<text>
LANGUAGE java
AS $$
    // java.util.UUID.compareTo() does not order version 1 UUIDs chronologically,
    // so compare the embedded timestamps instead.
    long ts = timestamp.timestamp();
    // The object has its timestamp inside the provided date range
    if (ts >= min_date.timestamp() && ts <= max_date.timestamp()) {
        // Adding the same name several times is harmless: a Set eliminates duplicates
        state.add(name);
    }
    return state;
$$;
CREATE AGGREGATE IF NOT EXISTS matching_objects_in_interval(text, timeuuid, timeuuid, timeuuid)
SFUNC is_in_interval
STYPE set<text>
// {} is the Cassandra LITERAL SYNTAX for empty set
INITCOND {};
Usage:
SELECT matching_objects_in_interval(name, timestamp, minTimeuuid('2016-07-07T13:00:00+0000'), minTimeuuid('2016-07-07T14:00:00+0000'))
FROM states
WHERE type = 'color'
AND value = 'green';
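For readers unfamiliar with how a UDA runs, the following Python sketch mimics the fold: the state starts at INITCOND (an empty set) and the state function is called once per row selected by the WHERE clause. It is only an emulation under simplifying assumptions: plain datetime values stand in for timeuuid, and the row list is copied by hand from the green partition of the test data.

```python
from datetime import datetime
from functools import reduce

# Rows of the ('color', 'green') partition from the test data, as (name, timestamp).
GREEN_ROWS = [
    ("A", datetime(2016, 7, 7, 13, 45)),
    ("A", datetime(2016, 7, 7, 13, 35)),
    ("D", datetime(2016, 7, 7, 13, 0)),
    ("C", datetime(2016, 7, 7, 11, 27)),
]


def is_in_interval(state, name, timestamp, min_date, max_date):
    """Python stand-in for the state function (SFUNC)."""
    if min_date <= timestamp <= max_date:
        state.add(name)
    return state


def matching_objects_in_interval(rows, min_date, max_date):
    """Python stand-in for the aggregate: fold the SFUNC over the rows."""
    return reduce(
        lambda state, row: is_in_interval(state, row[0], row[1], min_date, max_date),
        rows,
        set(),  # corresponds to INITCOND {}
    )


names = matching_objects_in_interval(
    GREEN_ROWS, datetime(2016, 7, 7, 13, 0), datetime(2016, 7, 7, 14, 0)
)
print(sorted(names))  # ['A', 'D']
```

The set removes the duplicate A, so deduplication moves from the application into the aggregate. Rows whose timestamp lies before the window, such as C's, are still skipped by the interval check.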