Hello everyone! I created a keyspace and a table in Cassandra:
CREATE KEYSPACE monitoring
  WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': '1' };

CREATE TABLE monitoring.data (
  number text,
  day timestamp,
  last_day timestamp static,
  ids text static,
  PRIMARY KEY (number, day)
) WITH CLUSTERING ORDER BY (day DESC);
Then I inserted some data:
INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('12345678901', '2017-05-26', '2017-05-26', '["1","2","3"]');
INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('12345678901', '2017-10-26', '2017-10-26', '["1","2","3"]');
INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('12345678901', '2017-05-01', '2017-05-01', '["1","2","3"]');
INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('123456AA901', '2017-05-01', '2017-05-01', '["A","2","3"]');
INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('123456BB901', '2017-05-01', '2017-05-01', '["B","2","3"]');
INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('123456CC901', '2017-05-01', '2017-05-01', '["C","2","3"]');
INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('123456DD901', '2017-05-01', '2017-05-01', '["D","2","3"]');
INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('12345678901', '2017-05-23', '2017-05-23', '["1","2","3"]');
INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('12345678901', '2018-05-26', '2018-05-26', '["1","2","3"]');
INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('23456789012', '2017-04-01', '2017-04-01', '["6","2","11"]');
INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('34567890123', '2017-03-28', '2017-03-28', '["1","5","3"]');
INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('45678901234', '2017-04-03', '2017-04-03', '["12","2","3"]');
INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('56789012345', '2018-01-26', '2018-01-26', '["3","2","1"]');
Next I ran this query:
select distinct number,last_day,ids from monitoring.data WHERE number in ('12345678901','56789012345','45678901234');
Why does Cassandra return number 45678901234 between 12345678901 and 56789012345?
 number      | last_day                 | ids
-------------+--------------------------+----------------
 12345678901 | 2018-05-25 21:00:00+0000 | ["1","2","3"]
 45678901234 | 2017-04-02 21:00:00+0000 | ["12","2","3"]
 56789012345 | 2018-01-25 21:00:00+0000 | ["3","2","1"]
How do I get the right answer? Does the replication factor matter in this case? Later I will use LIMIT 10 ...
Answer 0 (score: 2)
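Simply put, number is your partition key, and you can only enforce sort order at the clustering key level. When you filter on the partition key with a non-equality clause (such as IN), you cannot rely on the order of the results. If you remove the IN clause, the rows come back ordered by the hashed partition key. If I adjust your query to use the token() function on number, the order of the results makes more sense: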
aploetz@cqlsh:stackoverflow> select distinct number,token(number),last_day,ids
FROM data;
 number      | system.token(number) | last_day                        | ids
-------------+----------------------+---------------------------------+----------------
 123456BB901 | -7512323826965212800 | 2017-05-01 05:00:00.000000+0000 | ["B","2","3"]
 123456DD901 | -5242683095224762575 | 2017-05-01 05:00:00.000000+0000 | ["D","2","3"]
 23456789012 | -2843835925329100734 | 2017-04-01 05:00:00.000000+0000 | ["6","2","11"]
 123456CC901 |   970122905143661162 | 2017-05-01 05:00:00.000000+0000 | ["C","2","3"]
 45678901234 |  2207499658550692669 | 2017-04-03 05:00:00.000000+0000 | ["12","2","3"]
 12345678901 |  3063849707784841171 | 2018-05-26 05:00:00.000000+0000 | ["1","2","3"]
 123456AA901 |  4307148681570630627 | 2017-05-01 05:00:00.000000+0000 | ["A","2","3"]
 56789012345 |  5304329977670805052 | 2018-01-26 06:00:00.000000+0000 | ["3","2","1"]
 34567890123 |  6079361129233417517 | 2017-03-28 05:00:00.000000+0000 | ["1","5","3"]

(9 rows)
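As a quick sanity check on the output above, the following Python sketch copies the (number, token) pairs from that listing and confirms that the rows are in ascending token order while the number values themselves are not sorted. It only inspects the values shown here; it does not talk to Cassandra or compute tokens.

```python
# (number, token) pairs copied from the cqlsh output above, in the order returned.
rows = [
    ("123456BB901", -7512323826965212800),
    ("123456DD901", -5242683095224762575),
    ("23456789012", -2843835925329100734),
    ("123456CC901", 970122905143661162),
    ("45678901234", 2207499658550692669),
    ("12345678901", 3063849707784841171),
    ("123456AA901", 4307148681570630627),
    ("56789012345", 5304329977670805052),
    ("34567890123", 6079361129233417517),
]

numbers = [number for number, _ in rows]
tokens = [token for _, token in rows]

# The scan order follows the token (hashed partition key), not the key text.
tokens_ascending = tokens == sorted(tokens)
numbers_ascending = numbers == sorted(numbers)

print("tokens ascending: ", tokens_ascending)
print("numbers ascending:", numbers_ascending)
```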
The bottom line is that you cannot enforce a sort order without an equality condition on the partition key.
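Unfortunately, what you are trying to do is not one of Cassandra's strengths. To get the answer you want, you would have to find something your expected results have in common and then design a query table to support it. However, if you are only talking about 10 rows, it may be easier to sort the results on the application side.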
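For the application-side option, here is a minimal Python sketch. It assumes the rows have already been fetched and are available as plain dictionaries (a stand-in for whatever your driver returns), and the helper name order_by_requested_keys is hypothetical. It shows two possible orderings, since the question does not say which one is wanted: the order of the keys in the IN list, and newest last_day first.

```python
def order_by_requested_keys(rows, requested, key="number"):
    """Rearrange rows to follow the order of `requested` (hypothetical helper).

    Assumes at most one row per key, which holds for SELECT DISTINCT
    on the partition key.
    """
    by_key = {row[key]: row for row in rows}
    return [by_key[k] for k in requested if k in by_key]


# Stand-in for the three rows returned by the query in the question,
# in the order Cassandra returned them. Dates are the inserted values.
rows = [
    {"number": "12345678901", "last_day": "2018-05-26", "ids": '["1","2","3"]'},
    {"number": "45678901234", "last_day": "2017-04-03", "ids": '["12","2","3"]'},
    {"number": "56789012345", "last_day": "2018-01-26", "ids": '["3","2","1"]'},
]

# Option 1: same order as the keys listed in the IN clause.
requested = ["12345678901", "56789012345", "45678901234"]
ordered = order_by_requested_keys(rows, requested)

# Option 2: newest last_day first (ISO date strings sort chronologically).
newest_first = sorted(rows, key=lambda row: row["last_day"], reverse=True)

for row in ordered:
    print(row["number"], row["last_day"], row["ids"])
```

With only around 10 rows, either approach costs next to nothing on the client, and it avoids having to design a separate query table just for ordering.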