Why is "ORDER BY DESC" not reflected in Cassandra's result?

Time: 2018-02-05 12:05:45

Tags: cassandra

Hello everyone! I created a keyspace and a table in Cassandra:


    CREATE KEYSPACE monitoring WITH replication = {
         'class': 'SimpleStrategy',
         'replication_factor': '1'
    };


    CREATE TABLE monitoring.data (
        number text,
        day timestamp,
        last_day timestamp static,
        ids text static,
        PRIMARY KEY (number, day)
    ) WITH CLUSTERING ORDER BY (day DESC);

Then I inserted data:


    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('12345678901', '2017-05-26', '2017-05-26', '["1","2","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('12345678901', '2017-10-26', '2017-10-26', '["1","2","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('12345678901', '2017-05-01', '2017-05-01', '["1","2","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('123456AA901', '2017-05-01', '2017-05-01', '["A","2","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('123456BB901', '2017-05-01', '2017-05-01', '["B","2","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('123456CC901', '2017-05-01', '2017-05-01', '["C","2","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('123456DD901', '2017-05-01', '2017-05-01', '["D","2","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('12345678901', '2017-05-23', '2017-05-23', '["1","2","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('12345678901', '2018-05-26', '2018-05-26', '["1","2","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('23456789012', '2017-04-01', '2017-04-01', '["6","2","11"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('34567890123', '2017-03-28', '2017-03-28', '["1","5","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('45678901234', '2017-04-03', '2017-04-03', '["12","2","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('56789012345', '2018-01-26', '2018-01-26', '["3","2","1"]');

Next I ran this query:


    select distinct number,last_day,ids from monitoring.data WHERE number in ('12345678901','56789012345','45678901234');

Why does Cassandra's result put number 45678901234 between 12345678901 and 56789012345?


     number      | last_day                 | ids
    -------------+--------------------------+----------------
     12345678901 | 2018-05-25 21:00:00+0000 |  ["1","2","3"]
     45678901234 | 2017-04-02 21:00:00+0000 | ["12","2","3"]
     56789012345 | 2018-01-25 21:00:00+0000 |  ["3","2","1"]

How can I get the correct result? Does the replication factor matter in this case? Later I will be using LIMIT 10 ...

1 Answer:

Answer 0: (score: 2)

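Simply put, number is your partition key, and you can only enforce sort order at the clustering key level, that is, within a single partition. When you filter on the partition key with a non-equality clause (such as IN), you cannot rely on the order of the results. If you were to remove the IN clause, the rows would come back ordered by the hashed partition key. If I adjust your query to apply the token() function to number, the order of the results makes more sense: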

    aploetz@cqlsh:stackoverflow> select distinct number,token(number),last_day,ids 
        FROM data;

     number      | system.token(number) | last_day                        | ids
    -------------+----------------------+---------------------------------+----------------
     123456BB901 | -7512323826965212800 | 2017-05-01 05:00:00.000000+0000 |  ["B","2","3"]
     123456DD901 | -5242683095224762575 | 2017-05-01 05:00:00.000000+0000 |  ["D","2","3"]
     23456789012 | -2843835925329100734 | 2017-04-01 05:00:00.000000+0000 | ["6","2","11"]
     123456CC901 |   970122905143661162 | 2017-05-01 05:00:00.000000+0000 |  ["C","2","3"]
     45678901234 |  2207499658550692669 | 2017-04-03 05:00:00.000000+0000 | ["12","2","3"]
     12345678901 |  3063849707784841171 | 2018-05-26 05:00:00.000000+0000 |  ["1","2","3"]
     123456AA901 |  4307148681570630627 | 2017-05-01 05:00:00.000000+0000 |  ["A","2","3"]
     56789012345 |  5304329977670805052 | 2018-01-26 06:00:00.000000+0000 |  ["3","2","1"]
     34567890123 |  6079361129233417517 | 2017-03-28 05:00:00.000000+0000 |  ["1","5","3"]

    (9 rows)
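
To make the point concrete, here is a small Python illustration (not something you would run against Cassandra). It copies the token values from the output above and shows that sorting by token reproduces the row order of the unrestricted query, which has nothing to do with the lexical order of number. The `tokens` dictionary and the variable names are just for this sketch.

```python
# Token values copied from the cqlsh output above, keyed by partition key.
# The dict is written in lexical key order on purpose.
tokens = {
    "12345678901": 3063849707784841171,
    "123456AA901": 4307148681570630627,
    "123456BB901": -7512323826965212800,
    "123456CC901": 970122905143661162,
    "123456DD901": -5242683095224762575,
    "23456789012": -2843835925329100734,
    "34567890123": 6079361129233417517,
    "45678901234": 2207499658550692669,
    "56789012345": 5304329977670805052,
}

# Order in which an unrestricted scan returns partitions: ascending token.
by_token = sorted(tokens, key=tokens.get)

# Order a reader might expect: ascending partition key value.
by_key = sorted(tokens)

print("by token:", by_token)
print("by key:  ", by_key)
```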

The bottom line is that without an equality condition on the partition key, you cannot enforce a sort order.

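Unfortunately, what you are trying to do is not one of Cassandra's strengths. To get the result you want, you would have to find something your expected results have in common and then design a query table to support it. However, if you are only talking about 10 rows, it would probably be easier to sort the results on the application side.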
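
A minimal sketch of the application-side sort in Python, assuming the order you want is last_day descending and that the rows have already been fetched as (number, last_day, ids) tuples. The values are taken from the result in the question; the helper name `newest_first` is made up for this example.

```python
from datetime import datetime, timezone

# Rows as returned by the IN query in the question: (number, last_day, ids).
rows = [
    ("12345678901", datetime(2018, 5, 25, 21, 0, tzinfo=timezone.utc), '["1","2","3"]'),
    ("45678901234", datetime(2017, 4, 2, 21, 0, tzinfo=timezone.utc), '["12","2","3"]'),
    ("56789012345", datetime(2018, 1, 25, 21, 0, tzinfo=timezone.utc), '["3","2","1"]'),
]


def newest_first(rows, limit=10):
    """Sort rows by last_day descending, then keep at most `limit` of them."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return ordered[:limit]


result = newest_first(rows)
for number, last_day, ids in result:
    print(number, last_day.isoformat(), ids)
```

Note that the limit is applied after sorting. A LIMIT 10 in the CQL query itself would cut the rows off in whatever order Cassandra returns them, before your application has a chance to sort.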