What is the sort order of Cassandra UTF8Type keys? (Cassandra 2.0)

Posted: 2014-02-13 23:14:13

Tags: cassandra-2.0

What is the sort order of Cassandra's UTF8Type?

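All the documentation leads me to expect a lexical sort order (basically alphabetical). That does not seem to be the order Cassandra actually uses, and I find it hard to guess what it does use.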

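I built a table that counts interactions with named "apps", bucketed by time of day. (This is a simplified example to demonstrate the source of my confusion.) I want to be able to look up specific apps. The CQL description of the table is: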

CREATE TABLE "appMetrics" (app text,time timestamp,counter_val counter,
    PRIMARY KEY (app, time)) WITH COMPACT STORAGE;

I load it with this data:

update "appMetrics" set counter_val = counter_val+1 WHERE app='ab' AND time='2014-02-14 00:00:00';
update "appMetrics" set counter_val = counter_val+1 WHERE app='a' AND time='2014-02-14 00:00:00';
update "appMetrics" set counter_val = counter_val+1 WHERE app='c' AND time='2014-02-14 00:00:00';
update "appMetrics" set counter_val = counter_val+1 WHERE app='b' AND time='2014-02-14 00:00:00';
update "appMetrics" set counter_val = counter_val+1 WHERE app='bc' AND time='2014-02-14 00:00:00';
update "appMetrics" set counter_val = counter_val+1 WHERE app='ca' AND time='2014-02-14 00:00:00';

When I select from the table, I see this result:

    select * from "appMetrics";

     app | time                     | counter_val
    -----+--------------------------+-------------
       a | 2014-02-14 00:00:00-0500 |           1
       c | 2014-02-14 00:00:00-0500 |           1
      ab | 2014-02-14 00:00:00-0500 |           1
      ca | 2014-02-14 00:00:00-0500 |           1
      bc | 2014-02-14 00:00:00-0500 |           1
       b | 2014-02-14 00:00:00-0500 |           1

    (6 rows)

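So the order is not alphabetical, not insertion order, and not any other order I can recognize. But it is not random either, or at least it is repeatable: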

cqlsh:simplex> select * from "appMetrics" where token(app) >= token('ab');

 app | time                     | counter_val
-----+--------------------------+-------------
  ab | 2014-02-14 00:00:00-0500 |           1
  ca | 2014-02-14 00:00:00-0500 |           1
  bc | 2014-02-14 00:00:00-0500 |           1
   b | 2014-02-14 00:00:00-0500 |           1

(4 rows)

cqlsh:simplex> select * from "appMetrics" where token(app) <= token('ab');

 app | time                     | counter_val
-----+--------------------------+-------------
   a | 2014-02-14 00:00:00-0500 |           1
   c | 2014-02-14 00:00:00-0500 |           1
  ab | 2014-02-14 00:00:00-0500 |           1

(3 rows)

For what it's worth, the column family is described as:

    ColumnFamily: appMetrics
      Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
      Default column value validator: org.apache.cassandra.db.marshal.CounterColumnType
      Cells sorted by: org.apache.cassandra.db.marshal.TimestampType
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.1
      DC Local Read repair chance: 0.0
      Populate IO Cache on flush: false
      Replicate on write: true
      Caching: KEYS_ONLY
      Default time to live: 0
      Bloom Filter FP chance: 0.01
      Index interval: 128
      Speculative Retry: 99.0PERCENTILE
      Built indexes: []
      Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
      Compression Options:
        sstable_compression: org.apache.cassandra.io.compress.LZ4Compressor

Can someone explain how these rows are ordered?

1 Answer

Answer 0 (score: 0)

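OK, I think I now know the answer. The partitioner maps each partition key to a token (a hash of the key), and rows (partitions) are stored and scanned in token order, not in key order. The UTF8Type key validation class only validates the keys; it does not determine the order of partitions.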
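To make the idea concrete, here is a minimal sketch of hash partitioning in Python. It uses an MD5-based token loosely modeled on Cassandra's older RandomPartitioner (an assumption for illustration only; the default Murmur3Partitioner in 2.0 uses MurmurHash3, so these tokens will not match the ones Cassandra prints). The helper names `md5_token` and `storage_order` are hypothetical.

```python
import hashlib


def md5_token(key: str) -> int:
    """Hypothetical RandomPartitioner-style token: the MD5 digest of the
    UTF-8 key read as a signed big-endian integer, then made non-negative.
    Illustration only; not the Murmur3 token Cassandra 2.0 uses by default."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def storage_order(keys):
    """Order in which a hash-partitioned table would return its partitions:
    ascending by token, not by the key itself."""
    return sorted(keys, key=md5_token)


apps = ["ab", "a", "c", "b", "bc", "ca"]
print(storage_order(apps))  # some fixed, hash-determined order
```

The resulting order looks arbitrary but is fully deterministic, which matches the "not random, but repeatable" behavior in the question.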

As a demonstration, I queried the token values of the keys for the same table shown above and got this:

cqlsh:simplex> select token(app), app from "appMetrics";

 token(app)           | app
----------------------+-----
 -8839064797231613815 |   a
 -8198557465434950441 |   c
 -7815133031266706642 |  ab
  -633243080167210587 |  ca
  4832945267908438539 |  bc
  8833996863197925870 |   b

(6 rows)
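The token values above are enough to reproduce every result in the question. This sketch copies them into a dict, sorts by token to get the full-scan order, and emulates the two `token(app) >= / <= token('ab')` queries as filters over that order. `scan_order` and `token_range` are hypothetical helper names, not Cassandra APIs.

```python
import operator

# Token values copied from the `select token(app), app` output above
# (Murmur3Partitioner, Cassandra 2.0).
TOKENS = {
    "a": -8839064797231613815,
    "c": -8198557465434950441,
    "ab": -7815133031266706642,
    "ca": -633243080167210587,
    "bc": 4832945267908438539,
    "b": 8833996863197925870,
}

# Comparison operators supported by the emulated WHERE clause.
OPS = {">=": operator.ge, "<=": operator.le}


def scan_order(tokens):
    """Keys in ascending token order: the order of a full-table SELECT."""
    return sorted(tokens, key=tokens.get)


def token_range(tokens, pivot, op):
    """Emulate `WHERE token(app) <op> token(pivot)` over the scan order."""
    compare = OPS[op]
    pivot_token = tokens[pivot]
    return [k for k in scan_order(tokens) if compare(tokens[k], pivot_token)]


print(scan_order(TOKENS))               # ['a', 'c', 'ab', 'ca', 'bc', 'b']
print(sorted(TOKENS))                   # ['a', 'ab', 'b', 'bc', 'c', 'ca'] (lexical)
print(token_range(TOKENS, "ab", ">="))  # ['ab', 'ca', 'bc', 'b']
print(token_range(TOKENS, "ab", "<="))  # ['a', 'c', 'ab']
```

All three outputs match the cqlsh results in the question, which supports the explanation that partitions are ordered by token rather than by the UTF8Type key.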

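Further information: this happens because I'm using the default Murmur3Partitioner. I could (I think) get the keys in alphabetical order by using ByteOrderedPartitioner. Unfortunately, the partitioner is set at the cluster level, so it applies to the entire cluster. DataStax (http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html) recommends against using ByteOrderedPartitioner.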
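As I understand it, ByteOrderedPartitioner uses the raw key bytes as the token, so partitions compare byte by byte. The sketch below assumes that model and uses the UTF-8 encoding of each key. For lowercase ASCII keys like these this gives plain alphabetical order, but it is really code point order: uppercase sorts before lowercase, and accented letters sort after `z`. `byte_order` is a hypothetical helper name.

```python
def byte_order(keys):
    """Sketch of ByteOrderedPartitioner-style ordering (an assumption):
    partitions compare by the unsigned bytes of the UTF-8 encoded key."""
    return sorted(keys, key=lambda k: k.encode("utf-8"))


apps = ["ab", "a", "c", "b", "bc", "ca"]
print(byte_order(apps))                  # ['a', 'ab', 'b', 'bc', 'c', 'ca']

# Byte order is not locale-aware alphabetical order:
print(byte_order(["b", "Z", "é", "a"]))  # ['Z', 'a', 'b', 'é']
```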