Question

What is the sort order of Cassandra UTF8Type?

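All of the documentation leads me to expect a lexical sort order (basically alphabetical). That does not appear to be the order Cassandra uses, and I find it hard to guess what it does use.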

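I built a table to count interactions that affect named "applications", organized into time slots within a day. (This is a simplified example to demonstrate the cause of my confusion.) I want to be able to look up specific applications. The CQL description of the table is as follows: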

CREATE TABLE "appMetrics" (app text,time timestamp,counter_val counter,
    PRIMARY KEY (app, time)) WITH COMPACT STORAGE;

I loaded it with data:

update "appMetrics" set counter_val = counter_val+1 WHERE app='ab' AND time='2014-02-14 00:00:00';
update "appMetrics" set counter_val = counter_val+1 WHERE app='a' AND time='2014-02-14 00:00:00';
update "appMetrics" set counter_val = counter_val+1 WHERE app='c' AND time='2014-02-14 00:00:00';
update "appMetrics" set counter_val = counter_val+1 WHERE app='b' AND time='2014-02-14 00:00:00';
update "appMetrics" set counter_val = counter_val+1 WHERE app='bc' AND time='2014-02-14 00:00:00';
update "appMetrics" set counter_val = counter_val+1 WHERE app='ca' AND time='2014-02-14 00:00:00';

I select from the table and see this result:

    select * from "appMetrics";

     app | time                     | counter_val
    -----+--------------------------+-------------
       a | 2014-02-14 00:00:00-0500 |           1
       c | 2014-02-14 00:00:00-0500 |           1
      ab | 2014-02-14 00:00:00-0500 |           1
      ca | 2014-02-14 00:00:00-0500 |           1
      bc | 2014-02-14 00:00:00-0500 |           1
       b | 2014-02-14 00:00:00-0500 |           1

    (6 rows)

So this order is not alphabetical, not insertion order, and not any order I can recognize. The ordering is not random, or at least it is repeatable:

cqlsh:simplex> select * from "appMetrics" where token(app) >= token('ab');

 app | time                     | counter_val
-----+--------------------------+-------------
  ab | 2014-02-14 00:00:00-0500 |           1
  ca | 2014-02-14 00:00:00-0500 |           1
  bc | 2014-02-14 00:00:00-0500 |           1
   b | 2014-02-14 00:00:00-0500 |           1

(4 rows)

cqlsh:simplex> select * from "appMetrics" where token(app) <= token('ab');

 app | time                     | counter_val
-----+--------------------------+-------------
   a | 2014-02-14 00:00:00-0500 |           1
   c | 2014-02-14 00:00:00-0500 |           1
  ab | 2014-02-14 00:00:00-0500 |           1

(3 rows)

For what it's worth, the column family is described as:

    ColumnFamily: appMetrics
      Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
      Default column value validator: org.apache.cassandra.db.marshal.CounterColumnType
      Cells sorted by: org.apache.cassandra.db.marshal.TimestampType
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.1
      DC Local Read repair chance: 0.0
      Populate IO Cache on flush: false
      Replicate on write: true
      Caching: KEYS_ONLY
      Default time to live: 0
      Bloom Filter FP chance: 0.01
      Index interval: 128
      Speculative Retry: 99.0PERCENTILE
      Built indexes: []
      Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
      Compression Options:
        sstable_compression: org.apache.cassandra.io.compress.LZ4Compressor

Can someone explain how these rows are ordered?

Answer 1

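OK, I think I now know the answer to this question. The partition key is not stored in key order: the partitioner turns it into a token, and rows (partitions) are stored, and returned, in token order.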

As a demonstration, for the same table shown above, I requested the token value of each key and got this:

cqlsh:simplex> select token(app), app from "appMetrics";

 token(app)           | app
----------------------+-----
 -8839064797231613815 |   a
 -8198557465434950441 |   c
 -7815133031266706642 |  ab
  -633243080167210587 |  ca
  4832945267908438539 |  bc
  8833996863197925870 |   b

(6 rows)
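
To check this against the output in the question, here is a minimal Python sketch. It only re-sorts the (token, app) pairs printed above; it does not compute Murmur3 tokens itself, and the helper names `scan_order` and `token_range` are made up for illustration.

```python
# (token, app) pairs copied from the cqlsh output above,
# listed here in the order the rows were originally inserted.
rows = [
    (-7815133031266706642, "ab"),
    (-8839064797231613815, "a"),
    (-8198557465434950441, "c"),
    (8833996863197925870, "b"),
    (4832945267908438539, "bc"),
    (-633243080167210587, "ca"),
]

token_of = {app: token for token, app in rows}


def scan_order(pairs):
    """Keys in ascending token order, mimicking an unrestricted SELECT."""
    return [app for _token, app in sorted(pairs)]


def token_range(pairs, low=None, high=None):
    """Keys whose token lies in [low, high], in ascending token order."""
    return [
        app
        for token, app in sorted(pairs)
        if (low is None or token >= low) and (high is None or token <= high)
    ]


full_scan = scan_order(rows)
at_or_after_ab = token_range(rows, low=token_of["ab"])
at_or_before_ab = token_range(rows, high=token_of["ab"])

print(full_scan)        # same order as "select * from appMetrics"
print(at_or_after_ab)   # same rows as token(app) >= token('ab')
print(at_or_before_ab)  # same rows as token(app) <= token('ab')
```

Sorting by token reproduces both the full-table order and the two `token(app)` range queries from the question, which is consistent with partitions being ordered by token rather than by key text.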

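Further information: this happens because I am using the default Murmur3Partitioner. I could (I think) get the rows back in alphabetical order by using ByteOrderedPartitioner. Unfortunately, the partitioner is set at the cluster level, so it governs the entire cluster. DataStax (http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html) recommends against using ByteOrderedPartitioner.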
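
For comparison, a small Python sketch of the order a byte-ordered partitioner would be expected to give, assuming it compares the raw UTF-8 bytes of each key. The same sort can be applied client-side after fetching the rows, which avoids changing the cluster's partitioner. The function name `byte_order` is hypothetical.

```python
# Partition keys from the question, in insertion order.
apps = ["ab", "a", "c", "b", "bc", "ca"]


def byte_order(keys):
    """Sort keys by their UTF-8 encoded bytes (unsigned, lexicographic)."""
    return sorted(keys, key=lambda k: k.encode("utf-8"))


alphabetical = byte_order(apps)
print(alphabetical)

# Byte order is not the same as dictionary order in general:
# uppercase ASCII letters have smaller byte values than lowercase ones.
mixed_case = byte_order(["a", "B"])
print(mixed_case)
```

For these six lowercase ASCII keys, byte order and alphabetical order coincide; with mixed case or non-ASCII text they can differ.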
