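What is the sort order of Cassandra's UTF8Type?
All the documentation leads me to expect a lexical sort order (basically alphabetical). That does not appear to be the order Cassandra uses, and I find it hard to guess what it does use.
I built a table that counts interactions affecting named "applications", organized into one-day time buckets. (This is a simplified example that demonstrates the source of my confusion.) I want to be able to look up a specific application. The CQL definition of the table is: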
CREATE TABLE "appMetrics" (
    app text,
    time timestamp,
    counter_val counter,
    PRIMARY KEY (app, time)
) WITH COMPACT STORAGE;
I loaded it with data:
update "appMetrics" set counter_val = counter_val+1 WHERE app='ab' AND time='2014-02-14 00:00:00';
update "appMetrics" set counter_val = counter_val+1 WHERE app='a' AND time='2014-02-14 00:00:00';
update "appMetrics" set counter_val = counter_val+1 WHERE app='c' AND time='2014-02-14 00:00:00';
update "appMetrics" set counter_val = counter_val+1 WHERE app='b' AND time='2014-02-14 00:00:00';
update "appMetrics" set counter_val = counter_val+1 WHERE app='bc' AND time='2014-02-14 00:00:00';
update "appMetrics" set counter_val = counter_val+1 WHERE app='ca' AND time='2014-02-14 00:00:00';
I then selected from the table and saw this result:
select * from "appMetrics";
app | time | counter_val
-----+--------------------------+-------------
a | 2014-02-14 00:00:00-0500 | 1
c | 2014-02-14 00:00:00-0500 | 1
ab | 2014-02-14 00:00:00-0500 | 1
ca | 2014-02-14 00:00:00-0500 | 1
bc | 2014-02-14 00:00:00-0500 | 1
b | 2014-02-14 00:00:00-0500 | 1
(6 rows)
So this order is not alphabetical, not insertion order, and not any order I can recognize. It is not random either, or at least it is repeatable:
cqlsh:simplex> select * from "appMetrics" where token(app) >= token('ab');
app | time | counter_val
-----+--------------------------+-------------
ab | 2014-02-14 00:00:00-0500 | 1
ca | 2014-02-14 00:00:00-0500 | 1
bc | 2014-02-14 00:00:00-0500 | 1
b | 2014-02-14 00:00:00-0500 | 1
(4 rows)
cqlsh:simplex> select * from "appMetrics" where token(app) <= token('ab');
app | time | counter_val
-----+--------------------------+-------------
a | 2014-02-14 00:00:00-0500 | 1
c | 2014-02-14 00:00:00-0500 | 1
ab | 2014-02-14 00:00:00-0500 | 1
(3 rows)
For what it's worth, the column family is described as:
ColumnFamily: appMetrics
  Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Default column value validator: org.apache.cassandra.db.marshal.CounterColumnType
  Cells sorted by: org.apache.cassandra.db.marshal.TimestampType
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 0.1
  DC Local Read repair chance: 0.0
  Populate IO Cache on flush: false
  Replicate on write: true
  Caching: KEYS_ONLY
  Default time to live: 0
  Bloom Filter FP chance: 0.01
  Index interval: 128
  Speculative Retry: 99.0PERCENTILE
  Built indexes: []
  Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
  Compression Options:
    sstable_compression: org.apache.cassandra.io.compress.LZ4Compressor
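Can someone explain how these rows are ordered?
Answer 0 (score: 0)
OK, I think I now know the answer to this question. The partitioner turns each partition key into a token, and rows (partitions) are stored and returned in token order, not in the order of the key itself.
As a demonstration, I requested the token value of each key in the same table shown above and got this: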
cqlsh:simplex> select token(app), app from "appMetrics";
token(app) | app
----------------------+-----
-8839064797231613815 | a
-8198557465434950441 | c
-7815133031266706642 | ab
-633243080167210587 | ca
4832945267908438539 | bc
8833996863197925870 | b
(6 rows)
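To double-check that token order accounts for everything in the question, here is a minimal Python sketch. It is not part of the original session: the token values are copied by hand from the cqlsh output above (nothing is recomputed), and the variable names are only illustrative.

```python
# Token values copied from the cqlsh output above (Murmur3Partitioner).
tokens = {
    "a": -8839064797231613815,
    "c": -8198557465434950441,
    "ab": -7815133031266706642,
    "ca": -633243080167210587,
    "bc": 4832945267908438539,
    "b": 8833996863197925870,
}

# A full scan returns partitions in ascending token order, not key order.
scan_order = sorted(tokens, key=tokens.get)
print(scan_order)

# Same filter as: where token(app) >= token('ab')
at_or_after_ab = [k for k in scan_order if tokens[k] >= tokens["ab"]]
print(at_or_after_ab)

# Same filter as: where token(app) <= token('ab')
at_or_before_ab = [k for k in scan_order if tokens[k] <= tokens["ab"]]
print(at_or_before_ab)
```

Sorting by token reproduces the order of the plain select, and the two filters reproduce the 4-row and 3-row results of the token() range queries in the question.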
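Further information: this happens because I am using the default Murmur3Partitioner. I could (I think) get results in alphabetical order by using ByteOrderedPartitioner instead. Unfortunately, the partitioner is set at the cluster level, so it applies to the entire cluster. DataStax (http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html) recommends against using ByteOrderedPartitioner.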
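For comparison, this Python sketch shows the order I would expect if keys were compared by their raw bytes. It rests on an assumption I have not verified against a cluster: that a byte-ordered partitioner compares the UTF-8 encoded key bytes directly. The same sort can also be applied on the client to rows fetched under Murmur3Partitioner, which avoids changing the cluster's partitioner.

```python
# The six partition keys from the example table, in insertion order.
keys = ["ab", "a", "c", "b", "bc", "ca"]

# Assumption: byte-ordered comparison of the UTF-8 encoded keys.
byte_order = sorted(keys, key=lambda k: k.encode("utf-8"))
print(byte_order)
```

For these ASCII keys the byte order matches plain alphabetical order: a, ab, b, bc, c, ca.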