I have a table in Cassandra with more than 2 billion rows (at most), and it is not partitioned well. Say:
> create table if not exists bar (
bar_id bigint,
col_a text,
col_b text,
col_c int,
col_d text,
primary key (bar_id, col_a, col_b, col_c))
I have updated the size estimates with nodetool refreshsizeestimates, and I can read the data fine, but all rows sit under the same token:
> select col_a, col_b, col_c from foo.bar where bar_id = 1;
col_a | col_b | col_c
--------+-------+--------
aaaaaa | b | 0
aaaaaa | bb | 1
aaaaaa | bb | 2
aaaaaa | bbb | 1
> select col_a, col_b, col_c
from foo.bar
where bar_id = 1
and token(bar_id) < 6121040252107678107
and token(bar_id) > 6121040252107678107;
col_a | col_b | col_c
--------+-------+--------
(0 rows)
However, the partition sizes are all zero:
> select * from system.size_estimates
where keyspace_name = 'foo' and table_name = 'bar';
keyspace_name | table_name | range_start | range_end | mean_partition_size | partitions_count
---------------+-------------+---------------------+---------------------+---------------------+------------------
foo | bar | 6038186684182191258 | 6054588198651336225 | 0 | 0
foo | bar | 6892562529594743760 | 6908565156302797782 | 0 | 0
foo | bar | 6944218160728667924 | 6944876930711291150 | 0 | 0
...
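For context, this is roughly how I would expect to turn those rows into a table-level estimate once they are populated: sum `partitions_count` over the token ranges and weight `mean_partition_size` by it. A minimal Python sketch, assuming the rows have already been fetched as plain tuples (`SizeEstimate` and `summarize` are names I made up, not part of any driver):

```python
from typing import Iterable, NamedTuple, Tuple


class SizeEstimate(NamedTuple):
    """One row of system.size_estimates (hypothetical container type)."""
    range_start: int
    range_end: int
    mean_partition_size: int  # bytes
    partitions_count: int


def summarize(rows: Iterable[SizeEstimate]) -> Tuple[int, int]:
    """Return (estimated partition count, estimated total bytes) over all token ranges."""
    total_partitions = 0
    total_bytes = 0
    for row in rows:
        total_partitions += row.partitions_count
        # Each range contributes its mean partition size times its partition count.
        total_bytes += row.mean_partition_size * row.partitions_count
    return total_partitions, total_bytes


# The three rows shown above, copied verbatim.
rows = [
    SizeEstimate(6038186684182191258, 6054588198651336225, 0, 0),
    SizeEstimate(6892562529594743760, 6908565156302797782, 0, 0),
    SizeEstimate(6944218160728667924, 6944876930711291150, 0, 0),
]
print(summarize(rows))
```

With the rows above both totals come out as zero, which is exactly the problem.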
The keyspace replication factor is set to 1:
> create keyspace foo
with replication = {'class': 'NetworkTopologyStrategy', 'south': '1'} and durable_writes = true;
How can I get correct sizes, or any information about the tokens/data, for the table as a whole or per partition?
$ nodetool tablehistograms -- foo bar
foo/bar histograms
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                           (micros)      (micros)         (bytes)
50%            20.00           0.00     155469.30      8582860529   129557750
75%            20.00           0.00     186563.16    158683580810   268650950
95%            20.00           0.00     186563.16    568591960032  1996099046
98%            20.00           0.00     186563.16    568591960032  1996099046
99%            20.00           0.00     186563.16    568591960032  1996099046
Min            18.00           0.00     129557.75       268650951           0
Max            20.00           0.00     186563.16    568591960032  1996099046
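To put the Partition Size column in perspective, here is a small Python sketch that converts the byte values to GiB. The helper name `to_gib` is mine, and the numbers are copied from the histogram and taken at face value:

```python
GIB = 1024 ** 3  # bytes per gibibyte


def to_gib(size_bytes: int) -> float:
    """Convert a size in bytes to gibibytes."""
    return size_bytes / GIB


# Partition Size (bytes) per percentile, copied from the nodetool output.
partition_size_bytes = {
    "Min": 268650951,
    "50%": 8582860529,
    "75%": 158683580810,
    "95%": 568591960032,
    "Max": 568591960032,
}

for label, size in partition_size_bytes.items():
    print(f"{label:>4}: {to_gib(size):10.2f} GiB")
```

If those figures are to be trusted, the largest partition is well over 500 GiB.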
The Cassandra version is 3.11.2.