I have seen references to a "Number of keys (estimate)" value in the output of nodetool cfstats, but at least on my system (Cassandra version 3.11.3), I don't see it:
Table: XXXXXX
SSTable count: 4
Space used (live): 2393755943
Space used (total): 2393755943
Space used by snapshots (total): 0
Off heap memory used (total): 2529880
SSTable Compression Ratio: 0.11501749368144083
Number of partitions (estimate): 1146
Memtable cell count: 296777
Memtable data size: 147223380
Memtable off heap memory used: 0
Memtable switch count: 127
Local read count: 9
Local read latency: NaN ms
Local write count: 44951572
Local write latency: 0.043 ms
Pending flushes: 0
Percent repaired: 0.0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 2144
Bloom filter off heap memory used: 2112
Index summary off heap memory used: 240
Compression metadata off heap memory used: 2527528
Compacted partition minimum bytes: 447
Compacted partition maximum bytes: 43388628
Compacted partition mean bytes: 13547448
Average live cells per slice (last five minutes): NaN
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): 0
Dropped Mutations: 0
With this version of Cassandra, is there some way to approximate select count(*) from XXXXXX?
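Answer 0: (score: 1)
"Number of keys" is the same thing as "Number of partitions", and it is likewise an estimate. If your partition key is the whole primary key (no clustering columns), then it gives you an estimate of the number of rows on that node. Otherwise, it is only an estimate of the number of distinct partition key values.
-Jim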
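Answer 1: (score: 1)
This was changed with CASSANDRA-13722. In any case, the "Number of keys" estimate has always meant "number of partitions"; the new label just makes that obvious.
To estimate the number of rows in a large table, you can take that value (the number of partitions) as a starting point. Then work out the average number of clustering key combinations (rows) per partition, and you should be able to make an educated guess.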
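As a rough sketch of that arithmetic in Python: the function name and the rows-per-partition figure below are made up for illustration. Only the 1146 comes from the cfstats output in the question, and the result applies to a single node.

```python
def estimate_rows(partition_count: int, avg_rows_per_partition: float) -> int:
    """Rough per-node row count: partitions times average rows per partition."""
    if partition_count < 0 or avg_rows_per_partition < 0:
        raise ValueError("inputs must be non-negative")
    return round(partition_count * avg_rows_per_partition)


# 1146 is "Number of partitions (estimate)" from the question's cfstats output.
# 500 rows per partition is an assumed figure; substitute what you know
# about your own data model.
print(estimate_rows(1146, 500))  # prints 573000
```

Both inputs are estimates, so treat the result as an order-of-magnitude figure rather than a count.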
Another idea is to figure out the size of one row (in bytes). Then look at the P50 in the output of nodetool tablehistograms keyspacename.tablename:
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                      (micros)       (micros)      (bytes)
50%         2.00      35.43          4866.32       124             1
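Divide the P50 (50th percentile) of "Partition Size" by the size of one row. That should give you roughly the number of rows in a typical partition of that table. Then multiply that by the "Number of partitions", and you should have an approximate row count for that node.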
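Here is a minimal sketch of that calculation. The function name is hypothetical and the 62-byte row size is an assumption. The 124 is the P50 partition size from the histogram above and the 1146 is the partition count from the question; they come from different tables, so they are combined here only to show the arithmetic.

```python
def estimate_rows_from_sizes(p50_partition_bytes: float,
                             row_bytes: float,
                             partition_count: int) -> int:
    """Per-node row estimate: (median partition size / row size) * partitions."""
    if row_bytes <= 0:
        raise ValueError("row_bytes must be positive")
    rows_per_partition = p50_partition_bytes / row_bytes
    return round(rows_per_partition * partition_count)


# 124   -> P50 "Partition Size (bytes)" from the tablehistograms sample
# 62    -> assumed size of one row in bytes (measure your own)
# 1146  -> "Number of partitions (estimate)" from the question's cfstats
print(estimate_rows_from_sizes(124, 62, 1146))  # prints 2292
```

Keep in mind that the P50 is a median, so a table with a few very large partitions will be underestimated this way.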
How do you get the size of one row in Cassandra?
$ bin/cqlsh 127.0.0.1 -u aaron -p yourPasswordSucks -e "SELECT * FROM system.local WHERE key='local';" > local.txt
$ ls -al local.txt
-rw-r--r-- 1 z001mj8 DHC\Domain Users 2321 Sep 16 15:08 local.txt
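Obviously, you'll need to strip out things like the pipe delimiters and the row headers (not to mention account for the size difference between strings and numbers), but the final byte size of the file should put you in the ballpark.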
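If you would rather not clean the file up by hand, something like the following could do the stripping. This is only a sketch: it assumes the usual cqlsh tabular layout (a header line, a separator line made of dashes and plus signs, pipe-delimited data rows, and a "(N rows)" footer), and approx_row_bytes is a hypothetical helper name. It measures the text form of each value, so it does not correct for the string versus number size difference, and it will miscount values that themselves contain a pipe character.

```python
import re


def approx_row_bytes(cqlsh_output: str) -> float:
    """Average bytes of cell text per data row in cqlsh tabular output."""
    row_sizes = []
    header_seen = False
    for line in cqlsh_output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank padding lines
        if re.fullmatch(r"[-+]+", stripped):
            continue  # separator line under the header
        if re.fullmatch(r"\(\d+ rows?\)", stripped):
            continue  # "(1 rows)" footer
        if not header_seen:
            header_seen = True  # first remaining line holds the column names
            continue
        # Drop the pipe delimiters and the alignment padding around each cell.
        cells = [cell.strip() for cell in line.split("|")]
        row_sizes.append(sum(len(cell.encode("utf-8")) for cell in cells))
    if not row_sizes:
        raise ValueError("no data rows found")
    return sum(row_sizes) / len(row_sizes)


# Assumes local.txt was produced by the cqlsh command shown above:
# with open("local.txt", encoding="utf-8") as f:
#     print(approx_row_bytes(f.read()))
```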