Question

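nodetool cfstats / tablestats reports "Compacted partition maximum bytes".

How can I now find the key of this partition, or of other large partitions?

The goal is to analyse why these partitions grew so large and to correct the data model accordingly.

I have seen that these partition keys can show up in the logs, but unfortunately my logs are deleted regularly.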

Answer 1

You can take a look at the nodetool toppartitions command, which should show the most active partitions. It can sometimes help you analyse and manage your data.

Answer 2

Perhaps you could use an external tool such as Apache Drill or Presto to run a query like this:

SELECT key1, key2, COUNT(*) AS total
FROM yourTable
GROUP BY key1, key2
ORDER BY total DESC
LIMIT 10;

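key1 and key2 are the columns that make up the partition key.

This query returns the 10 partitions with the most rows, which serves as a proxy for the largest partitions.

Hope this helps.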
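If no SQL engine is at hand, the same aggregation can be done on the client side. The following is only a minimal Python sketch, assuming the partition-key columns of every row have already been read into an iterable (for example by a full table scan through a driver). The function name `top_partitions` and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def top_partitions(
    keys: Iterable[Tuple[Hashable, ...]], limit: int = 10
) -> List[Tuple[Tuple[Hashable, ...], int]]:
    """Count rows per partition key and return the `limit` highest counts."""
    counts = Counter(keys)  # one increment per row, keyed by (key1, key2)
    return counts.most_common(limit)


# Hypothetical sample data: each tuple is the (key1, key2) of one row.
sample_rows = [
    ("sensor-a", "2016-02"),
    ("sensor-a", "2016-02"),
    ("sensor-a", "2016-02"),
    ("sensor-b", "2016-02"),
    ("sensor-b", "2016-02"),
    ("sensor-c", "2016-03"),
]

for key, total in top_partitions(sample_rows, limit=2):
    print(key, total)
```

Like the query, this counts rows rather than bytes, so a partition holding a few very large rows would not stand out.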

Answer 3

You can use the Instaclustr tools:

https://www.instaclustr.com/support/documentation/tools/ic-tools-for-cassandra-sstables/

The following command is very useful for finding large partitions:

ic-pstats [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>

-n <num>    Number of partitions to display in leaders lists
-t <name>   Snapshot to analyse (snapshot name from nodetool listsnapshots). Snapshot is created if none is specified.
-f <files>  Comma separated list of Data.db sstables to filter on

Another useful tool is sstable-tools:

https://github.com/tolbertam/sstable-tools

It has a describe command that shows the widest and the largest partitions:

java -jar sstable-tools.jar describe ma-2-big-Data.db

The output looks like this:

/Users/clohfink/git/sstable-tools/./src/test/resources/ma-2-big-Data.db
=======================================================================
Partitions: 1
Rows: 1
Tombstones: 0
Cells: 4
Widest Partitions:
   [frodo] 1
Largest Partitions:
   [frodo] 104 (104 B)
Tombstone Leaders:
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.010000
Size: 50 (50 B)
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
  Compression ratio: -1.0
Minimum timestamp: 1455937221199050 (02/19/2016 21:00:21)
Maximum timestamp: 1455937221199050 (02/19/2016 21:00:21)
SSTable min local deletion time: 2147483647 (01/18/2038 21:14:07)
SSTable max local deletion time: 2147483647 (01/18/2038 21:14:07)
TTL min: 0 (0 milliseconds)
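
When many SSTables have to be checked, the "Largest Partitions:" section of such a report can be collected by a script. The following Python sketch assumes the report layout shown above (indented `[key] value` lines under the section header) and assumes the number after the key is the size in bytes. The function name `largest_partitions` is hypothetical and not part of sstable-tools.

```python
import re
from typing import List, Tuple

# Matches indented leader lines such as "   [frodo] 104 (104 B)".
LEADER_LINE = re.compile(r"^\s+\[(?P<key>.*)\]\s+(?P<value>\d+)")


def largest_partitions(report: str) -> List[Tuple[str, int]]:
    """Return (partition key, size) pairs from the 'Largest Partitions:' section."""
    results = []
    in_section = False
    for line in report.splitlines():
        if line.strip() == "Largest Partitions:":
            in_section = True
            continue
        if in_section:
            match = LEADER_LINE.match(line)
            if match is None:
                break  # the first non-leader line ends the section
            results.append((match.group("key"), int(match.group("value"))))
    return results


# Excerpt of the describe output shown above.
sample_report = """\
Widest Partitions:
   [frodo] 1
Largest Partitions:
   [frodo] 104 (104 B)
Tombstone Leaders:
"""

print(largest_partitions(sample_report))
```

Running this over the report of every Data.db file and merging the results gives a per-node list of candidate keys to investigate.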

Finding the keys of large partitions in Cassandra
