Question

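How can I find large partitions on a Cassandra cluster before they show up in system.log? We are facing some performance issues because of them. Can anyone help? We are running Cassandra versions 2.0.11 and 2.1.16.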

Answer 1

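You can look at the output of nodetool tablestats (or nodetool cfstats in older versions of Cassandra). For each table it reports Compacted partition maximum bytes along with other per-partition information, as in this example, where the largest partition is about 268 MB: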

    Table: table_name
    SSTable count: 2
    Space used (live): 147638509
    Space used (total): 147638509
    .....
    Compacted partition minimum bytes: 43
    Compacted partition maximum bytes: 268650950
    Compacted partition mean bytes: 430941
    Average live cells per slice (last five minutes): 8256.0
    Maximum live cells per slice (last five minutes): 10239
    Average tombstones per slice (last five minutes): 1.0
    Maximum tombstones per slice (last five minutes): 1
    .....

Note that nodetool tablestats only reports information for the current node, so you need to run it on every node of the cluster.
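If you collect that output from several nodes, a small script can flag the tables worth a closer look. The following is only a minimal sketch: the function name `find_large_partitions`, the `SAMPLE` text, and the 100 MB threshold are my own assumptions for illustration, and it only matches the `Table:` and `Compacted partition maximum bytes:` labels shown in the example output above, which may differ in other Cassandra versions.

```python
# Sketch: flag tables whose largest compacted partition exceeds a threshold.
# Assumes the "Table:" and "Compacted partition maximum bytes:" labels from
# the example output; other Cassandra versions may label these differently.

# Hypothetical sample, trimmed from the example output in this answer.
SAMPLE = """\
    Table: table_name
    SSTable count: 2
    Space used (live): 147638509
    Compacted partition minimum bytes: 43
    Compacted partition maximum bytes: 268650950
    Compacted partition mean bytes: 430941
"""


def find_large_partitions(stats_text, threshold_bytes=100 * 1024 * 1024):
    """Return (table, max_bytes) pairs above the threshold, largest first."""
    large = []
    table = None
    for raw_line in stats_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Table:"):
            # Remember which table the following statistics belong to.
            table = line.split(":", 1)[1].strip()
        elif line.startswith("Compacted partition maximum bytes:"):
            size = int(line.split(":", 1)[1])
            if size > threshold_bytes:
                large.append((table, size))
    return sorted(large, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for table, size in find_large_partitions(SAMPLE):
        print(f"{table}: largest partition is {size / 1e6:.1f} MB")
```

To use it on real data, feed it the text captured from nodetool tablestats (or nodetool cfstats) on each node, for example via `subprocess.run([...], capture_output=True, text=True).stdout`, and adjust the threshold to whatever size you consider too large.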

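Update: you can also find the largest partitions with other tools:

https://github.com/tolbertam/sstable-tools has a describe command that shows the largest/widest partitions. This command will also be available in Cassandra 4.0.
For DataStax products, the DSBulk tool supports counting of partitions.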

Answer 2

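Try the nodetool tablehistograms -- <keyspace> <table> command (nodetool cfhistograms in older versions of Cassandra). It provides statistics about the table, including read/write latency, partition size, cell count, and the number of SSTables.

Here is some sample output: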

    Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                                  (micros)          (micros)           (bytes)
    50%             0.00             73.46              0.00         223875792             61214
    75%             0.00             88.15              0.00         668489532            182785
    95%             0.00            152.32              0.00        1996099046            654949
    98%             0.00            785.94              0.00        3449259151           1358102
    99%             0.00            943.13              0.00        3449259151           1358102
    Min             0.00             24.60              0.00              5723                 4
    Max             0.00           5839.59              0.00        5960319812           1955666

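This gives accurate statistics for the table. Going by the sample output above, for example, the raw_data table has a 95th percentile partition size of about 2 GB (1996099046 bytes), a 98th/99th percentile of about 3.44 GB, and a maximum of about 5.96 GB.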
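Reading raw byte counts by eye is error-prone, so it can help to pull out just the Partition Size column and convert it. This is a rough sketch, not an official parser: the function name `partition_sizes` and the `SAMPLE` text are hypothetical, and it assumes the six-column layout shown above with plain integer byte values.

```python
# Sketch: extract the "Partition Size" column from tablehistograms output.
# Assumes each data row has six whitespace-separated fields, as in the
# sample output above: label, SSTables, write latency, read latency,
# partition size (bytes), cell count.

# Hypothetical sample, copied from the sample output in this answer.
SAMPLE = """\
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             0.00             73.46              0.00         223875792             61214
75%             0.00             88.15              0.00         668489532            182785
95%             0.00            152.32              0.00        1996099046            654949
98%             0.00            785.94              0.00        3449259151           1358102
99%             0.00            943.13              0.00        3449259151           1358102
Min             0.00             24.60              0.00              5723                 4
Max             0.00           5839.59              0.00        5960319812           1955666
"""


def partition_sizes(histogram_text):
    """Map each row label (e.g. '95%', 'Max') to its partition size in bytes."""
    sizes = {}
    for line in histogram_text.splitlines():
        fields = line.split()
        is_data_row = len(fields) == 6 and (
            fields[0].endswith("%") or fields[0] in ("Min", "Max")
        )
        if is_data_row:
            sizes[fields[0]] = int(fields[4])  # fifth column: bytes
    return sizes


if __name__ == "__main__":
    for label, size in partition_sizes(SAMPLE).items():
        print(f"{label:>4}: {size / 1e9:.2f} GB")
```

The header lines are skipped automatically because they do not split into exactly six fields with a percentile, Min, or Max label in front.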

Hope this helps you track down the performance issue.
