Cassandra read overload on a single node

Date: 2015-11-02 11:04:29

Tags: cassandra

For a few weeks we have had an inexplicably overloaded node in our Cassandra 2.1.8 cluster. Its ReadStage thread pool is frequently full, and it is the only node with a large number of dropped tasks. Overall, this node receives around twice the load of the other nodes. The cluster has 9 nodes and RF is 3.

The snitch is a network-topology one, but the topology file is the default, so all nodes are considered to be in the same rack and the same data center.
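
For reference, one way to check the topology the cluster actually sees is to query the system tables from cqlsh (just a sketch; with the default topology file every node reports the same data center and rack):

SELECT peer, data_center, rack FROM system.peers;
SELECT data_center, rack FROM system.local;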

When we pick a single piece of data hosted on this node and select it repeatedly from cqlsh (with consistency ONE), about 90% of the reads go to the overloaded node (seen with tracing on). This happens even when the data is also hosted on the node I connect cqlsh to. We tried isolating the node in a dedicated rack in the topology, but it did not change anything.
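
For clarity, this is roughly the cqlsh session used for that test (CONSISTENCY and TRACING are standard cqlsh commands; the table and id are the ones shown in the edit below):

CONSISTENCY ONE;
TRACING ON;
-- run repeatedly; the trace shows the read going to the overloaded replica about 90% of the time
SELECT name FROM main.app WHERE id = 630bbc5e-387a-424a-a8a3-e4948ec7470a;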

The node has already been removed (decommissioned) and added back. Another node was added this past weekend (the cluster previously had only 8 nodes), but nothing changed for the overloaded node, and when the new node acts as coordinator it also prefers the overloaded node.

The problem only affects read requests; mutations are not impacted at all.

What could explain this?
Does the coordinator use any other information to choose the node it will query?
What can we do to fix this?

Thanks for any advice.

Edit:

Several tables trigger the issue, but for information here is the description of the table we used for the tests:

CREATE TABLE main.app (
  id uuid PRIMARY KEY,
  acc uuid,
  apdex_f int,
  apdex_t int,
  cnx uuid,
  iid bigint,
  name text,
  prod boolean,
  sla_active boolean,
  st ascii,
  techno text,
  tz ascii
) WITH bloom_filter_fp_chance = 0.01
  AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
  AND comment = ''
  AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
  AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
  AND dclocal_read_repair_chance = 0.1
  AND default_time_to_live = 0
  AND gc_grace_seconds = 864000
  AND max_index_interval = 2048
  AND memtable_flush_period_in_ms = 0
  AND min_index_interval = 128
  AND read_repair_chance = 0.0
  AND speculative_retry = '99.0PERCENTILE';

Here is the query that was tested:

SELECT name from app where id = 630bbc5e-387a-424a-a8a3-e4948ec7470a;

And here is the trace (I ran it about 10 times and got roughly the same result each time):

 activity                                                                                            | timestamp                  | source     | source_elapsed
-----------------------------------------------------------------------------------------------------+----------------------------+------------+----------------
                                                                                  Execute CQL3 query | 2015-11-02 19:03:19.866000 | 172.16.0.1 |              0
 Parsing SELECT name from app where id = 630bbc5e-387a-424a-a8a3-e4948ec7470a; [SharedPool-Worker-7] | 2015-11-02 19:03:19.867000 | 172.16.0.1 |             36
                                                           Preparing statement [SharedPool-Worker-7] | 2015-11-02 19:03:19.867000 | 172.16.0.1 |            135
                                                 reading data from /172.16.0.8 [SharedPool-Worker-7] | 2015-11-02 19:03:19.868000 | 172.16.0.1 |            354
                         Sending READ message to /172.16.0.8 [MessagingService-Outgoing-/172.16.0.8] | 2015-11-02 19:03:19.868000 | 172.16.0.1 |            394
                      READ message received from /172.16.0.1 [MessagingService-Incoming-/172.16.0.1] | 2015-11-02 19:03:19.935000 | 172.16.0.8 |             43
                                      Executing single-partition query on app [SharedPool-Worker-18] | 2015-11-02 19:03:19.935000 | 172.16.0.8 |            479
                                                 Acquiring sstable references [SharedPool-Worker-18] | 2015-11-02 19:03:19.935000 | 172.16.0.8 |            500
                                                  Merging memtable tombstones [SharedPool-Worker-18] | 2015-11-02 19:03:19.935000 | 172.16.0.8 |            518
                                                  Key cache hit for sstable 5 [SharedPool-Worker-18] | 2015-11-02 19:03:19.935000 | 172.16.0.8 |            544
                                  Seeking to partition beginning in data file [SharedPool-Worker-18] | 2015-11-02 19:03:19.935000 | 172.16.0.8 |            560
    Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-18] | 2015-11-02 19:03:19.936000 | 172.16.0.8 |            747
                                   Merging data from memtables and 1 sstables [SharedPool-Worker-18] | 2015-11-02 19:03:19.936000 | 172.16.0.8 |            772
                                            Read 1 live and 0 tombstone cells [SharedPool-Worker-18] | 2015-11-02 19:03:19.936000 | 172.16.0.8 |            808
                                            Enqueuing response to /172.16.0.1 [SharedPool-Worker-18] | 2015-11-02 19:03:19.936000 | 172.16.0.8 |            873
             Sending REQUEST_RESPONSE message to /172.16.0.1 [MessagingService-Outgoing-/172.16.0.1] | 2015-11-02 19:03:19.936000 | 172.16.0.8 |           1119
          REQUEST_RESPONSE message received from /172.16.0.8 [MessagingService-Incoming-/172.16.0.8] | 2015-11-02 19:03:19.937000 | 172.16.0.1 |          70396
                                         Processing response from /172.16.0.8 [SharedPool-Worker-11] | 2015-11-02 19:03:19.937000 | 172.16.0.1 |          70431
                                                                                    Request complete | 2015-11-02 19:03:19.936484 | 172.16.0.1 |          70484

This piece of data does exist on the node I am connected to (172.16.0.1).

Another strange thing I just noticed: the new node (added this past weekend) already owns more data than the overloaded node (about 3.5 times more), even though it joined the cluster almost 10 days later.

0 Answers:

There are no answers yet.