Question

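I have been stuck on this problem for almost a week and would appreciate your advice and help. I am seeing read latency on a simple table. I created a simple table with 4k rows. Reading 500 rows takes 5 ms, but reading 1000 rows takes ~10 ms, and reading all 4k rows takes about 50 ms. I have checked the statistics, the network, iostat, tpstats, and the heap, but I cannot figure out what the problem is. Can anyone help me with this high-priority issue that has been assigned to me? Thanks a lot in advance.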
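To make the scaling explicit, here is a small sketch that turns the three latencies reported above into a cost per 100 rows. The figures are the approximate ones from the question; the function name is only illustrative.

```python
# Latencies reported in the question: rows fetched -> approximate milliseconds.
reported = {500: 5.0, 1000: 10.0, 4000: 50.0}


def ms_per_100_rows(rows, millis):
    """Average cost of reading 100 rows at this result size."""
    return millis * 100 / rows


costs = {rows: ms_per_100_rows(rows, ms) for rows, ms in reported.items()}

for rows, cost in costs.items():
    print(f"{rows:>5} rows: {cost:.2f} ms per 100 rows")
```

With these numbers the cost stays close to 1 ms per 100 rows, so the latency grows roughly linearly with the number of rows returned rather than jumping at some threshold.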

Tracing session: b4287090-0ea5-11e5-a9f9-bbcaf44e5ebc

 activity                                                                                                                    | timestamp                  | source        | source_elapsed
-----------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------
                                                                                                          Execute CQL3 query | 2015-06-09 07:47:35.961000 | 10.65.133.202 |              0
                                                 Parsing select * from location_eligibility_by_type12; [SharedPool-Worker-1] | 2015-06-09 07:47:35.961000 | 10.65.133.202 |             33
                                                                                   Preparing statement [SharedPool-Worker-1] | 2015-06-09 07:47:35.962000 | 10.65.133.202 |             62
                                                                             Computing ranges to query [SharedPool-Worker-1] | 2015-06-09 07:47:35.962000 | 10.65.133.202 |            101
 Submitting range requests on 1537 ranges with a concurrency of 1537 (1235.85 rows per range expected) [SharedPool-Worker-1] | 2015-06-09 07:47:35.962000 | 10.65.133.202 |            314
                                            Submitted 1 concurrent range requests covering 1537 ranges [SharedPool-Worker-1] | 2015-06-09 07:47:35.968000 | 10.65.133.202 |           6960
       Executing seq scan across 1 sstables for [min(-9223372036854775808), min(-9223372036854775808)] [SharedPool-Worker-2] | 2015-06-09 07:47:35.968000 | 10.65.133.202 |           7033
                                                                 Read 4007 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-06-09 07:47:36.045000 | 10.65.133.202 |          84055
                                                                          Scanned 1 rows and matched 1 [SharedPool-Worker-2] | 2015-06-09 07:47:36.046000 | 10.65.133.202 |          84109
                                                                                                            Request complete | 2015-06-09 07:47:36.052498 | 10.65.133.202 |          91498
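
As a rough way to read the trace, the sketch below computes the time between consecutive events from the source_elapsed column (microseconds). The activity names are shortened and the values are copied by hand from the trace above; nothing here talks to Cassandra.

```python
# (activity, source_elapsed in microseconds), copied from the trace above.
trace = [
    ("Execute CQL3 query", 0),
    ("Parsing select", 33),
    ("Preparing statement", 62),
    ("Computing ranges to query", 101),
    ("Submitting range requests", 314),
    ("Submitted 1 concurrent range requests", 6960),
    ("Executing seq scan across 1 sstables", 7033),
    ("Read 4007 live and 0 tombstoned cells", 84055),
    ("Scanned 1 rows and matched 1", 84109),
    ("Request complete", 91498),
]


def step_durations(events):
    """Microseconds between each event and the one logged just before it."""
    return [
        (name, elapsed - prev_elapsed)
        for (_, prev_elapsed), (name, elapsed) in zip(events, events[1:])
    ]


durations = step_durations(trace)
slowest = max(durations, key=lambda step: step[1])
total = trace[-1][1]

for name, micros in durations:
    print(f"{micros:>6} us  {micros / total:6.1%}  {name}")
```

The largest gap by far is the one ending at "Read 4007 live and 0 tombstoned cells": the time between starting the sequential scan and finishing the read accounts for most of the request.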

Answer 1

Selecting many rows in Cassandra often takes unpredictably long, because the query gets routed to more machines.

It's best to avoid such schemas if you need high read performance. A better approach is to store the data in a single row and spread the load across nodes with a higher replication factor. Wide rows are generally preferable: http://www.slideshare.net/planetcassandra/cassandra-summit-2014-39677149
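
The routing argument can be illustrated with a toy model. This is only a sketch under loud assumptions: the cluster size, the key names, and the use of md5 modulo the node count are all made up for illustration (Cassandra's default partitioner uses Murmur3 tokens and token ranges, and replication is ignored here).

```python
import hashlib

NODES = 6  # hypothetical cluster size, not taken from the question


def owning_node(partition_key, nodes=NODES):
    """Map a partition key to a node index (md5 as a stand-in for real tokens)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nodes


def nodes_touched(partition_keys):
    """Set of node indexes a read over these partition keys would involve."""
    return {owning_node(key) for key in partition_keys}


# 4000 rows, each stored under its own (hypothetical) partition key.
many_partitions = [f"location-{i}" for i in range(4000)]

# 4000 rows stored as one wide row under a single (hypothetical) partition key.
one_wide_row = ["type12"] * 4000

print("separate partitions:", len(nodes_touched(many_partitions)), "nodes")
print("single wide row:    ", len(nodes_touched(one_wide_row)), "node")
```

In this model, reading the wide row involves a single node, while reading the same number of rows spread over many partition keys fans out across the cluster.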

Read time increases by 1 ms for every 100 rows
