I am getting Cassandra timeouts when using Phantom-DSL with the Datastax Cassandra driver, even though Cassandra does not appear to be overloaded. Here is the exception I get:
com.datastax.driver.core.exceptions.OperationTimedOutException: [node-0.cassandra.dev/10.0.1.137:9042] Timed out waiting for server response
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onTimeout(RequestHandler.java:766)
    at com.datastax.driver.core.Connection$ResponseHandler$1.run(Connection.java:1267)
    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:588)
    at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:662)
    at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:385)
    at java.lang.Thread.run(Thread.java:745)
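Here are the stats I get from the Cassandra Datadog connector for this time period:
You can see our read rate (per second) in the top-center graph. Our CPU and memory usage are very low.
Here is how we configure the Datastax driver: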
val points = ContactPoints(config.cassandraHosts)
  .withClusterBuilder(_.withSocketOptions(
    new SocketOptions()
      .setReadTimeoutMillis(config.cassandraNodeTimeout)
  ))
  .withClusterBuilder(_.withPoolingOptions(
    new PoolingOptions()
      .setConnectionsPerHost(HostDistance.LOCAL, 2, 2)
      .setConnectionsPerHost(HostDistance.REMOTE, 2, 2)
      .setMaxRequestsPerConnection(HostDistance.LOCAL, 2048)
      .setMaxRequestsPerConnection(HostDistance.REMOTE, 2048)
      .setPoolTimeoutMillis(10000)
      .setNewConnectionThreshold(HostDistance.LOCAL, 1500)
      .setNewConnectionThreshold(HostDistance.REMOTE, 1500)
  ))
Our nodetool cfstats output looks like this:
$ nodetool cfstats alexandria_dev.match_sums
Keyspace : alexandria_dev
Read Count: 101892
Read Latency: 0.007479115141522397 ms.
Write Count: 18721
Write Latency: 0.012341060840767052 ms.
Pending Flushes: 0
Table: match_sums
SSTable count: 0
Space used (live): 0
Space used (total): 0
Space used by snapshots (total): 0
Off heap memory used (total): 0
SSTable Compression Ratio: 0.0
Number of keys (estimate): 15328
Memtable cell count: 15332
Memtable data size: 21477107
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 17959
Local read latency: 0.015 ms
Local write count: 15332
Local write latency: 0.013 ms
Pending flushes: 0
Percent repaired: 100.0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 0
Bloom filter off heap memory used: 0
Index summary off heap memory used: 0
Compression metadata off heap memory used: 0
Compacted partition minimum bytes: 0
Compacted partition maximum bytes: 0
Compacted partition mean bytes: 0
Average live cells per slice (last five minutes): 1.0
Maximum live cells per slice (last five minutes): 1
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
Dropped Mutations: 0
When we run cassandra-stress, we do not hit any problems: we get a steady 50k reads per second, as expected.
Cassandra logs this error whenever I run my queries:
INFO [Native-Transport-Requests-2] 2017-03-10 23:59:38,003 Message.java:611 - Unexpected exception during request; channel = [id: 0x65d7a0cd, L:/10.0.1.98:9042 ! R:/10.0.1.126:35536]
io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
    at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.39.Final.jar:4.0.39.Final]
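Why are we getting timeouts?
EDIT: I uploaded the wrong dashboard. Please see the new image.
Answer 0: (score: 0)
Two useful questions:
Now, to clarify where I think you are going wrong:
Answer 1: (score: 0)
I suggest tracing the problematic query to see what Cassandra is doing.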
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/tracing_r.html
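Open the cql shell, type TRACING ON, and execute the query. If everything looks fine there, the problem probably occurs only occasionally. In that case I suggest tracing queries with nodetool settraceprobability for a while, until you manage to catch a failing one.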
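Enable it on each node separately with nodetool settraceprobability <param>, where param is the probability (between 0 and 1) that a query will be traced. Be careful: tracing increases load, so start with a very low number and work your way up.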
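If you would rather trace from the application than from cqlsh, the Datastax Java driver can request a trace for a single statement. The following is only a minimal sketch, assuming driver 3.x; the contact point, keyspace, and table are taken from the logs and cfstats output in the question, and the query itself is a made-up placeholder.

```scala
import com.datastax.driver.core.{Cluster, SimpleStatement}
import scala.collection.JavaConverters._

object TraceOneQuery extends App {
  // Assumption: contact point, keyspace, and table come from the question;
  // adjust them for your cluster.
  val cluster = Cluster.builder().addContactPoint("node-0.cassandra.dev").build()
  try {
    val session = cluster.connect("alexandria_dev")
    // enableTracing() asks the coordinator to record a trace for this statement only.
    // The query below is a placeholder, not the asker's real query.
    val stmt = new SimpleStatement("SELECT * FROM match_sums LIMIT 1").enableTracing()
    val trace = session.execute(stmt).getExecutionInfo.getQueryTrace
    println(s"coordinator=${trace.getCoordinator} total=${trace.getDurationMicros}us")
    // Each event is one step on one node, with the time elapsed on that node.
    trace.getEvents.asScala.foreach { e =>
      println(f"${e.getSourceElapsedMicros}%8d us  ${e.getSource}  ${e.getDescription}")
    }
  } finally cluster.close()
}
```

Tracing a single statement this way adds load only for that statement, so it is a cheaper first step than raising the trace probability on every node.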
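If the problem occurs only occasionally, it may be caused by long garbage collection pauses, in which case you need to analyze the GC logs. Check how long your GC pauses are.
EDIT: Just to be clear, if this problem is caused by GC, you will not see it in the traces. So check your GC first, and if that is not the problem, move on to tracing.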
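One quick way to check GC before reaching for tracing is to scan Cassandra's system.log for the pause lines written by GCInspector. This is a rough sketch: it assumes lines of the form "GCInspector.java:284 - ParNew GC in 215ms. ...", and both the default log path and the 200 ms threshold are illustrative choices, not values from the question.

```scala
import scala.io.Source

object LongGcPauses {
  // Assumption: lines look like "... GCInspector.java:284 - ParNew GC in 215ms. ..."
  private val GcLine = """GCInspector.*? - (.+?) GC in (\d+)ms""".r.unanchored

  /** Returns (collector name, pause in ms) for every pause at or above thresholdMs. */
  def longPauses(lines: Iterator[String], thresholdMs: Long): List[(String, Long)] =
    lines.collect {
      case GcLine(collector, ms) if ms.toLong >= thresholdMs => (collector, ms.toLong)
    }.toList

  def main(args: Array[String]): Unit = {
    // Hypothetical default path; pass your own system.log as the first argument.
    val path = args.headOption.getOrElse("/var/log/cassandra/system.log")
    val src = Source.fromFile(path)
    try longPauses(src.getLines(), thresholdMs = 200).foreach {
      case (collector, ms) => println(s"$collector paused for $ms ms")
    } finally src.close()
  }
}
```

If the reported pauses line up with the timestamps of the OperationTimedOutException entries on the client, GC is a likely suspect; if nothing shows up, move on to tracing.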