Question

I need to read all rows in a table (more than one million). I have already read about paging (http://www.datastax.com/dev/blog/datastax-python-driver-2-0-released), but it did not help. The code is simple:

...
retry = RetryPolicy()
retry.RETRY = 10
cluster = Cluster(
    [ ... ],
    reconnection_policy=ConstantReconnectionPolicy(5.0, 100),
    auth_provider=auth_provider,
    load_balancing_policy=RoundRobinPolicy(),
    default_retry_policy=retry,
    port=9042)
session = cluster.connect("test")
session.default_timeout = 9999
session.default_fetch_size = 1000

...
...

uname_stmt = SimpleStatement(q, fetch_size=100)
uname_stmt.consistency_level = ConsistencyLevel.ONE

for row in session.execute(uname_stmt):
  ...

After roughly 5 minutes (sometimes 1 minute, sometimes 10), the final for loop raises this error:

Traceback (most recent call last):
  File "test.py", line 67, in <module>
    for row in session.execute(uname_stmt):
  File "/usr/lib/python2.6/site-packages/cassandra/cluster.py", line 2939, in next
    result = self.response_future.result(self.timeout)
  File "/usr/lib/python2.6/site-packages/cassandra/cluster.py", line 2771, in result
    raise self._final_exception
cassandra.ReadTimeout: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'data_retrieved': False, 'required_responses': 1, 'consistency': 1}

Any help would be great! Thanks!

Answer 1

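This may be happening because Cassandra is in the middle of reorganizing its SSTables.

In that case each read has to touch many SSTables, which would explain why it times out.

Cassandra uses compaction to manage the accumulation of SSTables on disk.

Running the compact command may help: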

nodetool compact
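
If timeouts still occur occasionally after compaction, a complementary client-side approach is to retry only the page that failed rather than restarting the whole scan. The following is a minimal sketch, not a definitive implementation. The helper names `scan_with_retry` and `make_cassandra_fetcher` are hypothetical, and the adapter assumes a driver release in which `session.execute()` accepts a `paging_state` argument and the result exposes `current_rows` and `paging_state` (newer than the driver shown in the question).

```python
import time


def scan_with_retry(fetch_page, retry_on, max_retries=5, delay=0.0):
    """Yield rows page by page, retrying a failed page fetch in place.

    fetch_page(state) must return (rows, next_state), where next_state
    is None on the last page. Both helpers here are hypothetical names.
    """
    state = None
    while True:
        attempts = 0
        while True:
            try:
                rows, next_state = fetch_page(state)
                break
            except retry_on:
                attempts += 1
                if attempts > max_retries:
                    raise  # give up after max_retries retries of one page
                time.sleep(delay)
        for row in rows:
            yield row
        if next_state is None:
            return
        state = next_state


def make_cassandra_fetcher(session, statement):
    """Adapt a driver session to the fetch_page interface.

    Assumption: the installed driver supports the paging_state keyword
    on execute() and exposes current_rows / paging_state on the result.
    """
    def fetch_page(state):
        result = session.execute(statement, paging_state=state)
        return list(result.current_rows), result.paging_state
    return fetch_page
```

With the question's objects this might be used as `for row in scan_with_retry(make_cassandra_fetcher(session, uname_stmt), cassandra.ReadTimeout, delay=1.0): ...`. Keeping the retry outside the driver means a single slow page costs one extra request instead of the entire scan.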

Python 2.6 Cassandra 2.0.1 ReadTimeout
