How to handle the AllServersUnavailable exception

Asked: 2013-04-09 15:47:46

Tags: python cassandra pycassa

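I want to run a simple write test against a Cassandra instance (v1.1.10) on a single node. I just want to see how it handles sustained writes and whether it can keep up with the write rate.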

import random
import string
import sys
import uuid

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('testdb')
test_cf = ColumnFamily(pool, 'test')
test2_cf = ColumnFamily(pool, 'test2')
test3_cf = ColumnFamily(pool, 'test3')
# Each batch queues up to 1000 mutations before sending them.
test_batch = test_cf.batch(queue_size=1000)
test2_batch = test2_cf.batch(queue_size=1000)
test3_batch = test3_cf.batch(queue_size=1000)

chars = string.ascii_uppercase
counter = 0
while True:
    counter += 1
    uid = uuid.uuid1()
    junk = ''.join(random.choice(chars) for x in range(50))
    test_batch.insert(uid, {'junk': junk})
    test2_batch.insert(uid, {'junk': junk})
    test3_batch.insert(uid, {'junk': junk})
    sys.stdout.write(str(counter) + '\n')

# Never reached: the loop above runs until the script is interrupted or crashes.
pool.dispose()

The code keeps crashing after a long run of writes (when the counter is around 10M+), with the following message:

pycassa.pool.AllServersUnavailable: An attempt was made to connect to each of the servers twice, but none of the attempts succeeded. The last failure was timeout: timed out

Setting queue_size=100 did not help. Also, after the script crashed I started a cqlsh -3 console to truncate the tables and got the following error:

Unable to complete request: one or more nodes were unavailable.

Tailing /var/log/cassandra/system.log shows no sign of an error, only INFO entries about Compaction, FlushWriter and so on. What am I doing wrong?

1 answer:

Answer 0: (score: 0)

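I ran into this problem too. As @tyler-hobbs suggested in the comments, the node is probably overloaded (it was in my case). A simple fix I used is to back off and let the node catch up. I have rewritten the loop above to catch the error, sleep for a while, and then try again. I ran this against a single-node cluster and it copes: it pauses (for a minute) and backs off every so often (never more than 5 times in a row). No data is skipped with this script unless the error is raised five times in a row (in which case you probably want to fail hard rather than return to the loop).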

import time

import pycassa.pool

# Continues from the question's script (chars, counter, and the three batches).
while True:
  counter += 1
  uid = uuid.uuid1()
  junk = ''.join(random.choice(chars) for x in range(50))
  tryCount = 5 # 5 is probably unnecessarily high
  while tryCount > 0:
    try:
      test_batch.insert(uid, {'junk':junk})
      test2_batch.insert(uid, {'junk':junk})
      test3_batch.insert(uid, {'junk':junk})
      tryCount = -1 # success: leave the retry loop
    except pycassa.pool.AllServersUnavailable as e:
      # tryCount counts down, so it is reported as the number of attempts left.
      print("Trying to insert [" + str(uid) + "] but got error " + str(e) + " (attempts left: " + str(tryCount) + "). Backing off for a minute to let Cassandra settle down")
      time.sleep(60) # A delay of 60s is probably unnecessarily high
      tryCount = tryCount - 1
  # After five consecutive failures this row is skipped without an error.
  sys.stdout.write(str(counter)+'\n')

I have added a complete gist here.
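
The retry loop in this answer skips the row after five consecutive failures and always waits a fixed 60 seconds. As a rough sketch of the "fail hard" variant mentioned above, the helper below re-raises the last error once the attempts are used up and grows the wait after each failure. The name retry_with_backoff and its parameters (attempts, delay, factor, sleep) are hypothetical and not part of pycassa, and the growing delay is a swapped-in technique rather than what the answer's code does.

```python
import time


def retry_with_backoff(operation, retry_on, attempts=5, delay=1.0,
                       factor=2.0, sleep=time.sleep):
    """Call operation(); on a retry_on exception, wait and try again.

    The wait starts at `delay` seconds and is multiplied by `factor` after
    each failure. Once `attempts` calls have failed, the last exception is
    re-raised so the caller never silently skips a row.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: fail hard
            sleep(delay)
            delay *= factor
```

In the loop above, each insert could then be wrapped as retry_with_backoff(lambda: test_batch.insert(uid, {'junk': junk}), pycassa.pool.AllServersUnavailable), assuming the exception surfaces from the insert call that triggers the batch send. Sensible values for attempts and delay depend on how long the node needs to catch up, so the defaults here are placeholders.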