Python: Asynchronous Cassandra inserts

Time: 2014-03-13 13:50:04

Tags: python cassandra datastax

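The problem with the Cassandra Python driver is that the "future" objects it returns add callbacks by side effect. This means the "future" is not composable in the sense that a Future in Javascript or Scala is composable. I was wondering whether there is a pattern for turning a non-composable future into a composable one (preferably without leaking memory).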

   (my_query_object
      .insert(1, 2, 3, 'Fred Flinstone')
      .insert(1, 2, 3, 'Barney Rubble')
      .insert(5000, 2, 3, 'George Jetson')
      .insert(5000, 2, 3, 'Jane his wife'))

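Looking at the performance section of the Datastax Cassandra Python driver documentation, I see an example of how they create a series of chained insert queries, i.e. a slightly more complex version of this pattern: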

# Assumes `session`, `query`, and `log` are already defined.
sentinel = object()  # marks the first call, which has no previous result

def insert_next(previous_result=sentinel):
    if previous_result is not sentinel:
        if isinstance(previous_result, BaseException):
            log.error("Error on insert: %r", previous_result)

    future = session.execute_async(query)
    # NOTE: this callback also handles errors
    future.add_callbacks(insert_next, insert_next)

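This works great as a toy example. As soon as one query completes, another equivalent query is executed. This scheme lets them reach about 7k writes/sec, whereas the version that does not try to "chain" callbacks manages roughly 2k writes/sec.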
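For reference, a terminating variant of that pattern might look like the sketch below. It starts a fixed number of chains and stops once every statement has been issued and completed. The only driver calls it relies on are `session.execute_async` and `future.add_callbacks`; `run_chained`, `FakeSession`, and `FakeFuture` are hypothetical names introduced here so the sketch runs without a cluster, and Python 3 is assumed.

```python
import threading


def run_chained(session, statements, concurrency=4):
    """Run every statement, keeping at most `concurrency` requests in flight.

    Each completed request starts the next one from its own callback.
    Returns the list of errors that were reported to the errback.
    """
    pending = iter(statements)
    exhausted = object()          # marks the end of `pending`
    lock = threading.Lock()       # guards `pending` and `state`
    finished = threading.Event()
    state = {"in_flight": 0}
    errors = []

    def launch():
        with lock:
            statement = next(pending, exhausted)
            if statement is exhausted:
                if state["in_flight"] == 0:
                    finished.set()
                return
            state["in_flight"] += 1
        # Issue the request outside the lock: the callback may fire immediately.
        future = session.execute_async(statement)
        future.add_callbacks(on_success, on_error)

    def on_success(_rows):
        with lock:
            state["in_flight"] -= 1
        launch()

    def on_error(exc):
        errors.append(exc)
        on_success(None)

    # At least one chain is needed, otherwise `finished` would never be set.
    for _ in range(max(1, concurrency)):
        launch()
    finished.wait()
    return errors


class FakeFuture(object):
    """Hypothetical stand-in that completes immediately (not the driver's class)."""

    def __init__(self, error=None):
        self._error = error

    def add_callbacks(self, callback, errback):
        if self._error is None:
            callback([])
        else:
            errback(self._error)


class FakeSession(object):
    """Hypothetical stand-in for a driver session; records what was executed."""

    def __init__(self):
        self.executed = []

    def execute_async(self, statement):
        self.executed.append(statement)
        error = ValueError(statement) if statement == "bad" else None
        return FakeFuture(error)
```

With a real session, `statements` would be bound statements, and `concurrency` would need tuning against the cluster; the value 4 is only a placeholder.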

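I have been trying to wrap my head around building some mechanism that would let me recapture exactly this behavior, but to no avail. Has anyone come up with something similar?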

1 Answer:

Answer 0: (score: 1)

I gave some thought to how to keep the future around in some form:

import logging
from threading import Event

try:
  from queue import Queue  # Python 3
except ImportError:
  from Queue import Queue  # Python 2


insert_logger = logging.getLogger('async_insert')
insert_logger.setLevel(logging.INFO)

def handle_err(err):
  insert_logger.warning('Failed to insert due to %s', err)


# Designed to work in a high-write environment. Callbacks are chained for best
# performance, and the chain stops fast when an error is encountered. The next
# insert should restart the writing. A failed write may be lost. Write order is
# preserved to some degree.
class CappedQueueInserter(object):
  def __init__(self, session, max_count=0):
    self.__queue = Queue(max_count)
    self.__session = session
    self.__started = Event()

  @property
  def started(self):
    return self.__started.is_set()

  def insert(self, bound_statement):
    if not self.started:
      self._begin(bound_statement)
    else:
      self._enqueue(bound_statement)

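  def _begin(self, bound_statement):
    def callback(_result=None):
      # The driver passes the query result to the callback; it is ignored here.
      try:
        # Blocks the calling thread until an item is added to the queue.
        bound = self.__queue.get(True)
        future = self.__session.execute_async(bound)
        future.add_callbacks(callback, errback)
      except Exception:
        self.__started.clear()

    def errback(err):
      # Log the failure and stop the chain so the next insert() restarts it.
      handle_err(err)
      self.__started.clear()

    self.__started.set()
    future = self.__session.execute_async(bound_statement)
    future.add_callbacks(callback, errback)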

  def _enqueue(self, bound_statement):
    self.__queue.put(bound_statement, True)


#Separate insert statement binding from the insertion loop
class InsertEnqueue(object):
  def __init__(self, prepared_query, insert, consistency_level=None):
    self.__statement = prepared_query
    self.__level = consistency_level
    self.__sink = insert

  def insert(self, *args):
    bound = self.bind(*args)
    self.__sink.insert(bound)

  @property
  def consistency_level(self):
    return self.__level or self.__statement.consistency_level

  @consistency_level.setter
  def consistency_level(self, value):
    # The setter must share the property's name; falsy values are ignored.
    if value:
      self.__level = value

  def bind(self, *args):
    bound = self.__statement.bind(*args)
    bound.consistency_level = self.consistency_level

    return bound

A combination of Queue and Event to trigger things. Assuming the writes can happen "eventually", this should work.
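On the composability part of the question, one possible approach is to adapt the callback-style object to a standard `concurrent.futures.Future`, which can then be chained. The sketch below is an illustration under assumptions, not the driver's own API: `to_composable`, `then`, and `FakeResponseFuture` are hypothetical names, the adapter relies only on the driver future exposing `add_callbacks(callback, errback)`, and Python 3 is assumed.

```python
from concurrent.futures import Future


def to_composable(callback_future):
    """Adapt an object exposing add_callbacks(callback, errback) to a Future."""
    wrapped = Future()
    # Success fills in the result; failure fills in the exception.
    callback_future.add_callbacks(wrapped.set_result, wrapped.set_exception)
    return wrapped


def then(future, fn):
    """Return a new Future holding fn(result) once `future` succeeds.

    An error in `future`, or raised by `fn`, is propagated to the new Future.
    """
    chained = Future()

    def _on_done(done):
        try:
            chained.set_result(fn(done.result()))
        except Exception as exc:
            chained.set_exception(exc)

    future.add_done_callback(_on_done)
    return chained


class FakeResponseFuture(object):
    """Hypothetical stand-in that completes immediately (not the driver's class)."""

    def __init__(self, result=None, error=None):
        self._result = result
        self._error = error

    def add_callbacks(self, callback, errback):
        if self._error is not None:
            errback(self._error)
        else:
            callback(self._result)
```

With a real session, usage would look something like `then(to_composable(session.execute_async(query)), handle_rows)`, where `handle_rows` is whatever function should run on the result. Each wrapper holds a reference only to the future it produces, so nothing accumulates once the chain has completed and the caller drops its references.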