Inserting into Cassandra DB using Python multiprocessing

Time: 2018-04-10 14:30:39

Tags: python cassandra cassandra-3.0

I am new to Python and Cassandra. I am trying to use Python multiprocessing with Cassandra, based on code I found at https://github.com/aholmberg/driver-multiprocessing/blob/py3/multiprocess_execute.py. How can I fix the error below? Please tell me if there are any changes I need to make. Here is my code:

from multiprocessing import Pool
import sys
import time
from cassandra.cluster import Cluster
from cassandra.query import tuple_factory

def query_gen(n):
    for _ in range(n):
        yield ('local', )


class QueryManager(object):

    batch_size = 10

    def __init__(self, cluster, process_count=None):
        self.pool = Pool(processes=process_count, initializer=self._setup, initargs=(cluster,))

    @classmethod
    def _setup(cls, cluster):
        cls.session = cluster.connect()
        cls.session = cluster.connect('new')


        cls.session.row_factory = tuple_factory
        cls.prepared = cls.session.prepare('SELECT * FROM new.mytbl')

    def close_pool( self ):
        self.pool.close()
        self.pool.join()

    def get_results(self, params):
        results = self.pool.map(_get_multiproc, params, self.batch_size)
        return results

    @classmethod
    def _execute_request(cls, params):
        return cls.session.execute(cls.prepared, params)

def _get_multiproc(params):
    return QueryManager._execute_request(params)


if __name__ == '__main__':
    try:
        iterations = 1
        processes = 2
    except (IndexError, ValueError):
        print("Usage: %s <num iterations> [<num processes>]" % 1)
        sys.exit(1)

    cluster = Cluster()
    cluster = Cluster(['127.0.0.1'])
    qm = QueryManager(cluster, processes)

    start = time.time()
    rows = qm.get_results(query_gen(iterations))
    delta = time.time() - start
#print("%d queries in %s seconds (%s/s)" % (iterations, delta, iterations / delta))

Here is the error log:

File "multi.py", line 64, in <module>   rows = m.get_results(query_gen(iterations))

File "multi.py", line 40, in get_results   results = self.pool.map(_get_multiproc, params, self.batch_size)

File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map   return self.map_async(func, iterable, chunksize).get()

File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get   raise self._value

ValueError: Too many arguments provided to bind() (got 1, expected 0)

1 Answer:

Answer 0: (score: 0)

I'm not sure what you are trying to accomplish, but after looking at your code, I think the problem is here:

    @classmethod
    def _execute_request(cls, params):
        return cls.session.execute(cls.prepared, params)

session.execute(prepared_query)

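Your query is a plain SELECT statement with no bind markers, yet you pass params to execute(). The prepared statement expects 0 values and receives 1, so bind() raises the "too many arguments" error (got 1, expected 0).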
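To make the rule concrete, here is a minimal sketch of the kind of check that fails. It is not the driver's actual code: `count_placeholders` and `check_bind` are hypothetical helpers, the `key` column is made up, and counting `?` characters is a simplification that ignores quoted strings.

```python
def count_placeholders(cql):
    """Count positional '?' bind markers (naive: ignores quoted strings)."""
    return cql.count("?")


def check_bind(cql, params):
    """Raise ValueError if more values are supplied than the query has markers."""
    expected = count_placeholders(cql)
    if len(params) > expected:
        raise ValueError(
            "Too many arguments provided to bind() (got %d, expected %d)"
            % (len(params), expected)
        )
    return True


# No '?' in the query, so the only acceptable parameter tuple is empty.
check_bind("SELECT * FROM new.mytbl", ())

# One '?' in the query (hypothetical 'key' column), so one value is fine.
check_bind("SELECT * FROM new.mytbl WHERE key=?", ("local",))
```

Calling `check_bind("SELECT * FROM new.mytbl", ("local",))` raises the same kind of ValueError as in your log, because `query_gen` yields one value per query while the statement has no marker to bind it to.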

Try changing it to

return cls.session.execute(cls.prepared)

See if that works! Read more: here
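Since the title mentions inserting, the opposite direction may also help: if you do want to pass values, the statement needs one `?` per value. A rough sketch, assuming a hypothetical schema where `new.mytbl` has columns `id` and `name` (adjust to your real table), and assuming `session` is an object like the one returned by `cluster.connect()`:

```python
def insert_rows(session, rows):
    """Prepare one INSERT and execute it once per tuple of values."""
    # Hypothetical columns: replace (id, name) with the real schema of new.mytbl.
    prepared = session.prepare("INSERT INTO new.mytbl (id, name) VALUES (?, ?)")
    count = 0
    for row in rows:
        # Each row must supply exactly one value per '?' marker.
        session.execute(prepared, row)
        count += 1
    return count
```

In the multiprocessing version, each worker would run something like this with the session it created in `_setup`, and `query_gen` would yield two-element tuples such as `(1, 'a')` instead of `('local', )`.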