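I am trying to use multiprocessing with my Cassandra connections.
I have already used multiprocessing queues successfully in a simpler case (passing a batch of numbers to child processes and getting the results back).
In my current DataGetter I import a Cassandra worker class. Is there any problem with Python multiprocessing using previously imported objects?
Here is the relevant code from my DataGetter: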
def read_data_multi(self, cass_worker, work_queue, done_queue):
    #cass_worker = cbcassandra.CBcassandra(self.chost, self.keyspace)
    cass_worker.open_cur()
    # Consume requests until the 'STOP' sentinel arrives.
    for inq in iter(work_queue.get, 'STOP'):
        data = self.read_data(cass_worker, inq[0], inq[1], inq[2], inq[3])
        print data
        done_queue.put(data)
    cass_worker.close_cur()
    return True

def multi_get(self, readtype, dcname, vmname, timebucket_list):
    workers = 2
    work_queue = Queue()
    done_queue = Queue()
    processes = []
    # Queue one request per time bucket.
    for tb in timebucket_list:
        inq = (readtype, dcname, vmname, tb)
        print inq
        work_queue.put(inq)
    # Start the workers, adding one 'STOP' sentinel per worker.
    for w in xrange(workers):
        cass_worker = cbcassandra.CBcassandra(self.chost, self.keyspace)
        p = Process(target=self.read_data_multi, args=(cass_worker, work_queue, done_queue))
        p.start()
        processes.append(p)
        work_queue.put('STOP')
    print processes
    for p in processes:
        p.join()
    done_queue.put('STOP')
    return done_queue
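When I don't use multiprocessing, read_data works perfectly.
Here is the output when I use multiprocessing. My processes start, but they cannot establish a connection: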
[<Process(Process-1, started)>, <Process(Process-2, started)>]
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "preparedata.py", line 251, in read_data_multi
    cass_worker.open_cur()
  File "/root/cbcassandra.py", line 40, in open_cur
    cluster, cur = self.getclustsess(self.keyspace)
  File "/root/cbcassandra.py", line 33, in getclustsess
    session = cluster.connect(keyspace)
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 471, in connect
    self.control_connection.connect()
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 1355, in connect
    self._set_new_connection(self._reconnect_internal())
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 1390, in _reconnect_internal
    raise NoHostAvailable("Unable to connect to any servers", errors)
NoHostAvailable: ('Unable to connect to any servers', {'104.130.65.178': OperationTimedOut('errors=Timed out creating connection, last_host=None',)})
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "preparedata.py", line 251, in read_data_multi
    cass_worker.open_cur()
  File "/root/cbcassandra.py", line 40, in open_cur
    cluster, cur = self.getclustsess(self.keyspace)
  File "/root/cbcassandra.py", line 33, in getclustsess
    session = cluster.connect(keyspace)
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 471, in connect
    self.control_connection.connect()
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 1355, in connect
    self._set_new_connection(self._reconnect_internal())
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 1390, in _reconnect_internal
    raise NoHostAvailable("Unable to connect to any servers", errors)
NoHostAvailable: ('Unable to connect to any servers', {'104.130.65.178': OperationTimedOut('errors=Timed out creating connection, last_host=None',)})
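Answer 0 (score: 0)
I just found out today that multiprocessing with Cassandra requires shutting down the Cassandra session/cluster before the new processes are started, similar to django.db connections in multiprocessing.
For example: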
cass_worker.session.cluster.shutdown()
cass_worker.session.shutdown()
cass_worker = cbcassandra.CBcassandra(self.chost, self.keyspace)
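A related variation, offered only as a sketch: rather than creating the worker object in the parent and handing it to each Process, every child can open its own connection after it has started, so no live connection crosses the process boundary. The code below is not the asker's cbcassandra module. StubConnection, read_worker and this multi_get are hypothetical names, and the stub only records which process created it so that the sketch runs without a Cassandra server. A real version would build the worker object (for instance the CBcassandra call above) inside connect.

```python
import multiprocessing
import os


class StubConnection(object):
    """Hypothetical stand-in for a Cassandra worker object."""

    def __init__(self):
        # Remember which process opened the connection.
        self.owner_pid = os.getpid()

    def read(self, item):
        # A real worker would run a query here.
        return (item, self.owner_pid)

    def close(self):
        pass


def read_worker(connect, work_queue, done_queue):
    """Open a connection inside this process and serve queued requests."""
    conn = connect()  # created in the child, never inherited from the parent
    try:
        for item in iter(work_queue.get, 'STOP'):
            done_queue.put(conn.read(item))
    finally:
        conn.close()


def multi_get(connect, items, workers=2):
    """Fan items out to worker processes and collect one result per item."""
    work_queue = multiprocessing.Queue()
    done_queue = multiprocessing.Queue()
    for item in items:
        work_queue.put(item)
    processes = []
    for _ in range(workers):
        p = multiprocessing.Process(target=read_worker,
                                    args=(connect, work_queue, done_queue))
        p.start()
        processes.append(p)
        work_queue.put('STOP')  # one sentinel per worker
    # Collect results before join(): a child that has put items on a queue
    # may not exit until those items have been consumed.
    results = [done_queue.get() for _ in items]
    for p in processes:
        p.join()
    return results
```

Calling multi_get(StubConnection, timebucket_list) from under an if __name__ == '__main__': guard should return one (item, pid) pair per time bucket, with pids that belong to the children. The sketch has no error handling: if connect fails in a worker, the parent would wait on done_queue indefinitely, so a real version would want a timeout or an error marker on the queue.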