The following code works on Windows but hangs on Linux:
from impala.dbapi import connect
from multiprocessing import Pool

conn = connect(host='172.16.12.12', port=10000, user='hive', password='hive', database='test', auth_mechanism='PLAIN')
cur = conn.cursor()

def test_hive(a):
    cur.execute('select {}'.format(a))
    tab_cc = cur.fetchall()
    tab_cc = tab_cc[0][0]
    print(a, tab_cc)

if __name__ == '__main__':
    pool = Pool(processes=8)
    alist = [1, 2, 3]
    for i in range(len(alist)):
        pool.apply_async(test_hive, str(i))
    pool.close()
    pool.join()
When I change alist=[1,2,3] to alist=[1], it works on Linux.
Answer 0 (score: 2)
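I see two possible causes for this behavior:
- an exception raised in test_hive
- a deadlock caused by the fact that fork does not copy threads from the parent, and/or that mutexes are copied in whatever state they are in when the fork call executes

To check for exceptions, add return tab_cc at the end of the test_hive function and collect the results returned by the pool: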
if __name__ == '__main__':
    pool = Pool(processes=8)
    alist = [1, 2, 3]
    results = []
    for i in range(len(alist)):
        # args must be a tuple; a bare string would be unpacked character by character
        results.append(pool.apply_async(test_hive, (str(i),)))
    pool.close()
    pool.join()
    for result in results:
        try:
            # get() re-raises any exception that occurred in the worker
            print(result.get())
        except Exception as e:
            print("{}: {}".format(type(e).__name__, e))
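The reason for collecting results is that apply_async does not report a worker exception on its own; the error only surfaces when get() is called on the returned object. The following is a minimal sketch of that behavior, not the asker's actual code: may_fail is a hypothetical stand-in for test_hive that fails on purpose for one input, and run_and_collect is an illustrative helper name. Note that apply_async unpacks its args argument, so the sketch passes a one-element tuple.

```python
from multiprocessing import Pool


def may_fail(a):
    # Hypothetical stand-in for test_hive: fails on purpose for one input.
    if a == "1":
        raise ValueError("simulated query failure for {}".format(a))
    return int(a)


def run_and_collect(args_list, processes=2):
    """Run may_fail over args_list and return (value, error) pairs."""
    outcomes = []
    with Pool(processes=processes) as pool:
        # Submit everything first; nothing is reported at this point.
        pending = [pool.apply_async(may_fail, (a,)) for a in args_list]
        for res in pending:
            try:
                # get() re-raises the worker's exception in the parent.
                outcomes.append((res.get(timeout=30), None))
            except Exception as e:
                outcomes.append((None, "{}: {}".format(type(e).__name__, e)))
    return outcomes


if __name__ == '__main__':
    for value, error in run_and_collect(["0", "1", "2"]):
        print(value, error)
```

The timeout on get() is an addition for illustration: if a worker is deadlocked rather than failing, get() raises multiprocessing.TimeoutError instead of blocking forever, which helps tell the two suspected causes apart.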
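As for threads, I did a quick search in the impala repository, and it seems they play some role in its use of thrift. I'm not sure whether Python's threading module can actually see them when they originate from that library. You can try print(multiprocessing.current_process(), threading.enumerate()) both at module level (for example, right after cur = conn.cursor()) and at the start of the test_hive function, and check whether _MainProcess(MainProcess, started) shows a longer list of active threads than all the ForkProcess(ForkPoolWorker-<worker#>, started daemon) entries.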
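The check described above can be tried without a Hive server. This is a rough sketch under stated assumptions: the background thread named background-helper is hypothetical and merely stands in for whatever threads a client library might start, and describe_threads and worker are illustrative names. With the fork start method, the children would be expected to list fewer threads than the parent.

```python
import multiprocessing
import threading


def describe_threads():
    """Return the current process name and the names of its live threads."""
    proc = multiprocessing.current_process().name
    names = sorted(t.name for t in threading.enumerate())
    return proc, names


def worker(_):
    # Runs inside a pool worker; reports what that process can see.
    return describe_threads()


if __name__ == '__main__':
    # Hypothetical helper thread, standing in for threads started by a library.
    stop = threading.Event()
    bg = threading.Thread(target=stop.wait, name="background-helper", daemon=True)
    bg.start()

    print("parent:", describe_threads())
    with multiprocessing.Pool(processes=2) as pool:
        for proc, names in pool.map(worker, range(2)):
            print("child:", proc, names)

    stop.set()
```

Threads created outside Python's threading module (for example by a C extension) would not show up in threading.enumerate(), so an identical list in parent and child does not rule this cause out.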
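As for a potential solution: I'm somewhat suspicious of the fact that you create conn and cur at module level; every child ends up using a copy of these two objects.
Try moving these two lines to the start of test_hive, so that each process creates a connection and a cursor of its own: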
conn = connect(host='172.16.12.12', port=10000, user='hive', password='hive', database='test',auth_mechanism='PLAIN')
cur = conn.cursor()
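A variation on this idea, not what the answer itself proposes, is to open one connection per worker process through the initializer argument of Pool, so the connection is created after the fork and reused across tasks. The sketch below assumes sqlite3 as a stand-in for impala.dbapi.connect so that it runs without a server; init_worker, run_query and run_all are illustrative names, and the query uses sqlite3's parameter binding rather than string formatting.

```python
import sqlite3
from multiprocessing import Pool

_conn = None  # set separately inside each worker process


def init_worker():
    # Assumption: sqlite3 stands in for impala.dbapi.connect(...).
    # This runs once in every worker, after the process has been created.
    global _conn
    _conn = sqlite3.connect(":memory:")


def run_query(a):
    # Each task gets a fresh cursor on the worker's own connection.
    cur = _conn.cursor()
    cur.execute("select ?", (a,))
    return cur.fetchall()[0][0]


def run_all(values, processes=2):
    with Pool(processes=processes, initializer=init_worker) as pool:
        # map() returns results in the same order as the inputs.
        return pool.map(run_query, values)


if __name__ == '__main__':
    print(run_all([1, 2, 3]))
```

Compared with connecting inside test_hive on every call, this opens at most one connection per worker, which may matter if connection setup is slow; whether an impala connection is safe to reuse this way is something to verify against the real library.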
Answer 1 (score: 0)
from impala.dbapi import connect
import time, datetime, sys, re
import psycopg2 as pg
today = datetime.date.today()
from multiprocessing import Pool

def test_hive(a):
    conn = connect(host='172.16.12.12', port=10000, user='hive', password='hive', database='test', auth_mechanism='PLAIN')
    cur = conn.cursor()
    #print(a)
    cur.execute('select {}'.format(a))
    tab_cc = cur.fetchall()
    tab_cc = tab_cc[0][0]
    return tab_cc

if __name__ == '__main__':
    pool = Pool(processes=8)
    alist = [1, 2, 4, 4, 4, 4, 5, 3]
    results = []
    for i in range(len(alist)):
        results.append(pool.apply_async(test_hive, str(i)))
    pool.close()
    pool.join()
    for result in results:
        try:
            print(result.get())
        except Exception as e:
            print("{}: {}".format(type(e).__name__, e))
I moved these two lines into test_hive:
conn = connect(host='172.16.12.12', port=10000, user='hive', password='hive', database='test',auth_mechanism='PLAIN')
cur = conn.cursor()