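I'm converting one type of chemical notation into another. My list has more than 6k different names to convert, and it takes a very long time. How can I use multiprocessing? I tried to implement it myself, but I'm a noob. Other code optimizations are welcome too!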
import cirpy
import pandas as pd

def resolve(str_input, representation):
    return cirpy.resolve(str_input, representation)

compound_list = []
smiles_list = []
# df_Verteilung is an existing DataFrame with a 'Compound' column
for index, row in df_Verteilung.iterrows():
    try:
        actual_smiles = resolve(row['Compound'], 'smiles')
    except Exception:
        # Mark names that could not be resolved instead of stopping the loop
        actual_smiles = 'Error'
    print('\r', row['Compound'], actual_smiles, end='')
    compound_list.append(row['Compound'])
    smiles_list.append(actual_smiles)

df_new = pd.DataFrame({'Compound': compound_list, 'SmilesCode': smiles_list})
# No path is given, so to_csv returns the CSV text instead of writing a file
df_new.to_csv(index=False)
Answer 0 (score: 0)
Try using a Pool from multiprocessing:
from multiprocessing import Pool

import cirpy
import pandas as pd

def resolve(str_input, representation):
    # Return the input together with its result so the two stay paired
    try:
        res = cirpy.resolve(str_input, representation)
    except Exception:
        res = "Error"
    print('\r', str_input, res, end='')
    return (str_input, res)

if __name__ == '__main__':
    # The guard is needed where workers are started with "spawn" (e.g. Windows, macOS)
    n = 5
    with Pool(processes=n) as pool:
        compounds_smiles_list = pool.starmap(resolve, [(row['Compound'], 'smiles') for index, row in df_Verteilung.iterrows()])
    compound_list = [elem[0] for elem in compounds_smiles_list]
    smiles_list = [elem[1] for elem in compounds_smiles_list]
    df_new = pd.DataFrame({'Compound': compound_list, 'SmilesCode': smiles_list})
    df_new.to_csv(index=False)
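With the variable n you can control the size of the pool. Alternatively, you can leave the Pool constructor empty, in which case the number of workers is chosen from the number of CPUs your system reports.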
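To see how the pool size and starmap fit together without hitting the network, here is a rough sketch. It rests on a few assumptions that are not in the answer above: fake_resolve and resolve_all are hypothetical names, fake_resolve only imitates cirpy.resolve by sleeping briefly, and ThreadPool is swapped in for Pool because it exposes the same starmap interface and the real lookups probably spend most of their time waiting on a web service. Whether threads or processes are faster for the real data would need measuring.

```python
from multiprocessing.pool import ThreadPool
import time


def fake_resolve(name, representation):
    """Hypothetical stand-in for cirpy.resolve (no network access)."""
    time.sleep(0.01)  # imitates waiting for a remote reply
    if not name:
        # Mirror the answer's convention of returning "Error" on failure
        return (name, "Error")
    return (name, f"{representation}:{name.upper()}")


def resolve_all(names, representation="smiles", workers=None):
    """Run fake_resolve over all names; results keep the input order."""
    args = [(name, representation) for name in names]
    # workers=None leaves the pool at its default size
    with ThreadPool(processes=workers) as pool:
        return pool.starmap(fake_resolve, args)


names = ["water", "", "ethanol", "benzene"]
results = resolve_all(names, workers=4)
print(results)
```

starmap returns the results in the same order as the inputs, so splitting them back into two columns for the DataFrame works the same way as in the code above.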
Some explanation: