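I want to use multiprocessing on a pandas DataFrame. I want to set the entries of one column of the DataFrame based on the values in another column. It is simple labelling done with a few if statements.
Here is a minimal example of what I tried: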
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import multiprocessing

def worker(data):
    '''worker function'''
    try:
        assert type(data) == pd.core.frame.DataFrame
        for i in data.index:
            if 0 < data['Value'].iloc[i] <=2:
                data['Label'].iloc[i] = 'low'
            elif 2 < data['Value'].iloc[i] <=4:
                data['Label'].iloc[i] = 'medium'
            elif 4 < data['Value'].iloc[i] <=6:
                data['Label'].iloc[i] = 'high'
            else:
                data['Label'].iloc[i] = 'very high'
    except AssertionError:
        print('Data has to be pandas df!')

if __name__ == '__main__':
    # dummy data set
    df = pd.DataFrame(np.random.randint(0,10,1001),columns=['Value'])
    df['Labels'] = 0
    num_cores = multiprocessing.cpu_count()
    splits = np.linspace(0,len(df),num_cores+1,dtype=int)
    jobs = []
    for i in range(num_cores):
        lower_bound = splits[i]
        upper_bound = splits[i+1]
        p = multiprocessing.Process(target=worker, args=(df.iloc[lower_bound:upper_bound],))
        jobs.append(p)
        p.start()
    for proc in jobs:
        proc.join()
    print(jobs)
However, when I run it, I get the following error output:
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
TypeError: worker() takes 1 positional argument but 2 were given
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
TypeError: worker() takes 1 positional argument but 2 were given
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
TypeError: worker() takes 1 positional argument but 2 were given
Process Process-4:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
TypeError: worker() takes 1 positional argument but 2 were given
Process Process-5:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
TypeError: worker() takes 1 positional argument but 2 were given
Process Process-6:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
TypeError: worker() takes 1 positional argument but 2 were given
Process Process-7:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
TypeError: worker() takes 1 positional argument but 2 were given
Process Process-8:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
TypeError: worker() takes 1 positional argument but 2 were given
[<Process(Process-1, stopped[1])>, <Process(Process-2, stopped[1])>, <Process(Process-3, stopped[1])>, <Process(Process-4, stopped[1])>, <Process(Process-5, stopped[1])>, <Process(Process-6, stopped[1])>, <Process(Process-7, stopped[1])>, <Process(Process-8, stopped[1])>]
I am not sure what exactly is going wrong with the multiprocessing here.
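
For reference, two observations and a hedged sketch. The message "worker() takes 1 positional argument but 2 were given" is what multiprocessing.Process produces when args is a bare DataFrame rather than a one-element tuple, for example args=(chunk) with no trailing comma: Process calls tuple(args), and iterating a DataFrame yields its column labels (two of them here). Whether that is what was actually run is an assumption, since the listing above does include the comma. Separately, each child process works on its own copy of the slice, so labels written in place inside worker would not show up in the parent's df. The sketch below sidesteps both points by returning labelled chunks from a multiprocessing.Pool and concatenating them. The names label_chunk and label_parallel are hypothetical, and np.select is swapped in for the row-by-row if statements; it is one possible arrangement, not the only one.

```python
import multiprocessing

import numpy as np
import pandas as pd


def label_chunk(chunk):
    """Return a labelled copy of one chunk (hypothetical helper)."""
    values = chunk['Value']
    conditions = [
        (values > 0) & (values <= 2),
        (values > 2) & (values <= 4),
        (values > 4) & (values <= 6),
    ]
    labelled = chunk.copy()
    # Anything outside (0, 6], including 0, falls through to 'very high',
    # mirroring the else branch of the original worker.
    labelled['Label'] = np.select(
        conditions, ['low', 'medium', 'high'], default='very high'
    )
    return labelled


def label_parallel(df, num_workers):
    """Split df by position, label the chunks in a pool, and reassemble."""
    splits = np.linspace(0, len(df), num_workers + 1, dtype=int)
    chunks = [df.iloc[lo:hi] for lo, hi in zip(splits[:-1], splits[1:])]
    with multiprocessing.Pool(num_workers) as pool:
        # pool.map passes each chunk as the single argument of label_chunk
        # and returns the results in the same order as the input chunks.
        return pd.concat(pool.map(label_chunk, chunks))


if __name__ == '__main__':
    # dummy data set, as in the example above
    df = pd.DataFrame(np.random.randint(0, 10, 1001), columns=['Value'])
    result = label_parallel(df, multiprocessing.cpu_count())
    print(result['Label'].value_counts())
```

For a frame of this size the vectorised label_chunk(df) alone is likely faster than any pool, because starting processes and pickling chunks has its own cost; the parallel version is only worth trying on much larger data.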