Multiprocessing with a Pandas DataFrame

Date: 2018-08-24 11:56:46

Tags: python pandas dataframe multiprocessing

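I want to use multiprocessing on a pandas DataFrame. I want to set the entries of one column of that DataFrame based on the values in another column; this is simple labeling done with a few if statements.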
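The labeling rule described here does not strictly need a row-by-row loop. As a point of comparison, here is a minimal sketch of the same if/elif/else logic written with NumPy's `np.select`, which is a swapped-in technique rather than what the question uses; the helper name `label_values` and the sample values are illustrative assumptions. It keeps the original branch boundaries, including the detail that a value of 0 matches none of the conditions and therefore falls through to 'very high'. For a frame of about a thousand rows, a vectorized pass like this is likely fast enough that multiprocessing brings no benefit.

```python
import numpy as np
import pandas as pd


def label_values(values: pd.Series) -> np.ndarray:
    """Vectorized form of the if/elif/else labeling rule (hypothetical helper)."""
    conditions = [
        (values > 0) & (values <= 2),   # 0 < v <= 2
        (values > 2) & (values <= 4),   # 2 < v <= 4
        (values > 4) & (values <= 6),   # 4 < v <= 6
    ]
    choices = ['low', 'medium', 'high']
    # Rows matching no condition (including 0 and anything above 6) get the
    # default, exactly like the final else branch.
    return np.select(conditions, choices, default='very high')


df = pd.DataFrame({'Value': [0, 1, 2, 3, 5, 6, 7, 9]})
df['Label'] = label_values(df['Value'])
print(df)
```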

Here is a minimal example of what I tried:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import multiprocessing


def worker(data):
    '''worker function'''

    try:
        assert type(data) == pd.core.frame.DataFrame

        for i in data.index:
            if 0 < data['Value'].iloc[i] <=2:
                data['Label'].iloc[i] = 'low'
            elif 2 < data['Value'].iloc[i] <=4:
                data['Label'].iloc[i] = 'medium'
            elif 4 < data['Value'].iloc[i] <=6:
                data['Label'].iloc[i] = 'high'
            else:
                data['Label'].iloc[i] = 'very high'

    except AssertionError:
        print('Data has to be pandas df!')



if __name__ == '__main__':

    # dummy data set
    df = pd.DataFrame(np.random.randint(0,10,1001),columns=['Value'])
    df['Labels'] = 0
    num_cores = multiprocessing.cpu_count()

    splits = np.linspace(0,len(df),num_cores+1,dtype=int)

    jobs = []
    for i in range(num_cores):
        lower_bound = splits[i]
        upper_bound = splits[i+1]
        p = multiprocessing.Process(target=worker,  args=(df.iloc[lower_bound:upper_bound],))
        jobs.append(p)
        p.start()

    for proc in jobs:
        proc.join()
    print(jobs)
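Independently of the TypeError reported below, the worker as written would still fail once it actually received a chunk. The frame is created with a column named 'Labels' while the worker writes to 'Label', which raises a KeyError. `data.index` yields index labels (125, 126, ... for the second chunk), but `.iloc[i]` expects positions, so every chunk after the first goes out of range. And a chained assignment such as `data['Label'].iloc[i] = ...` writes through an intermediate Series, so it may not update `data` at all (pandas warns about this, and with copy-on-write enabled it never does). A minimal sketch of a corrected worker, assuming it should return the labeled chunk instead of modifying it in place; `label_chunk` is a hypothetical name:

```python
import numpy as np
import pandas as pd


def label_chunk(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical corrected worker: label one chunk row by row and return it."""
    if not isinstance(data, pd.DataFrame):
        # Raise rather than print, so a bad input is not silently skipped.
        raise TypeError('Data has to be a pandas DataFrame!')

    data = data.copy()              # explicit copy, not a view of the parent frame
    data['Label'] = 'very high'     # default value, i.e. the final else branch
    for i in data.index:            # i is an index *label*, so use .loc, not .iloc
        value = data.loc[i, 'Value']
        if 0 < value <= 2:
            data.loc[i, 'Label'] = 'low'      # single .loc call, no chained assignment
        elif 2 < value <= 4:
            data.loc[i, 'Label'] = 'medium'
        elif 4 < value <= 6:
            data.loc[i, 'Label'] = 'high'
    return data


df = pd.DataFrame({'Value': np.arange(10)})
second_half = df.iloc[5:10]         # index labels 5..9, positions 0..4
labeled = label_chunk(second_half)
print(labeled)
print('Label' in df.columns)        # the original frame is left untouched
```

Iterating over `data.index` and reading and writing through `.loc` keeps the original loop structure while treating `i` as a label instead of a position.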

However, when I run it, I get the following error:

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
TypeError: worker() takes 1 positional argument but 2 were given
(The same traceback is printed for Process-2 through Process-8.)
[<Process(Process-1, stopped[1])>, <Process(Process-2, stopped[1])>, <Process(Process-3, stopped[1])>, <Process(Process-4, stopped[1])>, <Process(Process-5, stopped[1])>, <Process(Process-6, stopped[1])>, <Process(Process-7, stopped[1])>, <Process(Process-8, stopped[1])>]
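The traceback comes from `Process.run()`, which calls `self._target(*self._args, **self._kwargs)`. A message of '2 were given' is what you get when `args` is the DataFrame slice itself instead of a one-element tuple, for example `args=(df.iloc[lower_bound:upper_bound])` without the trailing comma: parentheses alone do not make a tuple, and unpacking a DataFrame with `*` iterates over its column labels, of which this frame has exactly two ('Value' and 'Labels'). That is an inference, since the code as posted does include the comma; presumably the version that produced this output did not. The `stopped[1]` in each entry of the final list is the child's exit code, 1, after the uncaught exception. The sketch below reproduces the argument mix-up without starting any processes; `worker` here is a hypothetical stand-in that only returns the chunk length.

```python
import numpy as np
import pandas as pd


def worker(data):
    """Hypothetical stand-in for the question's worker: exactly one positional argument."""
    return len(data)


df = pd.DataFrame(np.random.randint(0, 10, 1001), columns=['Value'])
df['Labels'] = 0
chunk = df.iloc[0:125]

args_without_comma = (chunk)    # just the DataFrame; the parentheses do nothing
args_with_comma = (chunk,)      # a real one-element tuple

# Process.run() effectively calls worker(*args). Unpacking a DataFrame yields
# its column labels, so this turns into worker('Value', 'Labels').
print(list(args_without_comma))

message = None
try:
    worker(*args_without_comma)
except TypeError as exc:
    message = str(exc)
print(message)

# With the tuple, the chunk arrives as the single argument.
print(worker(*args_with_comma))
```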

I'm not sure what exactly is going wrong with the multiprocessing.
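Even with the arguments passed correctly, the labels would never show up in `df`: each `multiprocessing.Process` works on its own copy of the data (a pickled copy under the spawn start method, a copy of the parent's memory under fork), so assignments made inside the worker stay in the child process. A common pattern is for each worker to return its processed chunk and for the parent to reassemble the pieces. Below is a minimal sketch of that pattern using `multiprocessing.Pool.map` and `pd.concat`; the helper names `label_value`, `label_chunk`, `split_frame` and `label_in_parallel` are hypothetical, and the chunk bounds reuse the question's `np.linspace` approach.

```python
import multiprocessing

import numpy as np
import pandas as pd


def label_value(value):
    """The question's if/elif/else rule applied to a single value."""
    if 0 < value <= 2:
        return 'low'
    elif 2 < value <= 4:
        return 'medium'
    elif 4 < value <= 6:
        return 'high'
    return 'very high'


def label_chunk(chunk):
    """Runs in a worker process: label a private copy of the chunk and return it."""
    chunk = chunk.copy()
    chunk['Label'] = chunk['Value'].map(label_value)
    return chunk


def split_frame(df, n_chunks):
    """Split into contiguous row ranges using the same linspace bounds as the question."""
    bounds = np.linspace(0, len(df), n_chunks + 1, dtype=int)
    return [df.iloc[lo:hi] for lo, hi in zip(bounds[:-1], bounds[1:])]


def label_in_parallel(df, n_workers):
    chunks = split_frame(df, n_workers)
    with multiprocessing.Pool(processes=n_workers) as pool:
        # map() pickles each chunk to a worker and pickles the returned chunk back.
        results = pool.map(label_chunk, chunks)
    # Chunks keep their original index labels, and map() preserves order,
    # so concatenating restores the full frame.
    return pd.concat(results)


if __name__ == '__main__':
    df = pd.DataFrame(np.random.randint(0, 10, 1001), columns=['Value'])
    df = label_in_parallel(df, multiprocessing.cpu_count())
    print(df.head())
```

Each chunk is pickled on the way to a worker and again on the way back, so for a rule this cheap the parallel version may well be slower than a single pass in one process; splitting the work across processes pays off mainly when the per-row computation is expensive.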

0 answers