How do I start multiprocessing?

Time: 2019-06-12 13:20:53

Tags: python pandas multiprocessing

I modified my code to add multiprocessing. It now looks like this:

import glob
import multiprocessing
import os

import pandas as pd

df = pd.read_csv('Dates.csv', parse_dates=True)
df['dates']=pd.to_datetime(df['dates'])
df['dates']=df['dates'].dt.date

path="Testordner"
os.chdir(path)
result = [i for i in glob.glob('*.{}'.format("csv"))]
os.chdir("..")

def f(i):
    df2 = pd.read_csv("Testordner/"+i, parse_dates=True)
    df2['time'] = pd.to_datetime(df2['time'])
    df2['just_dates'] = df2['time'].dt.date
    dates2 = df2['just_dates']
    df['counts'+i]=df['dates'].isin(df2['just_dates']).astype(int) 

pool = multiprocessing.Pool(multiprocessing.cpu_count())
pool.map(f, result)

But nothing happens: when I print df, it looks the same as before. How do I start the multiprocessing? df2['just_dates'] looks like this:

     dates
0     2003-01-01
1     2003-01-02
2     2003-01-03
3     2003-01-04
4     2003-01-05
5     2003-01-06
6     2003-01-07
7     2003-01-08
8     2003-01-09
...
5287  2017-06-23
5288  2017-06-24
5289  2017-06-25
5290  2017-06-26
5291  2017-06-27
5292  2017-06-28
5293  2017-06-29
5294  2017-06-30

For example, df2:

0     2003-01-02
1     2015-10-31
2     2015-11-01
3     2015-11-01
4     2015-11-01
5     2015-11-01
6     2015-11-01
7     2015-11-01
8     2015-11-01
...
42    2015-11-03
43    2015-11-03
44    2015-11-04
45    2015-11-04
46    2015-11-04

This is the just_dates column of one example file.
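
To make the goal concrete, the comparison inside `f` can be sketched on a few of the dates shown above. This is only an illustration: the two small Series (`dates` and `just_dates`) are built by hand from the sample values rather than read from the CSV files.

```python
from datetime import date

import pandas as pd

# First three rows of the full date range shown above.
dates = pd.Series([date(2003, 1, 1), date(2003, 1, 2), date(2003, 1, 3)])

# First three rows of the example file's just_dates column.
just_dates = pd.Series([date(2003, 1, 2), date(2015, 10, 31), date(2015, 11, 1)])

# 1 where the day occurs in the file, 0 otherwise.
flags = dates.isin(just_dates).astype(int)
print(flags.tolist())  # [0, 1, 0]
```

Only 2003-01-02 appears in both samples, so it is the only row flagged with 1.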

1 Answer:

Answer 0 (score: 0)

Try:

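from multiprocessing import Pool, cpu_count
import pandas as pd

# df and result are the module-level objects defined in the question.


def f(i):
    df2 = pd.read_csv("Testordner/"+i, parse_dates=True)
    df2['time'] = pd.to_datetime(df2['time'])
    df2['just_dates'] = df2['time'].dt.date
    # Return the column instead of assigning to df: an assignment inside a
    # worker process only changes that worker's copy of df.
    return (i, df['dates'].isin(df2['just_dates']).astype(int))


if __name__ == '__main__':
    # The guard is required on platforms that start workers with "spawn" (e.g. Windows).
    with Pool(cpu_count()) as p:
        # results is a list of (file name, 0/1 Series) tuples.
        results = p.map(f, result)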
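
The original version appears to do nothing because each worker process operates on its own copy of `df`, so a column assigned inside `f` never reaches the parent process. The following is a minimal sketch of the full round trip, in which the workers return their column and the parent attaches it. The in-memory `FILES` dictionary and the helper names `membership_column` and `attach_columns` are hypothetical stand-ins for the CSV files in `Testordner`, chosen so the sketch runs without any files on disk.

```python
from datetime import date
from multiprocessing import Pool

import pandas as pd

# Hypothetical stand-in for Dates.csv: one row per calendar day.
df = pd.DataFrame({"dates": [date(2003, 1, 1), date(2003, 1, 2), date(2003, 1, 3)]})

# Hypothetical stand-in for the files in Testordner: file name -> dates it contains.
FILES = {
    "a.csv": [date(2003, 1, 2), date(2015, 10, 31)],
    "b.csv": [date(2003, 1, 3), date(2003, 1, 3)],
}


def membership_column(name):
    """Runs in a worker: return (name, 0/1 list) instead of mutating df."""
    flags = df["dates"].isin(FILES[name]).astype(int)
    return name, flags.tolist()


def attach_columns(frame, results):
    """Runs in the parent: add one 'counts<name>' column per worker result."""
    out = frame.copy()
    for name, flags in results:
        out["counts" + name] = flags
    return out


if __name__ == "__main__":
    with Pool(2) as pool:
        results = pool.map(membership_column, list(FILES))
    print(attach_columns(df, results))
```

With the real data, the same loop over `results` would be applied to the tuples returned by `f`. Whether the pool is actually faster than a plain loop depends on how large the files are, since every returned column has to be pickled and sent back to the parent.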