I modified my code to add multiprocessing. It now looks like this:
import glob
import multiprocessing
import os

import pandas as pd

df = pd.read_csv('Dates.csv', parse_dates=True)
df['dates'] = pd.to_datetime(df['dates'])
df['dates'] = df['dates'].dt.date

# Collect the names of all CSV files in the folder
path = "Testordner"
os.chdir(path)
result = [i for i in glob.glob('*.{}'.format("csv"))]
os.chdir("..")

def f(i):
    df2 = pd.read_csv("Testordner/" + i, parse_dates=True)
    df2['time'] = pd.to_datetime(df2['time'])
    df2['just_dates'] = df2['time'].dt.date
    dates2 = df2['just_dates']
    df['counts' + i] = df['dates'].isin(df2['just_dates']).astype(int)

pool = multiprocessing.Pool(multiprocessing.cpu_count())
pool.map(f, result)
But nothing happens: when I print df, it looks the same as before. How do I get the multiprocessing to work?
df looks like:
dates
0 2003-01-01
1 2003-01-02
2 2003-01-03
3 2003-01-04
4 2003-01-05
5 2003-01-06
6 2003-01-07
7 2003-01-08
8 2003-01-09
...
5287 2017-06-23
5288 2017-06-24
5289 2017-06-25
5290 2017-06-26
5291 2017-06-27
5292 2017-06-28
5293 2017-06-29
5294 2017-06-30
and df2, for example:
0 2003-01-02
1 2015-10-31
2 2015-11-01
3 2015-11-01
4 2015-11-01
5 2015-11-01
6 2015-11-01
7 2015-11-01
8 2015-11-01
...
42 2015-11-03
43 2015-11-03
44 2015-11-04
45 2015-11-04
46 2015-11-04
This is the just_dates column of one example file.
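To make the intended result concrete, here is a small sketch of what the isin line computes for one file. The two series are hypothetical miniature stand-ins for df['dates'] and df2['just_dates'], not the real data.

```python
import datetime as dt

import pandas as pd

# Hypothetical miniature versions of df['dates'] and df2['just_dates']
dates = pd.Series([dt.date(2003, 1, 1), dt.date(2003, 1, 2), dt.date(2003, 1, 3)])
just_dates = pd.Series([dt.date(2003, 1, 2), dt.date(2015, 10, 31), dt.date(2015, 11, 1)])

# 1 where the calendar date occurs in the file, 0 otherwise
counts = dates.isin(just_dates).astype(int)
print(counts.tolist())  # [0, 1, 0]
```

Only 2003-01-02 appears in both series, so only the second row is flagged.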
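Answer 0 (score: 0)
Each worker process operates on its own copy of df, so the columns that f adds there never reach the df in the parent process. Return the result from f and collect it from the pool's map call instead. Try: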
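from multiprocessing import Pool, cpu_count
import pandas as pd

# df and result are defined at module level, as in the question
def f(i):
    # Runs in a worker process: build the 0/1 column and return it
    df2 = pd.read_csv("Testordner/" + i, parse_dates=True)
    df2['time'] = pd.to_datetime(df2['time'])
    df2['just_dates'] = df2['time'].dt.date
    return (i, df['dates'].isin(df2['just_dates']).astype(int))

if __name__ == "__main__":
    with Pool(cpu_count()) as p:
        results = p.map(f, result)
    # Assign in the parent process, where the df you print lives
    for name, counts in results:
        df['counts' + name] = counts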
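The same idea can be tried without any files on disk. The following is a minimal sketch, not a drop-in replacement: DATES, FILES, match_dates and attach_columns are hypothetical names, and the in-memory dictionary stands in for the CSV files in Testordner. It shows the worker returning its result and the parent process writing the columns into the DataFrame.

```python
import datetime as dt
from multiprocessing import Pool

import pandas as pd

# Hypothetical in-memory stand-ins for Dates.csv and the files in Testordner
DATES = pd.Series(pd.date_range("2003-01-01", periods=4).date)
FILES = {
    "a.csv": [dt.date(2003, 1, 2), dt.date(2003, 1, 2)],
    "b.csv": [dt.date(2003, 1, 4), dt.date(2015, 11, 1)],
}


def match_dates(name):
    """Worker: return (name, 0/1 list) instead of mutating shared state."""
    flags = DATES.isin(FILES[name]).astype(int)
    return name, flags.tolist()


def attach_columns(df, results):
    """Parent: write the returned columns into the real DataFrame."""
    for name, flags in results:
        df["counts" + name] = flags
    return df


if __name__ == "__main__":
    df = pd.DataFrame({"dates": DATES})
    with Pool(2) as pool:
        results = pool.map(match_dates, sorted(FILES))
    print(attach_columns(df, results))
```

Because the worker only returns plain Python values, nothing depends on memory being shared between processes; the parent decides what ends up in df.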