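Starting from two DataFrames (df1 and df2), I need to build a third one (df3) by merging on the 'COD' column and adding a new 'DELTA' column that holds, for each row of df1, the minimum difference in days between DATE_1 and every DATE_2 in the second DataFrame with the same 'COD'.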
import pandas as pd
df1 = pd.DataFrame({
    'COD': ['cod1', 'cod2', 'cod2', 'cod1', 'cod3', 'cod2'],
    'DATE_1': ['30-01-2019', '22-01-2019', '30-08-2019', '22-01-2019', '01-01-2019', '30-01-2019']})
df2 = pd.DataFrame({
    'COD': ['cod1', 'cod1', 'cod1', 'cod2', 'cod3', 'cod2', 'cod1'],
    'DATE_2': ['24-01-2019', '21-01-2019', '02-08-2019', '03-01-2019', '30-01-2019', '22-01-2019', '30-01-2019']})
df1['DATE_1'] = pd.to_datetime(df1['DATE_1'])
df2['DATE_2'] = pd.to_datetime(df2['DATE_2'])
Expected:
COD DATE_1 DELTA_min
0 cod1 30-01-2019 6
1 cod2 22-01-2019 0
2 cod2 30-08-2019 239
3 cod1 22-01-2019 2
4 cod3 01-01-2019 29
5 cod2 30-01-2019 8
Answer 0 (score: 1)
Merge the two DataFrames on COD (you may want a left join here). Then create a new DELTA column and group by COD and DATE_1.
import pandas as pd
df1 = pd.DataFrame({
    'COD': ['cod1', 'cod2', 'cod2', 'cod1', 'cod3', 'cod2'],
    'DATE_1': ['30-01-2019', '22-01-2019', '30-08-2019', '22-01-2019', '01-01-2019', '30-01-2019']})
df2 = pd.DataFrame({
    'COD': ['cod1', 'cod1', 'cod1', 'cod2', 'cod3', 'cod2', 'cod1'],
    'DATE_2': ['24-01-2019', '21-01-2019', '02-08-2019', '03-01-2019', '30-01-2019', '22-01-2019', '30-01-2019']})
# The strings are day-month-year, so parse them with dayfirst=True
df1['DATE_1'] = pd.to_datetime(df1['DATE_1'], dayfirst=True)
df2['DATE_2'] = pd.to_datetime(df2['DATE_2'], dayfirst=True)
df3 = df1.merge(df2, on='COD')
df3['DELTA'] = abs(df3.DATE_1 - df3.DATE_2)
df3.groupby(['COD', 'DATE_1']).DELTA.min()
I get the following:
COD   DATE_1
cod1  2019-01-22     1 days
      2019-01-30     0 days
cod2  2019-01-22     0 days
      2019-01-30     8 days
      2019-08-30   220 days
cod3  2019-01-01    29 days
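The left join mentioned in this answer only changes the result when df1 contains a COD that never appears in df2. The sketch below illustrates that case on a reduced data set; the 'cod9' row is a hypothetical addition that is not part of the question's data.

```python
import pandas as pd

# Reduced data in the same shape as the question, plus a hypothetical
# 'cod9' row that has no counterpart in df2.
df1 = pd.DataFrame({
    'COD': ['cod1', 'cod2', 'cod9'],
    'DATE_1': ['30-01-2019', '22-01-2019', '15-01-2019']})
df2 = pd.DataFrame({
    'COD': ['cod1', 'cod1', 'cod2'],
    'DATE_2': ['24-01-2019', '21-01-2019', '03-01-2019']})

df1['DATE_1'] = pd.to_datetime(df1['DATE_1'], dayfirst=True)
df2['DATE_2'] = pd.to_datetime(df2['DATE_2'], dayfirst=True)

# how='left' keeps every df1 row; unmatched rows get NaT in DATE_2.
left = df1.merge(df2, on='COD', how='left')
left['DELTA'] = (left['DATE_1'] - left['DATE_2']).abs()

# The minimum of a group whose deltas are all NaT is NaT.
left_min = left.groupby(['COD', 'DATE_1'])['DELTA'].min().reset_index()
print(left_min)
```

With the default inner join, the 'cod9' row would be dropped from the result instead of appearing with a missing DELTA.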
Answer 1 (score: 1)
First add the parameter dayfirst=True to to_datetime, then merge, subtract and convert to days, take abs, and finally aggregate with min:
df1['DATE_1'] = pd.to_datetime(df1['DATE_1'], dayfirst=True)
df2['DATE_2'] = pd.to_datetime(df2['DATE_2'], dayfirst=True)
df = pd.merge(df1, df2, on=['COD'])
df['DELTA_min'] = (df['DATE_2'] - df['DATE_1']).dt.days.abs()
print (df)
COD DATE_1 DATE_2 DELTA_min
0 cod1 2019-01-30 2019-01-24 6
1 cod1 2019-01-30 2019-01-21 9
2 cod1 2019-01-30 2019-08-02 184
3 cod1 2019-01-30 2019-01-30 0
4 cod1 2019-01-22 2019-01-24 2
5 cod1 2019-01-22 2019-01-21 1
6 cod1 2019-01-22 2019-08-02 192
7 cod1 2019-01-22 2019-01-30 8
8 cod2 2019-01-22 2019-01-03 19
9 cod2 2019-01-22 2019-01-22 0
10 cod2 2019-08-30 2019-01-03 239
11 cod2 2019-08-30 2019-01-22 220
12 cod2 2019-01-30 2019-01-03 27
13 cod2 2019-01-30 2019-01-22 8
14 cod3 2019-01-01 2019-01-30 29
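As a side note on why dayfirst=True matters for this data, here is a minimal sketch using one of the DATE_2 strings from the question. The explicit format string is an alternative not used in the answer, shown here only as an assumption about what may suit fixed-format input.

```python
import pandas as pd

ambiguous = '02-08-2019'  # one of the DATE_2 strings from the question

# Without dayfirst, pandas reads an ambiguous string as month-day-year.
month_first = pd.to_datetime(ambiguous)
# With dayfirst=True it is read as day-month-year.
day_first = pd.to_datetime(ambiguous, dayfirst=True)
# An explicit format removes the ambiguity entirely.
explicit = pd.to_datetime(ambiguous, format='%d-%m-%Y')

print(month_first.date(), day_first.date(), explicit.date())
# 2019-02-08 2019-08-02 2019-08-02
```

Strings such as '30-01-2019' can only be day-first, which is why a column parsed without the parameter may end up with a mix of interpretations or a warning, depending on the pandas version.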
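Aggregate the minimum for each COD and DATE_1 pair:
df = df.groupby(['COD', 'DATE_1'], as_index=False)['DELTA_min'].min()
print (df)
COD DATE_1 DELTA_min
0 cod1 2019-01-22 1
1 cod1 2019-01-30 0
2 cod2 2019-01-22 0
3 cod2 2019-01-30 8
4 cod2 2019-08-30 220
5 cod3 2019-01-01 29
If the order of the final DataFrame matters, keep the original index of df1 by calling reset_index before the merge: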
df1['DATE_1'] = pd.to_datetime(df1['DATE_1'], dayfirst=True)
df2['DATE_2'] = pd.to_datetime(df2['DATE_2'], dayfirst=True)
df = pd.merge(df1.reset_index(), df2, on=['COD'])
df['DELTA_min'] = (df['DATE_2'] - df['DATE_1']).dt.days.abs()
print (df)
index COD DATE_1 DATE_2 DELTA_min
0 0 cod1 2019-01-30 2019-01-24 6
1 0 cod1 2019-01-30 2019-01-21 9
2 0 cod1 2019-01-30 2019-08-02 184
3 0 cod1 2019-01-30 2019-01-30 0
4 3 cod1 2019-01-22 2019-01-24 2
5 3 cod1 2019-01-22 2019-01-21 1
6 3 cod1 2019-01-22 2019-08-02 192
7 3 cod1 2019-01-22 2019-01-30 8
8 1 cod2 2019-01-22 2019-01-03 19
9 1 cod2 2019-01-22 2019-01-22 0
10 2 cod2 2019-08-30 2019-01-03 239
11 2 cod2 2019-08-30 2019-01-22 220
12 5 cod2 2019-01-30 2019-01-03 27
13 5 cod2 2019-01-30 2019-01-22 8
14 4 cod3 2019-01-01 2019-01-30 29
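The listing above stops before the aggregation step. One possible way to finish it, offered as an assumption and not as the answer's own code, is to include the saved 'index' column in the group keys so the result comes back in df1's row order:

```python
import pandas as pd

df1 = pd.DataFrame({
    'COD': ['cod1', 'cod2', 'cod2', 'cod1', 'cod3', 'cod2'],
    'DATE_1': ['30-01-2019', '22-01-2019', '30-08-2019',
               '22-01-2019', '01-01-2019', '30-01-2019']})
df2 = pd.DataFrame({
    'COD': ['cod1', 'cod1', 'cod1', 'cod2', 'cod3', 'cod2', 'cod1'],
    'DATE_2': ['24-01-2019', '21-01-2019', '02-08-2019', '03-01-2019',
               '30-01-2019', '22-01-2019', '30-01-2019']})

df1['DATE_1'] = pd.to_datetime(df1['DATE_1'], dayfirst=True)
df2['DATE_2'] = pd.to_datetime(df2['DATE_2'], dayfirst=True)

# reset_index() stores df1's row position in an 'index' column.
df = pd.merge(df1.reset_index(), df2, on='COD')
df['DELTA_min'] = (df['DATE_2'] - df['DATE_1']).dt.days.abs()

# groupby sorts by its keys, and 'index' comes first, so the groups
# come out in df1's original row order.
out = (df.groupby(['index', 'COD', 'DATE_1'], as_index=False)['DELTA_min']
         .min()
         .drop(columns='index'))
print(out)
```

Grouping on 'index' also keeps rows apart if df1 ever contains the same COD and DATE_1 pair twice, which a groupby on COD and DATE_1 alone would collapse into one row.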