Querying date differences between pandas DataFrames

Time: 2019-02-21 09:39:25

Tags: pandas date select calculated-columns

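Starting from two DataFrames (df1 and df2), I have to build a third one (df3) by merging on the 'COD' column and populating a new 'DELTA' column. That column should contain the minimum difference in days between each DATE_1 and all the DATE_2 values in the second DataFrame that share the same 'COD'.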

import pandas as pd

df1 = pd.DataFrame({
'COD': ['cod1', 'cod2', 'cod2', 'cod1', 'cod3', 'cod2'],
'DATE_1': ['30-01-2019', '22-01-2019', '30-08-2019', '22-01-2019', '01-01-2019', '30-01-2019']})


df2 =pd.DataFrame({
'COD': ['cod1', 'cod1', 'cod1', 'cod2', 'cod3', 'cod2', 'cod1'],
'DATE_2': ['24-01-2019', '21-01-2019', '02-08-2019', '03-01-2019', '30-01-2019', '22-01-2019', '30-01-2019']})

df1['DATE_1'] = pd.to_datetime(df1['DATE_1'])
df2['DATE_2'] = pd.to_datetime(df2['DATE_2'])

Expected:

    COD      DATE_1  DELTA_min
0  cod1  30-01-2019          6
1  cod2  22-01-2019          0
2  cod2  30-08-2019        239
3  cod1  22-01-2019          2
4  cod3  01-01-2019         29
5  cod2  30-01-2019          8

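2 answers:

Answer 0 (score: 1)

Merge the two DataFrames on COD (you may want a left join here). Create a new column DELTA, then group by COD and DATE_1 and take the minimum.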

import pandas as pd

df1 = pd.DataFrame({
'COD': ['cod1', 'cod2', 'cod2', 'cod1', 'cod3', 'cod2'],
'DATE_1': ['30-01-2019', '22-01-2019', '30-08-2019', '22-01-2019', '01-01-2019', '30-01-2019']})


df2 =pd.DataFrame({
'COD': ['cod1', 'cod1', 'cod1', 'cod2', 'cod3', 'cod2', 'cod1'],
'DATE_2': ['24-01-2019', '21-01-2019', '02-08-2019', '03-01-2019', '30-01-2019', '22-01-2019', '30-01-2019']})

df1['DATE_1'] = pd.to_datetime(df1['DATE_1'])
df2['DATE_2'] = pd.to_datetime(df2['DATE_2'])

df3 = df1.merge(df2, on='COD')
df3['DELTA'] = abs(df3.DATE_1 - df3.DATE_2)
df3.groupby(['COD', 'DATE_1']).DELTA.min()

I get the following:

COD   DATE_1    
cod1  2019-01-22     1 days
      2019-01-30     0 days
cod2  2019-01-22     0 days
      2019-01-30     8 days
      2019-08-30   182 days
cod3  2019-01-01    29 days
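
The remark about a left join can be illustrated with a small sketch. The data below is hypothetical (the code 'cod9' is not in the question) and the explicit `format` argument is an assumption made here so the dates parse unambiguously. With the default inner join, a df1 row whose COD has no match in df2 disappears; with `how='left'` it is kept and its DELTA is missing (NaT).

```python
import pandas as pd

# Hypothetical sample: 'cod9' exists only in df1
df1 = pd.DataFrame({'COD': ['cod1', 'cod9'],
                    'DATE_1': ['30-01-2019', '22-01-2019']})
df2 = pd.DataFrame({'COD': ['cod1', 'cod1'],
                    'DATE_2': ['24-01-2019', '21-01-2019']})

# Explicit day-month-year format (an assumption of this sketch)
df1['DATE_1'] = pd.to_datetime(df1['DATE_1'], format='%d-%m-%Y')
df2['DATE_2'] = pd.to_datetime(df2['DATE_2'], format='%d-%m-%Y')

# Inner join (the default) drops 'cod9'; a left join keeps it
inner = df1.merge(df2, on='COD')
left = df1.merge(df2, on='COD', how='left')

# Absolute difference; rows without a match get NaT
left['DELTA'] = (left['DATE_1'] - left['DATE_2']).abs()

# Smallest difference per df1 row (identified by COD and DATE_1)
result = left.groupby(['COD', 'DATE_1'])['DELTA'].min()
print(result)
```

Whether unmatched rows should be kept depends on what df3 is meant to contain, so treat the join type as a choice rather than a fix.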

Answer 1 (score: 1)

First add the parameter dayfirst=True to to_datetime, then merge, subtract, convert to days and take abs, and finally aggregate with min:


df1['DATE_1'] = pd.to_datetime(df1['DATE_1'], dayfirst=True)
df2['DATE_2'] = pd.to_datetime(df2['DATE_2'], dayfirst=True)

df = pd.merge(df1, df2, on=['COD'])
df['DELTA_min'] = (df['DATE_2'] - df['DATE_1']).dt.days.abs()

print (df)
     COD     DATE_1     DATE_2  DELTA_min
0   cod1 2019-01-30 2019-01-24          6
1   cod1 2019-01-30 2019-01-21          9
2   cod1 2019-01-30 2019-08-02        184
3   cod1 2019-01-30 2019-01-30          0
4   cod1 2019-01-22 2019-01-24          2
5   cod1 2019-01-22 2019-01-21          1
6   cod1 2019-01-22 2019-08-02        192
7   cod1 2019-01-22 2019-01-30          8
8   cod2 2019-01-22 2019-01-03         19
9   cod2 2019-01-22 2019-01-22          0
10  cod2 2019-08-30 2019-01-03        239
11  cod2 2019-08-30 2019-01-22        220
12  cod2 2019-01-30 2019-01-03         27
13  cod2 2019-01-30 2019-01-22          8
14  cod3 2019-01-01 2019-01-30         29
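
To see why dayfirst=True matters for strings such as '03-01-2019', here is a minimal sketch on a single value. The explicit `format` variant is an alternative added for illustration and is not part of the answer.

```python
import pandas as pd

# '03-01-2019' is ambiguous: by default the first field is read as the month
month_first = pd.to_datetime('03-01-2019')

# With dayfirst=True the first field is read as the day
day_first = pd.to_datetime('03-01-2019', dayfirst=True)

# An explicit format (alternative shown for illustration) removes the guesswork
explicit = pd.to_datetime('03-01-2019', format='%d-%m-%Y')

print(month_first.date(), day_first.date(), explicit.date())
```

A string like '30-01-2019' can only be read day-first, so a column that mixes both kinds of values may be parsed inconsistently unless the order is stated.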

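Finally, aggregate with min to get the smallest difference for each COD and DATE_1 pair:

df = df.groupby(['COD', 'DATE_1'], as_index=False)['DELTA_min'].min()
print (df)
    COD     DATE_1  DELTA_min
0  cod1 2019-01-22          1
1  cod1 2019-01-30          0
2  cod2 2019-01-22          0
3  cod2 2019-01-30          8
4  cod2 2019-08-30        220
5  cod3 2019-01-01         29

If the final order is important, keep the original index of df1 by calling reset_index before the merge:
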
df1['DATE_1'] = pd.to_datetime(df1['DATE_1'], dayfirst=True)
df2['DATE_2'] = pd.to_datetime(df2['DATE_2'], dayfirst=True)

df = pd.merge(df1.reset_index(), df2, on=['COD'])
df['DELTA_min'] = (df['DATE_2'] - df['DATE_1']).dt.days.abs()

print (df)
    index   COD     DATE_1     DATE_2  DELTA_min
0       0  cod1 2019-01-30 2019-01-24          6
1       0  cod1 2019-01-30 2019-01-21          9
2       0  cod1 2019-01-30 2019-08-02        184
3       0  cod1 2019-01-30 2019-01-30          0
4       3  cod1 2019-01-22 2019-01-24          2
5       3  cod1 2019-01-22 2019-01-21          1
6       3  cod1 2019-01-22 2019-08-02        192
7       3  cod1 2019-01-22 2019-01-30          8
8       1  cod2 2019-01-22 2019-01-03         19
9       1  cod2 2019-01-22 2019-01-22          0
10      2  cod2 2019-08-30 2019-01-03        239
11      2  cod2 2019-08-30 2019-01-22        220
12      5  cod2 2019-01-30 2019-01-03         27
13      5  cod2 2019-01-30 2019-01-22          8
14      4  cod3 2019-01-01 2019-01-30         29
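
The merged frame above still holds one row per df1/df2 pair. A minimal sketch of one way to collapse it back to one row per original df1 row, assuming that is the purpose of carrying the index column: group on index together with COD and DATE_1, take the minimum, then drop the helper column. The name `out` is introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'COD': ['cod1', 'cod2', 'cod2', 'cod1', 'cod3', 'cod2'],
    'DATE_1': ['30-01-2019', '22-01-2019', '30-08-2019',
               '22-01-2019', '01-01-2019', '30-01-2019']})
df2 = pd.DataFrame({
    'COD': ['cod1', 'cod1', 'cod1', 'cod2', 'cod3', 'cod2', 'cod1'],
    'DATE_2': ['24-01-2019', '21-01-2019', '02-08-2019', '03-01-2019',
               '30-01-2019', '22-01-2019', '30-01-2019']})

df1['DATE_1'] = pd.to_datetime(df1['DATE_1'], dayfirst=True)
df2['DATE_2'] = pd.to_datetime(df2['DATE_2'], dayfirst=True)

# Keep df1's row labels in an 'index' column so they survive the merge
df = pd.merge(df1.reset_index(), df2, on='COD')
df['DELTA_min'] = (df['DATE_2'] - df['DATE_1']).dt.days.abs()

# groupby sorts by its keys, so leading with 'index' restores df1's order
out = (df.groupby(['index', 'COD', 'DATE_1'], as_index=False)['DELTA_min']
         .min()
         .drop(columns='index'))
print(out)
```

With the sample data, the DELTA_min values come out as 0, 0, 220, 1, 29, 8, in the same row order as df1.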