Question

I have two DataFrames, df1 and df2, with the same format.

For example, df1 looks like this:

      Date  A   B   C   D   E
2018-03-01  1  40  30  30  70
2018-03-02  3  60  70  50  55
2018-03-03  4  60  70  45  80
2018-03-04  5  80  90  30  47
2018-03-05  3  40  40  37  20

df2 might look like this; the only difference is the start date:

      Date  A   B   C   D   E
2018-03-03  4  60  70  45  80
2018-03-04  5  80  90  30  47
2018-03-05  3  40  40  37  20
2018-03-06  7  55  26  46  42
2018-03-07  2  73  46  33  25

I want to append to df1 every row of df2 that is dated after df1's last date, in this case all rows from 2018-03-06 onwards, so that df1 becomes:

      Date  A   B   C   D   E
2018-03-01  1  40  30  30  70
2018-03-02  3  60  70  50  55
2018-03-03  4  60  70  45  80
2018-03-04  5  80  90  30  47
2018-03-05  3  40  40  37  20
2018-03-06  7  55  26  46  42
2018-03-07  2  73  46  33  25

Note: df2 might skip 2018-03-06, in which case all rows from 2018-03-07 onwards should be copied and appended.

The dtype of my df['Date'] is datetime64. I get an error when I try to take the last_date of df1 in order to find the next_date to copy from df2.

>>> last_date = df1['Date'].tail(1)
>>> next_date = datetime.datetime(last_date) + datetime.timedelta(days=1)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'Timestamp'

Alternatively, how can I copy all rows from df2 whose date is later than the last date in df1 and append them to df1? Thanks.

Answer 1

Option 1
Use combine_first, with Date set as the index:

i = df1.set_index('Date')
j = df2[df2.Date.gt(df1.Date.max())].set_index('Date')

i.combine_first(j).reset_index()

         Date    A     B     C     D     E
0  2018-03-01  1.0  40.0  30.0  30.0  70.0
1  2018-03-02  3.0  60.0  70.0  50.0  55.0
2  2018-03-03  4.0  60.0  70.0  45.0  80.0
3  2018-03-04  5.0  80.0  90.0  30.0  47.0
4  2018-03-05  3.0  40.0  40.0  37.0  20.0
5  2018-03-06  7.0  55.0  26.0  46.0  42.0
6  2018-03-07  2.0  73.0  46.0  33.0  25.0
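
The snippet above assumes df1 and df2 already exist. As a rough, self-contained sketch, the following rebuilds the sample frames from the question and runs the same steps. The frame construction and the final astype(int) are additions for illustration, not part of the original answer; the cast is there only because the output above shows the integer columns as floats.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Date": pd.date_range("2018-03-01", periods=5),
    "A": [1, 3, 4, 5, 3],
    "B": [40, 60, 60, 80, 40],
    "C": [30, 70, 70, 90, 40],
    "D": [30, 50, 45, 30, 37],
    "E": [70, 55, 80, 47, 20],
})
df2 = pd.DataFrame({
    "Date": pd.date_range("2018-03-03", periods=5),
    "A": [4, 5, 3, 7, 2],
    "B": [60, 80, 40, 55, 73],
    "C": [70, 90, 40, 26, 46],
    "D": [45, 30, 37, 46, 33],
    "E": [80, 47, 20, 42, 25],
})

i = df1.set_index("Date")
# Keep only the df2 rows dated after the last date in df1.
j = df2[df2.Date.gt(df1.Date.max())].set_index("Date")

# Cast back to int (safe here because no values are missing), then restore Date as a column.
result = i.combine_first(j).astype(int).reset_index()
print(result)
```

The cast is only safe when the combined frame has no missing values; with gaps in the data it would raise instead.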

Option 2
concat + groupby

pd.concat([i, j]).groupby('Date').first().reset_index()

         Date  A   B   C   D   E
0  2018-03-01  1  40  30  30  70
1  2018-03-02  3  60  70  50  55
2  2018-03-03  4  60  70  45  80
3  2018-03-04  5  80  90  30  47
4  2018-03-05  3  40  40  37  20
5  2018-03-06  7  55  26  46  42
6  2018-03-07  2  73  46  33  25
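
On the TypeError in the question: df1['Date'].tail(1) returns a one-element Series rather than a scalar, and datetime.datetime() expects integer year, month and day arguments. A minimal sketch of one way around it is to take the last date as a scalar Timestamp, then append the later rows using a boolean mask and pd.concat. This is a plainer alternative to the two options above, not part of the original answer, and the sample frames are cut down to one value column for brevity.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2018-03-04", "2018-03-05"]),
    "A": [5, 3],
})
# Here df2 skips 2018-03-06, the case raised in the question's note.
df2 = pd.DataFrame({
    "Date": pd.to_datetime(["2018-03-04", "2018-03-05", "2018-03-07"]),
    "A": [5, 3, 2],
})

# .iloc[-1] yields a scalar Timestamp (assumes df1 is sorted by Date).
last_date = df1["Date"].iloc[-1]
# Timestamp arithmetic works directly; no datetime.datetime() conversion is needed.
next_date = last_date + pd.Timedelta(days=1)

# Comparing against last_date (not equality with next_date) tolerates skipped days.
new_rows = df2[df2["Date"] > last_date]
combined = pd.concat([df1, new_rows], ignore_index=True)
print(combined)
```

Filtering with `>` rather than looking up next_date exactly means a missing day in df2 does not need special handling.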

Append rows from df2 to df1 based on datetime
