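Compare the dates of two pandas DataFrames and add values where the dates match?

Asked: 2016-05-30 15:49:07

Tags: python datetime pandas dataframe

I have two DataFrames with different numbers of rows: one holds daily values and the other holds hourly values. I want to compare them and, where the dates match, attach that day's daily value to each of its hourly values. The DataFrames are: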

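import pandas as pd
# Raw strings stop the backslashes in Windows paths from being read as escape sequences
df1 = pd.read_csv(r'C:\Users\ABC.csv')
df2 = pd.read_csv(r'C:\Users\DEF.csv')
# Convert the column in place; assigning the result to df1 itself would replace the DataFrame with a Series
df1['Datetime'] = pd.to_datetime(df1['Datetime'])
df2['Datetime'] = pd.to_datetime(df2['Datetime'])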
df1.head()
Out [3]     
    Datetime            Value
0   2016-02-02 21:00:00 0.6
1   2016-02-02 22:00:00 0.4
2   2016-02-02 23:00:00 0.4
3   2016-03-02 00:00:00 0.3
4   2016-03-02 01:00:00 0.2
df2.head()
Out [4]     Datetime    No of people
       0    2016-02-02  56
       1    2016-03-02  60
       2    2016-04-02  91
       3    2016-05-02  87
       4    2016-06-02  90

What I would like to end up with is something like this:

    Datetime            Value No of People
0   2016-02-02 21:00:00 0.6   56
1   2016-02-02 22:00:00 0.4   56
2   2016-02-02 23:00:00 0.4   56
3   2016-03-02 00:00:00 0.3   60
4   2016-03-02 01:00:00 0.2   60

Any ideas on how to do this in Python with pandas? Note that some dates may be missing.

2 answers:

Answer 0: (score: 1)

You can set the index of df1 to df1.Datetime.dt.date and then join it with df2:

In [46]: df1.set_index(df1.Datetime.dt.date).join(df2.set_index('Datetime')).reset_index(drop=True)
Out[46]:
             Datetime  Value  No_of_people
0 2016-02-02 21:00:00    0.6            56
1 2016-02-02 22:00:00    0.4            56
2 2016-02-02 23:00:00    0.4            56
3 2016-03-02 00:00:00    0.3            60
4 2016-03-02 01:00:00    0.2            60

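You can optionally pass the how='left' argument when calling join(); left is already the default for DataFrame.join, so every row of df1 is kept either way.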
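As a variation on the join above, here is a minimal sketch that keeps both keys as datetime64 by truncating the hourly timestamps with dt.normalize(), rather than comparing Python date objects against timestamps (a comparison that may not match on some pandas versions). The frames `hourly` and `daily`, the helper column `day`, and the extra 2016-07-02 row are hypothetical stand-ins for the question's data, not taken from the original CSV files.

```python
import pandas as pd

# Hypothetical stand-ins for the question's hourly and daily data.
hourly = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2016-02-02 21:00:00", "2016-02-02 22:00:00", "2016-02-02 23:00:00",
        "2016-03-02 00:00:00", "2016-03-02 01:00:00", "2016-07-02 05:00:00",
    ]),
    "Value": [0.6, 0.4, 0.4, 0.3, 0.2, 0.1],
})
daily = pd.DataFrame({
    "Datetime": pd.to_datetime(["2016-02-02", "2016-03-02", "2016-04-02"]),
    "No of people": [56, 60, 91],
})

# Truncate each hourly timestamp to midnight so both join keys are datetime64.
hourly_keyed = hourly.assign(day=hourly["Datetime"].dt.normalize())
daily_keyed = daily.rename(columns={"Datetime": "day"})

# how="left" keeps every hourly row; days absent from `daily` end up as NaN.
result = hourly_keyed.merge(daily_keyed, on="day", how="left").drop(columns="day")
print(result)
```

The last hourly row falls on a day that `daily` does not contain, so its "No of people" is NaN. This is how the missing dates mentioned in the question would show up in the result.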

Answer 1: (score: 0)

You can use pd.concat together with fillna(method='ffill'), because each date value lines up with the first hourly value of its day:

import numpy as np
import pandas as pd
from datetime import date

df1 = pd.DataFrame(data={'day': np.random.randint(low=50, high=100, size=10), 'date':pd.date_range(date(2016,1,1), freq='D', periods=10)})

        date  day
0 2016-01-01   55
1 2016-01-02   51
2 2016-01-03   92
3 2016-01-04   78
4 2016-01-05   72

df2 = pd.DataFrame(data={'hour': np.random.randint(low=1, high=10, size=100), 'datetime': pd.date_range(date(2016,1,1), freq='H', periods=100)})

             datetime  hour
0 2016-01-01 00:00:00     5
1 2016-01-01 01:00:00     1
2 2016-01-01 02:00:00     4
3 2016-01-01 03:00:00     5
4 2016-01-01 04:00:00     2
Like this:

pd.concat([df2.set_index('datetime'), df1.set_index('date')], axis=1).fillna(method='ffill')

which gives:

                     hour   day
2016-01-01 00:00:00   5.0  55.0
2016-01-01 01:00:00   1.0  55.0
2016-01-01 02:00:00   4.0  55.0
2016-01-01 03:00:00   5.0  55.0
2016-01-01 04:00:00   2.0  55.0
2016-01-01 05:00:00   3.0  55.0
2016-01-01 06:00:00   5.0  55.0
2016-01-01 07:00:00   6.0  55.0
2016-01-01 08:00:00   6.0  55.0
2016-01-01 09:00:00   8.0  55.0
2016-01-01 10:00:00   3.0  55.0
2016-01-01 11:00:00   5.0  55.0
2016-01-01 12:00:00   7.0  55.0
2016-01-01 13:00:00   7.0  55.0
2016-01-01 14:00:00   4.0  55.0
2016-01-01 15:00:00   5.0  55.0
2016-01-01 16:00:00   7.0  55.0
2016-01-01 17:00:00   4.0  55.0
2016-01-01 18:00:00   6.0  55.0
2016-01-01 19:00:00   1.0  55.0
2016-01-01 20:00:00   8.0  55.0
2016-01-01 21:00:00   8.0  55.0
2016-01-01 22:00:00   2.0  55.0
2016-01-01 23:00:00   3.0  55.0
2016-01-02 00:00:00   7.0  51.0
2016-01-02 01:00:00   6.0  51.0
2016-01-02 02:00:00   8.0  51.0
2016-01-02 03:00:00   6.0  51.0
2016-01-02 04:00:00   1.0  51.0
2016-01-02 05:00:00   5.0  51.0
...                   ...   ...
2016-01-04 03:00:00   6.0  78.0
2016-01-04 04:00:00   9.0  78.0
2016-01-04 05:00:00   1.0  78.0
2016-01-04 06:00:00   6.0  78.0
2016-01-04 07:00:00   3.0  78.0
2016-01-04 08:00:00   9.0  78.0
2016-01-04 09:00:00   5.0  78.0
2016-01-04 10:00:00   3.0  78.0
2016-01-04 11:00:00   6.0  78.0
2016-01-04 12:00:00   4.0  78.0
2016-01-04 13:00:00   2.0  78.0
2016-01-04 14:00:00   4.0  78.0
2016-01-04 15:00:00   3.0  78.0
2016-01-04 16:00:00   4.0  78.0
2016-01-04 17:00:00   9.0  78.0
2016-01-04 18:00:00   8.0  78.0
2016-01-04 19:00:00   4.0  78.0
2016-01-04 20:00:00   7.0  78.0
2016-01-04 21:00:00   1.0  78.0
2016-01-04 22:00:00   6.0  78.0
2016-01-04 23:00:00   1.0  78.0
2016-01-05 00:00:00   5.0  72.0
2016-01-05 01:00:00   8.0  72.0
2016-01-05 02:00:00   6.0  72.0
2016-01-05 03:00:00   3.0  72.0
2016-01-06 00:00:00   3.0  87.0
2016-01-07 00:00:00   3.0  50.0
2016-01-08 00:00:00   3.0  65.0
2016-01-09 00:00:00   3.0  81.0
2016-01-10 00:00:00   3.0  65.0
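In the output above, the rows from 2016-01-06 onward come from the daily frame alone, and their hour value is a forward-filled copy rather than a measurement. Forward filling would likewise carry the previous day's value across any date missing from the daily frame. A minimal sketch of an alternative that avoids both effects is to look each hour's day up in the daily data with Series.map. The names `hourly`, `daily` and `lookup` and the placeholder values are assumptions for illustration only. Separately, recent pandas releases prefer .ffill() over fillna(method='ffill').

```python
import pandas as pd

# Hypothetical hourly data: 60 consecutive hours starting at 2016-01-01 00:00.
hourly = pd.DataFrame({
    "datetime": pd.date_range("2016-01-01", periods=60, freq=pd.Timedelta(hours=1)),
})
hourly["hour"] = range(60)  # placeholder readings

# Hypothetical daily data with 2016-01-02 deliberately missing.
days = pd.date_range("2016-01-01", periods=3, freq=pd.Timedelta(days=1))[[0, 2]]
daily = pd.DataFrame({"date": days, "day": [55, 92]})

# Series indexed by day; map() looks up each hour's midnight timestamp in it.
lookup = daily.set_index("date")["day"]
hourly["day"] = hourly["datetime"].dt.normalize().map(lookup)

print(hourly.iloc[[0, 24, 48]])
```

Hours on 2016-01-02 get NaN instead of inheriting 55 from the previous day, and the result has exactly one row per hourly observation.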