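I have two DataFrames with different numbers of rows: one has daily values and the other has hourly values. I want to compare them and, wherever the dates match, attach that day's daily value to each hourly row. The DataFrames are: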
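import pandas as pd
# Raw strings stop the backslashes in Windows paths from being read as escapes
df1 = pd.read_csv(r'C:\Users\ABC.csv')
df2 = pd.read_csv(r'C:\Users\DEF.csv')
# Convert the column in place; assigning to df1 itself would replace the frame with a Series
df1['Datetime'] = pd.to_datetime(df1['Datetime'])
df2['Datetime'] = pd.to_datetime(df2['Datetime'])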
df1.head()
Out [3]
Datetime Value
0 2016-02-02 21:00:00 0.6
1 2016-02-02 22:00:00 0.4
2 2016-02-02 23:00:00 0.4
3 2016-03-02 00:00:00 0.3
4 2016-03-02 01:00:00 0.2
df2.head()
Out [4]
Datetime No of people
0 2016-02-02 56
1 2016-03-02 60
2 2016-04-02 91
3 2016-05-02 87
4 2016-06-02 90
What I would like to end up with is something like this:
Datetime Value No of People
0 2016-02-02 21:00:00 0.6 56
1 2016-02-02 22:00:00 0.4 56
2 2016-02-02 23:00:00 0.4 56
3 2016-03-02 00:00:00 0.3 60
4 2016-03-02 01:00:00 0.2 60
Any ideas on how to do this in Python with Pandas? Please note that some dates may be missing.
Answer 0 (score: 1)
You can set the index of df1 to df1.Datetime.dt.date and then join it with df2:
In [46]: df1.set_index(df1.Datetime.dt.date).join(df2.set_index('Datetime')).reset_index(drop=True)
Out[46]:
Datetime Value No_of_people
0 2016-02-02 21:00:00 0.6 56
1 2016-02-02 22:00:00 0.4 56
2 2016-02-02 23:00:00 0.4 56
3 2016-03-02 00:00:00 0.3 60
4 2016-03-02 01:00:00 0.2 60
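You can optionally pass how='left' when calling join(). It is the default, and it keeps every hourly row even when its date has no match in df2.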
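For anyone who wants to try this without the CSV files, here is a minimal sketch of the same idea written with merge() on a midnight-truncated key instead of an index join. The frame names hourly and daily, the helper column Date, and the extra 2016-07-02 row are my own assumptions, added only to show what happens when a date is missing from the daily table.

```python
import pandas as pd

# Hourly frame mirroring the question's first table.
hourly = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2016-02-02 21:00:00",
        "2016-02-02 22:00:00",
        "2016-02-02 23:00:00",
        "2016-03-02 00:00:00",
        "2016-03-02 01:00:00",
        "2016-07-02 05:00:00",  # hypothetical row whose date is absent from `daily`
    ]),
    "Value": [0.6, 0.4, 0.4, 0.3, 0.2, 0.1],
})

# Daily frame mirroring the first rows of the question's second table.
daily = pd.DataFrame({
    "Datetime": pd.to_datetime(["2016-02-02", "2016-03-02", "2016-04-02"]),
    "No of people": [56, 60, 91],
})

# Helper key (assumed name): each timestamp truncated to midnight,
# so it has the same dtype as the daily timestamps.
hourly["Date"] = hourly["Datetime"].dt.normalize()

# A left merge keeps every hourly row; unmatched dates get NaN.
merged = hourly.merge(
    daily.rename(columns={"Datetime": "Date"}),
    on="Date",
    how="left",
).drop(columns="Date")

print(merged)
```

Because of the unmatched row, "No of people" comes back as a float column with NaN in the last position. If rows without a daily value should be dropped instead, how='inner' would do that.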
Answer 1 (score: 0)
You can use pd.concat together with fillna(method='ffill'), because each date value lines up with the first (midnight) timestamp of its day:
from datetime import date
import numpy as np
import pandas as pd
df1 = pd.DataFrame(data={'day': np.random.randint(low=50, high=100, size=10), 'date':pd.date_range(date(2016,1,1), freq='D', periods=10)})
date day
0 2016-01-01 55
1 2016-01-02 51
2 2016-01-03 92
3 2016-01-04 78
4 2016-01-05 72
df2 = pd.DataFrame(data={'hour': np.random.randint(low=1, high=10, size=100), 'datetime': pd.date_range(date(2016,1,1), freq='H', periods=100)})
datetime hour
0 2016-01-01 00:00:00 5
1 2016-01-01 01:00:00 1
2 2016-01-01 02:00:00 4
3 2016-01-01 03:00:00 5
4 2016-01-01 04:00:00 2
Like this:
pd.concat([df2.set_index('datetime'), df1.set_index('date')], axis=1).fillna(method='ffill')
which gives:
hour day
2016-01-01 00:00:00 5.0 55.0
2016-01-01 01:00:00 1.0 55.0
2016-01-01 02:00:00 4.0 55.0
2016-01-01 03:00:00 5.0 55.0
2016-01-01 04:00:00 2.0 55.0
2016-01-01 05:00:00 3.0 55.0
2016-01-01 06:00:00 5.0 55.0
2016-01-01 07:00:00 6.0 55.0
2016-01-01 08:00:00 6.0 55.0
2016-01-01 09:00:00 8.0 55.0
2016-01-01 10:00:00 3.0 55.0
2016-01-01 11:00:00 5.0 55.0
2016-01-01 12:00:00 7.0 55.0
2016-01-01 13:00:00 7.0 55.0
2016-01-01 14:00:00 4.0 55.0
2016-01-01 15:00:00 5.0 55.0
2016-01-01 16:00:00 7.0 55.0
2016-01-01 17:00:00 4.0 55.0
2016-01-01 18:00:00 6.0 55.0
2016-01-01 19:00:00 1.0 55.0
2016-01-01 20:00:00 8.0 55.0
2016-01-01 21:00:00 8.0 55.0
2016-01-01 22:00:00 2.0 55.0
2016-01-01 23:00:00 3.0 55.0
2016-01-02 00:00:00 7.0 51.0
2016-01-02 01:00:00 6.0 51.0
2016-01-02 02:00:00 8.0 51.0
2016-01-02 03:00:00 6.0 51.0
2016-01-02 04:00:00 1.0 51.0
2016-01-02 05:00:00 5.0 51.0
... ... ...
2016-01-04 03:00:00 6.0 78.0
2016-01-04 04:00:00 9.0 78.0
2016-01-04 05:00:00 1.0 78.0
2016-01-04 06:00:00 6.0 78.0
2016-01-04 07:00:00 3.0 78.0
2016-01-04 08:00:00 9.0 78.0
2016-01-04 09:00:00 5.0 78.0
2016-01-04 10:00:00 3.0 78.0
2016-01-04 11:00:00 6.0 78.0
2016-01-04 12:00:00 4.0 78.0
2016-01-04 13:00:00 2.0 78.0
2016-01-04 14:00:00 4.0 78.0
2016-01-04 15:00:00 3.0 78.0
2016-01-04 16:00:00 4.0 78.0
2016-01-04 17:00:00 9.0 78.0
2016-01-04 18:00:00 8.0 78.0
2016-01-04 19:00:00 4.0 78.0
2016-01-04 20:00:00 7.0 78.0
2016-01-04 21:00:00 1.0 78.0
2016-01-04 22:00:00 6.0 78.0
2016-01-04 23:00:00 1.0 78.0
2016-01-05 00:00:00 5.0 72.0
2016-01-05 01:00:00 8.0 72.0
2016-01-05 02:00:00 6.0 72.0
2016-01-05 03:00:00 3.0 72.0
2016-01-06 00:00:00 3.0 87.0
2016-01-07 00:00:00 3.0 50.0
2016-01-08 00:00:00 3.0 65.0
2016-01-09 00:00:00 3.0 81.0
2016-01-10 00:00:00 3.0 65.0
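One thing to watch in the output above: the rows from 2016-01-06 onward appear to carry an hour value of 3.0 that was forward-filled from the last real hourly row, since the hourly data stops at 2016-01-05 03:00. If only the daily column should be propagated, one option is to look the daily value up per row instead of forward-filling the whole frame. The sketch below assumes the same shapes as the example data; the names daily, hourly, day_lookup and result are hypothetical, and the seed is there only to make the run repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

# Ten daily values, one per calendar day.
daily = pd.DataFrame({
    "date": pd.date_range("2016-01-01", periods=10, freq="D"),
    "day": rng.integers(low=50, high=100, size=10),
})

# One hundred hourly values starting at the same midnight.
hourly = pd.DataFrame({
    "datetime": pd.date_range("2016-01-01", periods=100, freq=pd.Timedelta(hours=1)),
    "hour": rng.integers(low=1, high=10, size=100),
})

# Series indexed by date, used as a lookup table.
day_lookup = daily.set_index("date")["day"]

# Truncate each hourly timestamp to midnight and map it to that day's value.
# Only hourly rows are kept, and the hourly column is never filled in.
result = hourly.copy()
result["day"] = result["datetime"].dt.normalize().map(day_lookup)

print(result.head())
```

With this approach, an hourly row whose date is missing from the daily table gets NaN in the day column rather than silently inheriting the previous day's value.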