I have two DataFrames. The first one looks like this:
df1:
date revenue
0 2016-11-17 385.943800
1 2016-11-18 1074.160340
2 2016-11-19 2980.857860
3 2016-11-20 1919.723960
4 2016-11-21 884.279340
5 2016-11-22 869.071070
6 2016-11-23 760.289260
7 2016-11-24 2481.689270
8 2016-11-25 2745.990070
9 2016-11-26 2273.413250
10 2016-11-27 2630.414900
The second one looks like this:
df2:
CET MaxTemp MeanTemp MinTemp MaxHumidity MeanHumidity MinHumidity
0 2016-11-17 11 9 7 100 85 63
1 2016-11-18 9 6 3 93 83 66
2 2016-11-19 8 6 4 93 87 76
3 2016-11-20 10 7 4 93 84 81
4 2016-11-21 14 10 7 100 89 77
5 2016-11-22 13 10 7 93 79 63
6 2016-11-23 11 8 5 100 91 82
7 2016-11-24 9 7 4 93 80 66
8 2016-11-25 7 4 1 87 74 57
9 2016-11-26 7 3 -1 100 88 61
10 2016-11-27 10 7 4 100 81 66
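Both DataFrames have more rows than shown, and the number of rows grows every day.
I would like to merge the two DataFrames so that whenever the same date appears in df1['date'] and df2['CET'], an extra column is added to df2 holding the revenue for that date. So I want to create this: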
df2:
CET MaxTemp MeanTemp MinTemp MaxHumidity MeanHumidity MinHumidity revenue
0 2016-11-17 11 9 7 100 85 63 385.943800
1 2016-11-18 9 6 3 93 83 66 1074.160340
2 2016-11-19 8 6 4 93 87 76 2980.857860
3 2016-11-20 10 7 4 93 84 81 1919.723960
4 2016-11-21 14 10 7 100 89 77 884.279340
5 2016-11-22 13 10 7 93 79 63 869.071070
6 2016-11-23 11 8 5 100 91 82 760.289260
7 2016-11-24 9 7 4 93 80 66 2481.689270
8 2016-11-25 7 4 1 87 74 57 2745.990070
9 2016-11-26 7 3 -1 100 88 61 2273.413250
10 2016-11-27 10 7 4 100 81 66 2630.414900
Can someone help me with how to do this?
Answer 0 (score: 3)
I think you can use map:
df2['revenue'] = df2.CET.map(df1.set_index('date')['revenue'])
Alternatively, you can convert the Series to a dict, which can be a bit faster on a large DataFrame:
df2['revenue'] = df2.CET.map(df1.set_index('date')['revenue'].to_dict())
print (df2)
CET MaxTemp MeanTemp MinTemp MaxHumidity MeanHumidity \
0 2016-11-17 11 9 7 100 85
1 2016-11-18 9 6 3 93 83
2 2016-11-19 8 6 4 93 87
3 2016-11-20 10 7 4 93 84
4 2016-11-21 14 10 7 100 89
5 2016-11-22 13 10 7 93 79
6 2016-11-23 11 8 5 100 91
7 2016-11-24 9 7 4 93 80
8 2016-11-25 7 4 1 87 74
9 2016-11-26 7 3 -1 100 88
10 2016-11-27 10 7 4 100 81
MinHumidity revenue
0 63 385.94380
1 66 1074.16034
2 76 2980.85786
3 81 1919.72396
4 77 884.27934
5 63 869.07107
6 82 760.28926
7 66 2481.68927
8 57 2745.99007
9 61 2273.41325
10 66 2630.41490
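For anyone who wants to run the map approach end to end, here is a minimal self-contained sketch. The three-row df1 and the extra 2016-11-20 row in df2 are made up for illustration (they are a trimmed copy of the question's data), and the sketch assumes the dates in df1 are unique, since map looks values up in the index of the Series it is given.

```python
import pandas as pd

# Trimmed copies of the question's frames (illustrative data only)
df1 = pd.DataFrame({
    "date": pd.to_datetime(["2016-11-17", "2016-11-18", "2016-11-19"]),
    "revenue": [385.94380, 1074.16034, 2980.85786],
})
df2 = pd.DataFrame({
    "CET": pd.to_datetime(["2016-11-17", "2016-11-18", "2016-11-19", "2016-11-20"]),
    "MaxTemp": [11, 9, 8, 10],
})

# Build a lookup Series: index = date, values = revenue
revenue_by_date = df1.set_index("date")["revenue"]

# Each CET value is looked up in that index; dates missing from df1 give NaN
df2["revenue"] = df2["CET"].map(revenue_by_date)

print(df2)
```

The last row of df2 has no counterpart in df1, so its revenue stays NaN rather than raising an error, which is usually what you want when df2 grows faster than df1.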
If all output values are NaN, the problem is that the dtypes of the CET and date columns differ:
print (df1.date.dtypes)
object
print (df2.CET.dtype)
datetime64[ns]
The solution is to convert the string column with to_datetime:
df1.date = pd.to_datetime(df1.date)
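The dtype mismatch is easy to reproduce. The sketch below uses a made-up two-row df1 whose date column holds plain strings, which is an assumption about how the data might have been loaded (for example from a CSV without date parsing).

```python
import pandas as pd

# df1.date holds strings (object dtype), df2.CET holds real datetimes
df1 = pd.DataFrame({
    "date": ["2016-11-17", "2016-11-18"],
    "revenue": [385.94380, 1074.16034],
})
df2 = pd.DataFrame({"CET": pd.to_datetime(["2016-11-17", "2016-11-18"])})

# Strings never compare equal to Timestamps, so every lookup misses
before = df2["CET"].map(df1.set_index("date")["revenue"])

# After converting the string column, the keys have the same type and match
df1["date"] = pd.to_datetime(df1["date"])
after = df2["CET"].map(df1.set_index("date")["revenue"])

print(before)
print(after)
```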
Answer 1 (score: 3)
The .map() solution only works when the values in the date and CET columns are exactly the same.
If your values differ slightly, you can use the pd.merge_asof() method:
In [17]: pd.merge_asof(df1, df2, left_on='date', right_on='CET', tolerance=pd.Timedelta('2 hours'))
Out[17]:
date revenue CET MaxTemp MeanTemp MinTemp MaxHumidity MeanHumidity MinHumidity
0 2016-11-17 385.94380 2016-11-17 11 9 7 100 85 63
1 2016-11-18 1074.16034 2016-11-18 9 6 3 93 83 66
2 2016-11-19 2980.85786 2016-11-19 8 6 4 93 87 76
3 2016-11-20 1919.72396 2016-11-20 10 7 4 93 84 81
4 2016-11-21 884.27934 2016-11-21 14 10 7 100 89 77
5 2016-11-22 869.07107 2016-11-22 13 10 7 93 79 63
6 2016-11-23 760.28926 2016-11-23 11 8 5 100 91 82
7 2016-11-24 2481.68927 2016-11-24 9 7 4 93 80 66
8 2016-11-25 2745.99007 2016-11-25 7 4 1 87 74 57
9 2016-11-26 2273.41325 2016-11-26 7 3 -1 100 88 61
10 2016-11-27 2630.41490 2016-11-27 10 7 4 100 81 66
Note: the merge_asof() function was added in Pandas 0.19.0 (i.e. it is not available in older versions).
Demo:
In [191]: df2
Out[191]:
CET MaxTemp MeanTemp MinTemp MaxHumidity MeanHumidity MinHumidity
0 2016-11-17 01:39:00 11 9 7 100 85 63
1 2016-11-18 01:39:00 9 6 3 93 83 66
2 2016-11-19 01:39:00 8 6 4 93 87 76
3 2016-11-20 01:39:00 10 7 4 93 84 81
4 2016-11-21 01:39:00 14 10 7 100 89 77
5 2016-11-22 01:39:00 13 10 7 93 79 63
6 2016-11-23 01:39:00 11 8 5 100 91 82
7 2016-11-24 01:39:00 9 7 4 93 80 66
8 2016-11-25 01:39:00 7 4 1 87 74 57
9 2016-11-26 01:39:00 7 3 -1 100 88 61
10 2016-11-27 01:39:00 10 7 4 100 81 66
In [192]: df1
Out[192]:
date revenue
0 2016-11-17 385.94380
1 2016-11-18 1074.16034
2 2016-11-19 2980.85786
3 2016-11-20 1919.72396
4 2016-11-21 884.27934
5 2016-11-22 869.07107
6 2016-11-23 760.28926
7 2016-11-24 2481.68927
8 2016-11-25 2745.99007
9 2016-11-26 2273.41325
10 2016-11-27 2630.41490
In [193]: pd.merge_asof(df2, df1, left_on='CET', right_on='date')
Out[193]:
CET MaxTemp MeanTemp MinTemp MaxHumidity MeanHumidity MinHumidity date revenue
0 2016-11-17 01:39:00 11 9 7 100 85 63 2016-11-17 385.94380
1 2016-11-18 01:39:00 9 6 3 93 83 66 2016-11-18 1074.16034
2 2016-11-19 01:39:00 8 6 4 93 87 76 2016-11-19 2980.85786
3 2016-11-20 01:39:00 10 7 4 93 84 81 2016-11-20 1919.72396
4 2016-11-21 01:39:00 14 10 7 100 89 77 2016-11-21 884.27934
5 2016-11-22 01:39:00 13 10 7 93 79 63 2016-11-22 869.07107
6 2016-11-23 01:39:00 11 8 5 100 91 82 2016-11-23 760.28926
7 2016-11-24 01:39:00 9 7 4 93 80 66 2016-11-24 2481.68927
8 2016-11-25 01:39:00 7 4 1 87 74 57 2016-11-25 2745.99007
9 2016-11-26 01:39:00 7 3 -1 100 88 61 2016-11-26 2273.41325
10 2016-11-27 01:39:00 10 7 4 100 81 66 2016-11-27 2630.41490
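To see what the tolerance argument adds to a merge like the one above, here is a small sketch. The data is made up: two readings taken at 01:39 and one taken at 05:00, with a 2 hour tolerance chosen only for illustration. Note that merge_asof expects both key columns to be sorted, and by default it matches each left row with the closest earlier (or equal) right key.

```python
import pandas as pd

# Revenue is stamped at midnight; weather readings carry a time of day
df1 = pd.DataFrame({
    "date": pd.to_datetime([
        "2016-11-17 00:00:00", "2016-11-18 00:00:00", "2016-11-19 00:00:00",
    ]),
    "revenue": [385.94380, 1074.16034, 2980.85786],
})
df2 = pd.DataFrame({
    "CET": pd.to_datetime([
        "2016-11-17 01:39:00", "2016-11-18 01:39:00", "2016-11-19 05:00:00",
    ]),
    "MaxTemp": [11, 9, 8],
})

# For each CET, take the latest date <= CET, but only within 2 hours of it
merged = pd.merge_asof(
    df2, df1,
    left_on="CET", right_on="date",
    tolerance=pd.Timedelta("2 hours"),
)

print(merged)
```

The first two rows pick up their revenue because 01:39 is within two hours of midnight. The 05:00 reading is five hours away from the nearest earlier date, so its date and revenue are left empty instead of being matched loosely.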
Using the .map() method on the same data:
In [194]: df2.CET.map(df1.set_index('date')['revenue'])
Out[194]:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
Name: CET, dtype: float64
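The NaN values appear because 2016-11-17 01:39:00 is not equal to 2016-11-17 00:00:00, so no key is found. If you only care about the calendar day, one option that is not part of the answer above is to drop the time of day with Series.dt.normalize() before mapping. This is only a sketch on made-up two-row data, and it assumes the revenue dates are stored at midnight.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": pd.to_datetime(["2016-11-17", "2016-11-18"]),
    "revenue": [385.94380, 1074.16034],
})
df2 = pd.DataFrame({
    "CET": pd.to_datetime(["2016-11-17 01:39:00", "2016-11-18 01:39:00"]),
})

revenue_by_date = df1.set_index("date")["revenue"]

# Exact lookup: 01:39 never equals midnight, so nothing matches
exact = df2["CET"].map(revenue_by_date)

# normalize() resets the time to 00:00:00, so the day-level keys line up
by_day = df2["CET"].dt.normalize().map(revenue_by_date)

print(exact)
print(by_day)
```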