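OK, so hopefully the title is understandable. I have two dataframes: one with a datetime index and a single column of values, and another with longitude and latitude indices plus other columns.
The general layout is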
df1=
factor
2015-04-15 NaN
2015-04-16 NaN
2015-04-17 NaN
2015-04-18 NaN
2015-04-19 NaN
2015-04-20 NaN
2015-04-21 NaN
2015-04-22 NaN
2015-04-23 NaN
2015-04-24 7.067218
2015-04-25 9.414628
2015-04-26 13.702154
2015-04-27 16.489926
2015-04-28 17.917428
2015-04-29 20.359118
2015-04-30 18.608707
2015-05-01 10.627798
2015-05-02 8.398942
2015-05-03 5.984976
2015-05-04 4.363621
2015-05-05 3.468062
2015-05-06 2.830794
2015-05-07 2.347879
df2=
     i_lat  i_lon  multiplier         sum   ID               distance
226   1092    264  -60.420166   61.420166  609  0.6142016587060164 km
228   1092    265 -129.914662  130.914662  609   1.309146617117938 km
204   1091    264 -203.371915  204.371915  609   2.043719152272311 km
206   1091    265 -233.799786  234.799786  609   2.347997860007727 km
224   1092    263 -240.718140  241.718140  609   2.417181399246371 km
..     ...    ...         ...         ...  ...                    ...
295   1095    268 -969.728516  970.728516  609   9.707285164114008 km
216   1092    259 -977.398084  978.398084  609   9.783980837220454 km
278   1094    269 -984.131470  985.131470  609   9.851314704203592 km
160   1088    267 -994.142285  995.142285  609   9.951422853836982 km
194   1091    259 -996.513606  997.513606  609   9.975136064824323 km
I basically need to compute df1["factor"]*df2["multiplier"]+df2["sum"] once for each pair of i_lat and i_lon, so that the output is a multi-index dataframe like this
df_output=
                                   col
i_lat i_lon time
1092  264   2015-04-15  -9.000000e+33
            2015-04-16  -9.000000e+33
            2015-04-17  -9.000000e+33
            2015-04-18  -9.000000e+33
            2015-04-19  -9.000000e+33
...                               ...
1091  259   2015-05-05  -9.000000e+33
            2015-05-06  -9.000000e+33
            2015-05-07  -9.000000e+33
            2015-05-08  -9.000000e+33
            2015-05-09  -9.000000e+33
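with col holding the result of the operation above. I tried using apply as
df2.apply(lambda a: print(df1*a["multiplier"]+a["sum"], axis=1))
but what it returns makes no sense, and I'm not sure how to proceed from here.
Thanks!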
Answer 0 (score: 1)
IIUC, you can do this:
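import pandas as pd

# Index df2 by grid cell so each (i_lat, i_lon) pair becomes a column label
df2 = df2.set_index(['i_lat', 'i_lon'])
# df1.values is (n_dates, 1); broadcasting against the (n_cells,) arrays gives (n_dates, n_cells)
(pd.DataFrame(df1.values * df2.multiplier.values + df2['sum'].values,
              index=df1.index,
              columns=df2.index
             )
   .unstack()  # Series indexed by (i_lat, i_lon, date)
)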
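
To make the broadcasting idea in this answer easier to try out, here is a minimal self-contained sketch. The tiny df1 and df2 below are made-up stand-ins (not the asker's data), and naming the date index "time", the helper variables grid and values, and the final rename to "col" are assumptions chosen to mimic the desired df_output layout.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df1 and df2; the numbers are made up for illustration.
df1 = pd.DataFrame(
    {"factor": [np.nan, 2.0, 3.0]},
    index=pd.date_range("2015-04-23", periods=3, name="time"),
)
df2 = pd.DataFrame(
    {
        "i_lat": [1092, 1091],
        "i_lon": [264, 265],
        "multiplier": [-10.0, -20.0],
        "sum": [11.0, 21.0],
    }
)

# One column label per (i_lat, i_lon) pair.
grid = df2.set_index(["i_lat", "i_lon"])

# (n_times, 1) * (n_cells,) + (n_cells,) broadcasts to (n_times, n_cells).
values = (
    df1[["factor"]].to_numpy() * grid["multiplier"].to_numpy()
    + grid["sum"].to_numpy()
)

df_output = (
    pd.DataFrame(values, index=df1.index, columns=grid.index)
    .unstack()        # Series indexed by (i_lat, i_lon, time)
    .rename("col")    # assumed column name, taken from the desired output
    .to_frame()
)
print(df_output)
```

NaN factors stay NaN in the result, so any fill value (such as the -9e+33 shown in the question) would have to be applied separately, for example with fillna.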