I have a lookup table (LUT) DataFrame with the columns ID, Date and RefVal (sample rows are shown further below).
It covers thousands of IDs. I also have data in another DataFrame (REF) into which I would like to efficiently fold the LUT. Structurally, REF looks like this:
ID Date ColOne
AAAA 2010-07-06 ...
AAAA 2011-12-31 ...
AAAA 2013-02-15 ...
AAAA 2015-05-21 ...
AAAB 2008-01-08 ...
AAAB 2010-10-20 ...
AAAB 2014-03-31 ...
...
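In particular, I would like to place the RefVal values from the LUT into REF, based on the dates at which each ID appears in REF and in the LUT. For example, the LUT might contain: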
ID Date RefVal
AAAA 2009-01-01 Val1
AAAA 2013-05-21 Val2
AAAB 2009-03-02 Val3
AAAB 2012-09-09 Val4
AAAB 2013-12-31 Val5
...
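In other words, RefVal in REF should be the most recently reported RefVal in the LUT for that ID, as of the REF row's Date (NaN if the LUT has no earlier entry for that ID). For the sample rows above, the result should look like: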
ID Date ColOne RefVal
AAAA 2010-07-06 ... Val1
AAAA 2011-12-31 ... Val1
AAAA 2013-02-15 ... Val1
AAAA 2015-05-21 ... Val2
AAAB 2008-01-08 ... NaN
AAAB 2010-10-20 ... Val3
AAAB 2014-03-31 ... Val5
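I believe I could define a custom function and apply it to REF, but I am not sure how to write that function, because it has to reference another DataFrame and use the ID I am grouping on. Any ideas?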
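What is described here is an "as-of" lookup: for each REF row, take the latest LUT row with the same ID at or before that date. pandas' merge_asof (available since pandas 0.19) does this without a custom function. Below is a minimal sketch using the sample rows from the question, assuming Date can be converted to datetime; the ColOne values are hypothetical placeholders for the "..." column, and merge_asof is a swapped-in technique, not something the question itself proposes.

```python
import pandas as pd

# Sample rows from the question; ColOne values are hypothetical stand-ins for "...".
LUT = pd.DataFrame({
    "ID": ["AAAA", "AAAA", "AAAB", "AAAB", "AAAB"],
    "Date": pd.to_datetime(["2009-01-01", "2013-05-21", "2009-03-02",
                            "2012-09-09", "2013-12-31"]),
    "RefVal": ["Val1", "Val2", "Val3", "Val4", "Val5"],
})
REF = pd.DataFrame({
    "ID": ["AAAA"] * 4 + ["AAAB"] * 3,
    "Date": pd.to_datetime(["2010-07-06", "2011-12-31", "2013-02-15", "2015-05-21",
                            "2008-01-08", "2010-10-20", "2014-03-31"]),
    "ColOne": [1, 2, 3, 4, 5, 6, 7],
})

# merge_asof requires both frames to be sorted by the "on" key (Date).
result = pd.merge_asof(
    REF.sort_values("Date"),
    LUT.sort_values("Date"),
    on="Date",
    by="ID",               # only match LUT rows with the same ID
    direction="backward",  # take the latest LUT Date <= the REF Date
)
# Restore an ID-then-Date order for display.
result = result.sort_values(["ID", "Date"]).reset_index(drop=True)
print(result)
```

With by="ID", matches never cross IDs, and the default allow_exact_matches=True lets a LUT row dated the same day as a REF row count as the most recent one. AAAB 2008-01-08 has no earlier LUT row, so its RefVal is NaN, as in the desired output.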
Answer 0 (score: 1)
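The ordered_merge function (called pd.merge_ordered in current pandas) may be what you are after. Merge on both ID and Date, then forward-fill RefVal within each ID so that values do not carry over from one ID to the next:
import pandas as pd
# Outer join on (ID, Date); merge_ordered returns the rows sorted by ID, then Date
res = pd.merge_ordered(REF, LUT, on=['ID', 'Date'])
# Forward-fill RefVal within each ID only, so values never leak across IDs
res['RefVal'] = res.groupby('ID')['RefVal'].ffill()
Result:
ID Date ColOne RefVal
0 AAAA 2009-01-01 NaN Val1
1 AAAA 2010-07-06 ... Val1
2 AAAA 2011-12-31 ... Val1
3 AAAA 2013-02-15 ... Val1
4 AAAA 2013-05-21 NaN Val2
5 AAAA 2015-05-21 ... Val2
6 AAAB 2008-01-08 ... NaN
7 AAAB 2009-03-02 NaN Val3
8 AAAB 2010-10-20 ... Val3
9 AAAB 2012-09-09 NaN Val4
10 AAAB 2013-12-31 NaN Val5
11 AAAB 2014-03-31 ... Val5
The REF rows (1-3, 5, 6, 8 and 11) now carry the desired RefVal; the remaining rows come from the LUT.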
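The ordered-merge idea can be checked end to end on the sample rows from the question. This is a minimal sketch, assuming ISO "YYYY-MM-DD" date strings (which sort chronologically) and hypothetical ColOne values. It uses pd.merge_ordered, the current name of ordered_merge, forward-fills per ID rather than across the whole frame, and adds a hypothetical in_ref flag so the LUT-only rows can be dropped at the end.

```python
import pandas as pd

# Sample rows from the question; ColOne values are hypothetical stand-ins for "...".
LUT = pd.DataFrame({
    "ID": ["AAAA", "AAAA", "AAAB", "AAAB", "AAAB"],
    "Date": ["2009-01-01", "2013-05-21", "2009-03-02", "2012-09-09", "2013-12-31"],
    "RefVal": ["Val1", "Val2", "Val3", "Val4", "Val5"],
})
REF = pd.DataFrame({
    "ID": ["AAAA"] * 4 + ["AAAB"] * 3,
    "Date": ["2010-07-06", "2011-12-31", "2013-02-15", "2015-05-21",
             "2008-01-08", "2010-10-20", "2014-03-31"],
    "ColOne": [1, 2, 3, 4, 5, 6, 7],
})

# in_ref is a hypothetical helper flag; it becomes NaN on rows that exist only in LUT.
res = pd.merge_ordered(REF.assign(in_ref=True), LUT, on=["ID", "Date"])
# Forward-fill RefVal inside each ID so AAAA values never reach AAAB rows.
res["RefVal"] = res.groupby("ID")["RefVal"].ffill()
# Keep only the REF rows and drop the helper column.
out = res[res["in_ref"].notna()].drop(columns="in_ref").reset_index(drop=True)
print(out)
```

Because the merge is on (ID, Date), a REF row and a LUT row with the same ID and Date would collapse into one row that already carries the LUT's RefVal, assuming each (ID, Date) pair is unique within each frame.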
Answer 1 (score: 1)
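Here is a suggested answer:
import pandas as pd
# keys= tags each row with the frame it came from, in a new index level 'src'
merged = pd.concat([LUT.set_index(['ID', 'Date']), REF.set_index(['ID', 'Date'])],
                   keys=['LUT', 'REF'], names=['src'])
# Sort by ID, then Date; on an (ID, Date) tie the 'LUT' row comes before the 'REF' row
merged = merged.sort_index(level=['ID', 'Date']).reset_index()
Now forward-fill RefVal within each ID with a transform lambda:
merged['RefVal'] = merged.groupby('ID')['RefVal'].transform(lambda x: x.ffill())
# Keep only the REF rows, which now hold the most recent RefVal for their ID
result = merged[merged['src'] == 'REF'].drop(columns='src')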
Any thoughts?
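The concat, sort and per-ID forward-fill approach depends on which row comes first when REF and LUT share an (ID, Date). Below is a minimal sketch on tiny hypothetical data with the same column names. Tagging rows with concat's keys= (an assumption added here, not necessarily part of the answer as first posted) makes LUT rows sort ahead of REF rows on a tie, so the REF row picks up that day's RefVal. It also lets the REF rows be selected at the end.

```python
import pandas as pd

# Tiny hypothetical data: one REF row shares its (ID, Date) with a LUT row.
LUT = pd.DataFrame({"ID": ["A", "A"], "Date": ["2010-01-01", "2012-01-01"],
                    "RefVal": ["v1", "v2"]})
REF = pd.DataFrame({"ID": ["A", "A", "A"],
                    "Date": ["2009-06-30", "2012-01-01", "2013-01-01"],
                    "ColOne": [10, 20, 30]})

# keys= adds a 'src' level; 'LUT' sorts before 'REF', so on an (ID, Date) tie
# the LUT row comes first and its RefVal is forward-filled into the REF row.
merged = pd.concat([LUT.set_index(["ID", "Date"]), REF.set_index(["ID", "Date"])],
                   keys=["LUT", "REF"], names=["src"])
merged = merged.sort_index(level=["ID", "Date"]).reset_index()
merged["RefVal"] = merged.groupby("ID")["RefVal"].ffill()

# Keep only the REF rows.
result = merged[merged["src"] == "REF"].drop(columns="src").reset_index(drop=True)
print(result)
```

The first REF row predates every LUT entry and stays NaN. The 2012-01-01 row gets v2 from the same-day LUT row, and 2013-01-01 also gets v2.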