I have a df:
entrydate exitdate ddmax
1 2012-02-15 2012-02-17 -1
2 2012-02-18 2012-02-19 -2
3 2012-02-20 2012-02-21 -3
4 2012-02-22 2012-02-22 -2
5 2012-02-24 2012-02-24 -6
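I would like to add a column df['location'] whose value is the DATE on which ddmax occurred. This date lies between the entry date and the exit date (inclusive).
To find that date, however, I need to do a lookup on another series:
s =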
2012-02-15 -3
2012-02-16 -1
2012-02-17 -2
2012-02-18 -2
2012-02-19 -1
2012-02-20 -1
2012-02-21 -3
2012-02-22 -2
2012-02-23 -3
2012-02-24 -6
2012-02-25 -9
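So I look up by the value and take the corresponding date.
How can I do this?
I tried the map function and a pd left merge, but to no avail...
Expected output: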
entrydate exitdate ddmax location
1 2012-02-15 2012-02-17 -1 2012-02-16
2 2012-02-18 2012-02-19 -2 2012-02-18
3 2012-02-20 2012-02-21 -3 2012-02-21
4 2012-02-22 2012-02-22 -2 2012-02-22
5 2012-02-24 2012-02-24 -6 2012-02-24
Answer 0: (score: 1)
This isn't pretty, but it will help if the data is small (which it appears to be):
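def lookup(x):
    # Rows of s whose date falls within [entrydate, exitdate]
    is_ = s.loc[(s.d >= x.entrydate) & (s.d <= x.exitdate), ['i', 'd']]
    # Date of the first row in that window whose value equals ddmax
    return is_.loc[is_.i == x.ddmax, 'd'].iloc[0]

df['location'] = df.apply(lookup, axis=1)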
Output:
entrydate exitdate ddmax location
1 2012-02-15 2012-02-17 -1 2012-02-16
2 2012-02-18 2012-02-19 -2 2012-02-18
3 2012-02-20 2012-02-21 -3 2012-02-21
4 2012-02-22 2012-02-22 -2 2012-02-22
5 2012-02-24 2012-02-24 -6 2012-02-24
The code above assumes that your s is a DataFrame, such as:
d i
0 2012-02-15 -3
1 2012-02-16 -1
2 2012-02-17 -2
3 2012-02-18 -2
4 2012-02-19 -1
5 2012-02-20 -1
6 2012-02-21 -3
7 2012-02-22 -2
8 2012-02-23 -3
9 2012-02-24 -6
10 2012-02-25 -9
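For anyone who wants to run this end to end, here is a minimal self-contained sketch of the DataFrame variant. The sample values are copied from the question; the column names d and i follow the frame shown above, and the assumption that all date columns are parsed as datetimes is mine, not something the question states.

```python
import pandas as pd

# Sample data copied from the question; dates are assumed to be datetimes.
df = pd.DataFrame(
    {
        "entrydate": pd.to_datetime(
            ["2012-02-15", "2012-02-18", "2012-02-20", "2012-02-22", "2012-02-24"]
        ),
        "exitdate": pd.to_datetime(
            ["2012-02-17", "2012-02-19", "2012-02-21", "2012-02-22", "2012-02-24"]
        ),
        "ddmax": [-1, -2, -3, -2, -6],
    },
    index=[1, 2, 3, 4, 5],
)

# s as a DataFrame with a date column 'd' and a value column 'i'.
s = pd.DataFrame(
    {
        "d": pd.date_range("2012-02-15", "2012-02-25", freq="D"),
        "i": [-3, -1, -2, -2, -1, -1, -3, -2, -3, -6, -9],
    }
)


def lookup(x):
    # Keep only the rows of s dated within [entrydate, exitdate].
    window = s.loc[(s.d >= x.entrydate) & (s.d <= x.exitdate), ["i", "d"]]
    # Return the date of the first row whose value equals ddmax.
    return window.loc[window.i == x.ddmax, "d"].iloc[0]


df["location"] = df.apply(lookup, axis=1)
print(df)
```

Note that .iloc[0] raises an IndexError if no value in the window equals ddmax, so this relies on ddmax always occurring between the two dates, as it does in the sample data.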
If you have a pd.Series instead, such as:
d
2012-02-15 -3
2012-02-16 -1
2012-02-17 -2
2012-02-18 -2
2012-02-19 -1
2012-02-20 -1
2012-02-21 -3
2012-02-22 -2
2012-02-23 -3
2012-02-24 -6
2012-02-25 -9
Name: i, dtype: int64
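then the lookup function changes slightly to:
def lookup(x):
    # Values of s whose index date falls within [entrydate, exitdate]
    is_ = s.loc[(s.index >= x.entrydate) & (s.index <= x.exitdate)]
    # Index label (the date) of the first value in that window equal to ddmax
    return is_.loc[is_ == x.ddmax].index[0]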
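As a rough, runnable sketch of the Series variant: the date being looked up lives in the index of s, so the function below returns the matching index label rather than the matched value. The sample values come from the question; the DatetimeIndex and the pd.NaT fallback for windows with no match are my own assumptions, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question; dates are assumed to be datetimes.
df = pd.DataFrame(
    {
        "entrydate": pd.to_datetime(
            ["2012-02-15", "2012-02-18", "2012-02-20", "2012-02-22", "2012-02-24"]
        ),
        "exitdate": pd.to_datetime(
            ["2012-02-17", "2012-02-19", "2012-02-21", "2012-02-22", "2012-02-24"]
        ),
        "ddmax": [-1, -2, -3, -2, -6],
    },
    index=[1, 2, 3, 4, 5],
)

# s as a Series named 'i', indexed by date (index name 'd').
s = pd.Series(
    [-3, -1, -2, -2, -1, -1, -3, -2, -3, -6, -9],
    index=pd.date_range("2012-02-15", "2012-02-25", freq="D", name="d"),
    name="i",
)


def lookup(x):
    # Keep only the values dated within [entrydate, exitdate].
    window = s.loc[(s.index >= x.entrydate) & (s.index <= x.exitdate)]
    # Dates in the window whose value equals ddmax.
    matches = window[window == x.ddmax].index
    # Assumed fallback: NaT when ddmax does not occur in the window.
    return matches[0] if len(matches) > 0 else pd.NaT


df["location"] = df.apply(lookup, axis=1)
print(df)
```

If ddmax occurs more than once in a window, this keeps the earliest date, which matches the behaviour of taking the first element in the versions above.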