初始DataFrame
any
期望输出:
关于输出的重要事项是包含列index is_case value_a value_b
03/01/2005 True 0.598081665 0.189099313
04/01/2005 False 0.480809369 0.142255603
05/01/2005 False 0.963128886 0.422756089
06/01/2005 False 0.687675456 0.739599384
07/01/2005 True 0.513017431 0.397303797
08/01/2005 True 0.691884131 0.922642361
09/01/2005 False 0.659555415 0.222993436
10/01/2005 False 0.920539474 0.553573214
11/01/2005 False 0.360990121 0.535021421
12/01/2005 False 0.512528553 0.343931584
13/01/2005 False 0.083391071 0.277004714
14/01/2005 False 0.382696661 0.204780359
15/01/2005 False 0.838666246 0.337101306
16/01/2005 True 0.363920089 0.355211134
17/01/2005 False 0.354853214 0.691884131
18/01/2005 False 0.089324832 0.910276245
19/01/2005 False 0.611991454 0.513667459
20/01/2005 True 0.210785609 0.839849547
和output
。其他列可能在输出中,也可能不在输出中。
index
转化逻辑
获取index is_case value_a value_b output
03/01/2005 True 0.598081665 0.189099313 06/01/2005
07/01/2005 True 0.513017431 0.397303797 08/01/2005
08/01/2005 True 0.691884131 0.922642361 17/01/2005
16/01/2005 True 0.363920089 0.355211134 17/01/2005
20/01/2005 True 0.210785609 0.839849547 NaN
为value_a
的所有行中的is_case
,然后从下一行开始搜索大于或等于True
的{{1}} ,它会在value_b
列中返回符合此条件的第一个value_a
。
答案 0 :(得分:3)
IIUC,如果您的数据框架太大,您可以使用笛卡尔联接和过滤器,然后删除重复项以获得第一个值匹配:
df_is_case = df[df['is_case'] == True]
df_joined = df_is_case.assign(key=1)\
.merge(df.assign(key=1),
on='key',
suffixes=('','_y'))\
.query('index < index_y and value_a <= value_b_y')
df_out = pd.concat([df_joined, df_is_case])\
.drop_duplicates(subset='index')[['index', 'is_case', 'value_a', 'value_b', 'index_y']]\
.rename(columns={'index_y':'output'})
print(df_out)
输出:
index is_case value_a value_b output
3 03/01/2005 True 0.598082 0.189099 06/01/2005
23 07/01/2005 True 0.513017 0.397304 08/01/2005
50 08/01/2005 True 0.691884 0.922642 17/01/2005
68 16/01/2005 True 0.363920 0.355211 17/01/2005
17 20/01/2005 True 0.210786 0.839850 NaN
答案 1 :(得分:2)
将numpy
与idxmax
mm=np.triu(-df['value_a'].values[:,None]+df['value_b'].values,1)
mm[np.tril_indices(mm.shape[0], 0)] = np.nan
temp=pd.DataFrame(mm)
df1=df.loc[df['is_case']]
df1['New']=np.nan
df1['New'].iloc[:4]=df['index'].iloc[(temp>=0).replace(False,np.nan).idxmax(1)[df['is_case']].dropna().astype(int).values].values
df1
Out[1098]:
index is_case value_a value_b New
0 03/01/2005 True 0.598082 0.189099 06/01/2005
4 07/01/2005 True 0.513017 0.397304 08/01/2005
5 08/01/2005 True 0.691884 0.922642 17/01/2005
13 16/01/2005 True 0.363920 0.355211 17/01/2005
17 20/01/2005 True 0.210786 0.839850 NaN
答案 2 :(得分:1)
以df作为您的数据框
df_iscase = df[df.is_case]
def transform(se):
remaining = df.loc[se.name:]
if len(remaining) < 2: # check if not last value
return np.nan
remaining = remaining.iloc[1:] # grab all next value (removing self)
remaining = remaining[remaining.value_b >= se.value_a] # all where b >= a
if len(remaining) < 1: # if not any
return np.nan
return remaining.iloc[0].name # return first value where b >= a
df_iscase.apply(transform, axis=1)