Return a pandas DataFrame containing the index of the first following case that is >= value

Time: 2017-12-06 20:06:13

Tags: python pandas dataframe

Initial DataFrame

index       is_case  value_a      value_b
03/01/2005  True     0.598081665  0.189099313
04/01/2005  False    0.480809369  0.142255603
05/01/2005  False    0.963128886  0.422756089
06/01/2005  False    0.687675456  0.739599384
07/01/2005  True     0.513017431  0.397303797
08/01/2005  True     0.691884131  0.922642361
09/01/2005  False    0.659555415  0.222993436
10/01/2005  False    0.920539474  0.553573214
11/01/2005  False    0.360990121  0.535021421
12/01/2005  False    0.512528553  0.343931584
13/01/2005  False    0.083391071  0.277004714
14/01/2005  False    0.382696661  0.204780359
15/01/2005  False    0.838666246  0.337101306
16/01/2005  True     0.363920089  0.355211134
17/01/2005  False    0.354853214  0.691884131
18/01/2005  False    0.089324832  0.910276245
19/01/2005  False    0.611991454  0.513667459
20/01/2005  True     0.210785609  0.839849547

Desired output:

index       is_case  value_a      value_b      output
03/01/2005  True     0.598081665  0.189099313  06/01/2005
07/01/2005  True     0.513017431  0.397303797  08/01/2005
08/01/2005  True     0.691884131  0.922642361  17/01/2005
16/01/2005  True     0.363920089  0.355211134  17/01/2005
20/01/2005  True     0.210785609  0.839849547  NaN

The important thing about the output is that it contains the index and output columns. The other columns may or may not be present in the output.

Transformation logic

Take all rows where is_case is True. For each of them, search from the next row onward for a value_b greater than or equal to that row's value_a, and return in the output column the index of the first row that meets this condition.

3 answers:

Answer 0 (score: 3)

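IIUC, and provided your DataFrame is not too large (the join below builds every pair of rows), you can use a cartesian join and a filter, then drop duplicates to keep the first matching value: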

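import pandas as pd

# rows that need an output value
df_is_case = df[df['is_case']]

# cartesian join through a constant key, then keep only later rows
# (index_y after index) whose value_b is at least the case's value_a;
# index holds dd/mm/yyyy strings, so '<' compares them lexicographically,
# which is chronological here only because all dates share month and year
df_joined = df_is_case.assign(key=1)\
                         .merge(df.assign(key=1),
                                on='key',
                                suffixes=('','_y'))\
                         .query('index < index_y and value_a <= value_b_y')

# the first match of each case comes first; appending df_is_case keeps
# the cases that have no match at all (their output stays NaN)
df_out = pd.concat([df_joined, df_is_case])\
           .drop_duplicates(subset='index')[['index', 'is_case', 'value_a', 'value_b', 'index_y']]\
           .rename(columns={'index_y':'output'})

print(df_out)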

Output:

         index  is_case   value_a   value_b      output
3   03/01/2005     True  0.598082  0.189099  06/01/2005
23  07/01/2005     True  0.513017  0.397304  08/01/2005
50  08/01/2005     True  0.691884  0.922642  17/01/2005
68  16/01/2005     True  0.363920  0.355211  17/01/2005
17  20/01/2005     True  0.210786  0.839850         NaN
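
As a variation on the cartesian-join idea, the sketch below uses `how="cross"` (available in pandas 1.2 and later) instead of a constant key. It runs on a hypothetical five-row sample taken from the question's data. The `pos` column and the names `cases`, `pairs`, `first` and `out` are assumptions introduced for illustration; they are not part of the answer above.

```python
import pandas as pd

# Hypothetical sample: the first five rows of the question's data.
df = pd.DataFrame({
    "index": ["03/01/2005", "04/01/2005", "05/01/2005", "06/01/2005", "07/01/2005"],
    "is_case": [True, False, False, False, True],
    "value_a": [0.598081665, 0.480809369, 0.963128886, 0.687675456, 0.513017431],
    "value_b": [0.189099313, 0.142255603, 0.422756089, 0.739599384, 0.397303797],
})

# Positional counter, so "later row" does not rely on comparing date strings.
df = df.assign(pos=range(len(df)))
cases = df[df["is_case"]]

# Every (case, row) pair; how="cross" requires pandas 1.2 or newer.
pairs = cases.merge(df, how="cross", suffixes=("", "_y"))
pairs = pairs[(pairs["pos_y"] > pairs["pos"]) & (pairs["value_b_y"] >= pairs["value_a"])]

# Earliest matching row per case.
first = (pairs.sort_values(["pos", "pos_y"])
              .drop_duplicates(subset="pos")
              .set_index("pos")["index_y"])

# Cases without a match are absent from `first`, so map() leaves them as NaN.
out = cases.assign(output=cases["pos"].map(first))[["index", "output"]]
print(out)
```

Filtering on a positional counter rather than on the date strings avoids depending on lexicographic order, which would stop being chronological once the data spans more than one month.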

Answer 1 (score: 2)

Using numpy together with idxmax:
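import numpy as np
import pandas as pd

# mm[i, j] = value_b[j] - value_a[i]; NaN wherever j <= i (same or earlier row)
mm = df['value_b'].values - df['value_a'].values[:, None]
mm[np.tril_indices(mm.shape[0], 0)] = np.nan
hit = pd.DataFrame(mm >= 0)   # NaN >= 0 evaluates to False

# idxmax gives the position of the first True per row; keep is_case rows that have a match
first = hit.idxmax(axis=1)[hit.any(axis=1) & df['is_case'].values]

df1 = df.loc[df['is_case']].copy()
# index alignment leaves NaN for cases without a match
df1['New'] = pd.Series(df['index'].values[first.values], index=df.index[first.index])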


df1
Out[1098]: 
         index  is_case   value_a   value_b         New
0   03/01/2005     True  0.598082  0.189099  06/01/2005
4   07/01/2005     True  0.513017  0.397304  08/01/2005
5   08/01/2005     True  0.691884  0.922642  17/01/2005
13  16/01/2005     True  0.363920  0.355211  17/01/2005
17  20/01/2005     True  0.210786  0.839850         NaN
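
The broadcasting step behind this approach can be seen in isolation on a tiny example. This is a minimal sketch with made-up three-element arrays. It uses NumPy's `argmax` rather than pandas' `idxmax`, and `-1` as a hypothetical "no match" marker; the names `ok` and `first` are illustrative.

```python
import numpy as np

# Made-up values, chosen only to show the mechanics.
value_a = np.array([0.6, 0.5, 0.2])
value_b = np.array([0.1, 0.7, 0.4])

# ok[i, j] is True when value_b[j] >= value_a[i]; np.triu with k=1 then keeps
# only the entries with j > i, i.e. rows strictly after row i.
ok = np.triu(value_b[None, :] >= value_a[:, None], k=1)

# argmax returns the position of the first True in each row; rows without
# any True are marked with -1 (an arbitrary sentinel for this sketch).
first = np.where(ok.any(axis=1), ok.argmax(axis=1), -1)
print(first)
```

The comparison matrix has one entry per pair of rows, so memory grows quadratically with the length of the DataFrame.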

Answer 2 (score: 1)

With df as your DataFrame:

import numpy as np

df_iscase = df[df.is_case]

def transform(se):
    # se.name is the row label; slice from the current row to the end
    remaining = df.loc[se.name:]
    if len(remaining) < 2:  # current row is the last one, nothing left to search
        return np.nan
    remaining = remaining.iloc[1:]  # all following rows (drop the current one)
    remaining = remaining[remaining.value_b >= se.value_a]  # keep rows where b >= a
    if len(remaining) < 1:  # no match
        return np.nan
    # label of the first match (a date only if the dates are the DataFrame index)
    return remaining.iloc[0].name

df_iscase.apply(transform, axis=1)
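
To try this row-by-row approach end to end, here is a self-contained sketch on the question's data. It assumes the dates are the DataFrame's index (so a row label is a date) and attaches the result as an `output` column. The names `first_later_match`, `cases` and `result` are illustrative, not taken from the answer.

```python
import numpy as np
import pandas as pd

# The question's data, with the dates used as the index (an assumption).
dates = pd.date_range("2005-01-03", periods=18, freq="D").strftime("%d/%m/%Y")
is_case = [i in (0, 4, 5, 13, 17) for i in range(18)]
value_a = [0.598081665, 0.480809369, 0.963128886, 0.687675456, 0.513017431,
           0.691884131, 0.659555415, 0.920539474, 0.360990121, 0.512528553,
           0.083391071, 0.382696661, 0.838666246, 0.363920089, 0.354853214,
           0.089324832, 0.611991454, 0.210785609]
value_b = [0.189099313, 0.142255603, 0.422756089, 0.739599384, 0.397303797,
           0.922642361, 0.222993436, 0.553573214, 0.535021421, 0.343931584,
           0.277004714, 0.204780359, 0.337101306, 0.355211134, 0.691884131,
           0.910276245, 0.513667459, 0.839849547]
df = pd.DataFrame({"is_case": is_case, "value_a": value_a, "value_b": value_b},
                  index=dates)

def first_later_match(row):
    # Rows strictly after the current one, located by position.
    later = df.iloc[df.index.get_loc(row.name) + 1:]
    hits = later.index[later["value_b"] >= row["value_a"]]
    return hits[0] if len(hits) else np.nan

cases = df[df["is_case"]]
result = cases.assign(output=cases.apply(first_later_match, axis=1))
print(result)
```

Locating the current row with `get_loc` and slicing by position assumes the index labels are unique; with duplicate dates `get_loc` would not return a single integer.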