A B C
0 2002-01-13 18 120
1 2002-01-13 7 150
2 2002-01-13 11 130
3 2002-01-13 26 140
4 2002-01-14 13 180
5 2002-01-14 25 165
6 2002-01-14 9 150
7 2002-01-14 4 190
I have the df shown above.
I apply this code:
df2 = df.loc[df['B'].sub(10).abs().groupby(df['A']).idxmin()]
df2
which gives:
A B C
2 2002-01-13 11 130
6 2002-01-14 9 150
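For readers following along, the one-liner above can be unpacked into separate steps. This is only a sketch: it rebuilds df from the table at the top, assumes column A holds datetimes, and the intermediate names distance and closest are introduced here for illustration.

```python
import pandas as pd

# Rebuild the example frame (assumption: A is a datetime column)
df = pd.DataFrame({
    "A": pd.to_datetime(["2002-01-13"] * 4 + ["2002-01-14"] * 4),
    "B": [18, 7, 11, 26, 13, 25, 9, 4],
    "C": [120, 150, 130, 140, 180, 165, 150, 190],
})

# Absolute distance of each B value from 10
distance = df["B"].sub(10).abs()

# Index label of the smallest distance within each A group
closest = distance.groupby(df["A"]).idxmin()

# Select those rows from the original frame
df2 = df.loc[closest]
print(df2)
```

So df2 holds, for each date, the row whose B is closest to 10.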
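Now I want to create a new df3 by selecting, for each A group, the row of df that meets the following condition:
df["C"] = df2["C"] + 20
(for the 2002-01-13 group, that is 130 + 20 = 150). If no row of df satisfies df["C"] = df2["C"] + 20, take the next lower value (for the 2002-01-14 group, 150 + 20 = 170; since 170 does not exist, choose the next lower value, which is 165). The df3 output should be: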
A B C
1 2002-01-13 7 150
5 2002-01-14 25 165
Answer 0 (score: 2)
You can use merge_asof
pd.merge_asof(df.sort_values('C'), df2.assign(C=df2.C+20).sort_values('C'), on='C', by='A', direction='forward').dropna().drop_duplicates('A', keep='last')
Out[553]:
A B_x C B_y
3 2002-01-13 7 150 11.0
5 2002-01-14 25 165 9.0
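It may help to see why the forward merge followed by dropna and drop_duplicates picks the right rows. The sketch below splits the chain into steps. It assumes the left frame is the question's df and that A is a datetime column; the names targets, merged, kept and result are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": pd.to_datetime(["2002-01-13"] * 4 + ["2002-01-14"] * 4),
    "B": [18, 7, 11, 26, 13, 25, 9, 4],
    "C": [120, 150, 130, 140, 180, 165, 150, 190],
})
df2 = df.loc[df["B"].sub(10).abs().groupby(df["A"]).idxmin()]

# One row per group whose C is the target value (df2's C + 20)
targets = df2.assign(C=df2["C"] + 20).sort_values("C")

# direction='forward' matches each df row to a target with C >= the row's C,
# so rows whose C is above their group's target get no match (NaN in B_y)
merged = pd.merge_asof(df.sort_values("C"), targets,
                       on="C", by="A", direction="forward")

# Drop the unmatched rows, i.e. those above the target
kept = merged.dropna()

# Rows are sorted by C, so the last one left per group is the
# largest C that does not exceed the target
result = kept.drop_duplicates("A", keep="last")
print(result)
```

An exact match is simply the case where the largest surviving C equals the target.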
Update
pd.merge_asof(df.sort_values('C').reset_index(), df2.assign(C=df2.C+20).sort_values('C'), on='C', by='A', direction='forward').dropna().drop_duplicates('A', keep='last').set_index('index')
Out[606]:
A B_x C B_y
index
1 2002-01-13 7 150 11.0
5 2002-01-14 25 165 9.0
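A possible variant, not part of the answer above, is to swap the roles of the two frames and use direction='backward', which looks up the largest C at or below each target directly. This is a sketch under the same assumptions as the question's data; the column name target and the frame names targets, candidates and matched are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "A": pd.to_datetime(["2002-01-13"] * 4 + ["2002-01-14"] * 4),
    "B": [18, 7, 11, 26, 13, 25, 9, 4],
    "C": [120, 150, 130, 140, 180, 165, 150, 190],
})
df2 = df.loc[df["B"].sub(10).abs().groupby(df["A"]).idxmin()]

# Left side: one target value per group
targets = df2[["A"]].assign(target=df2["C"] + 20).sort_values("target")

# Right side: every row of df, keeping its original index as a column
candidates = df.reset_index().sort_values("C")

# direction='backward' takes the last candidate with C <= target in the same group
matched = pd.merge_asof(targets, candidates,
                        left_on="target", right_on="C",
                        by="A", direction="backward")

# Groups with no candidate at or below the target would have NaN here
matched = matched.dropna(subset=["index"])
df3 = df.loc[matched["index"].astype(int)]
print(df3)
```

This returns rows of the original df with their original index, so no dropna/drop_duplicates pass over the whole frame is needed.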
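Answer 1 (score: 0)
Use a lambda with an if statement to get the indices, then pull the values. If nothing matches C + 20, it falls back to the largest value below C + 20.
Full copy-paste example:
import pandas as pd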
# build op data frame
df = pd.DataFrame(columns=['A', 'B', 'C'])
A = [pd.Timestamp('2002-01-13'), pd.Timestamp('2002-01-13'), pd.Timestamp('2002-01-13'), pd.Timestamp('2002-01-13'),
pd.Timestamp('2002-01-14'), pd.Timestamp('2002-01-14'), pd.Timestamp('2002-01-14'), pd.Timestamp('2002-01-14')]
B = [18, 7, 11, 26, 13, 25, 9, 4]
C = [120, 150, 130, 140, 180, 165, 150, 190]
df['A'] = A
df['B'] = B
df['C'] = C
print(df)
# build df2
df2 = df.loc[df['B'].sub(10).abs().groupby(df['A']).idxmin()]
print(df2)
# find indices in df that meet op criteria
df_ind = df2.apply(
    lambda row: ((df.A == row.A) & (df.C == row.C + 20)).idxmax()
    if ((df.A == row.A) & (df.C == row.C + 20)).any()
    else df.C.loc[(df.C < row.C + 20) & (df.A == row.A)].idxmax(),
    axis=1,
)
print(df_ind)
# 2    1
# 6    5
# Build df3
df3 = df.loc[df_ind.tolist(), :]
print(df3)
Result:
A B C
1 2002-01-13 7 150
5 2002-01-14 25 165
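The two branches of the lambda (exact match, otherwise the largest value below) appear to collapse into a single rule: within each group, take the largest C that is less than or equal to df2's C + 20. A vectorized sketch under that reading follows; target and eligible are names introduced here, and it assumes df2 has exactly one row per A.

```python
import pandas as pd

df = pd.DataFrame({
    "A": pd.to_datetime(["2002-01-13"] * 4 + ["2002-01-14"] * 4),
    "B": [18, 7, 11, 26, 13, 25, 9, 4],
    "C": [120, 150, 130, 140, 180, 165, 150, 190],
})
df2 = df.loc[df["B"].sub(10).abs().groupby(df["A"]).idxmin()]

# Target value for every row of df, looked up by its group
target = df["A"].map(df2.set_index("A")["C"] + 20)

# Keep only rows that do not exceed their group's target
eligible = df[df["C"] <= target]

# Within each group, the largest remaining C is either the exact
# match or the next lower value
df3 = df.loc[eligible.groupby("A")["C"].idxmax()]
print(df3)
```

One difference to be aware of: a group with no row at or below its target is silently left out here, whereas the lambda version would raise an error for it.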