I have a DataFrame that looks like this:
    bucket type   v
0        1    X  14
1        1    X  10
2        1    Y  11
3        1    X  15
4        2    X  16
5        2    Y   9
6        2    Y  10
7        3    Y  20
8        3    X  18
9        3    Y  15
10       3    X  14
The desired output looks like this:
    bucket type   v  v_paired
0        1    X  14       nan  (no Y coming before it)
1        1    X  10       nan  (no Y coming before it)
2        1    Y  11        14  (highest X in bucket 1 before this row)
3        1    X  15        11  (lowest Y in bucket 1 before this row)
4        2    X  16       nan  (no Y coming before it in the same bucket)
5        2    Y   9        16  (highest X in same bucket coming before)
6        2    Y  10        16  (highest X in same bucket coming before)
7        3    Y  20       nan  (no X coming before it in the same bucket)
8        3    X  18        20  (single Y coming before it in same bucket)
9        3    Y  15        18  (single X coming before it in same bucket)
10       3    X  14        15  (smallest Y coming before it in same bucket)
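The goal is to build the v_paired column according to these rules:
Within the same bucket, look at the rows that come before the current row and have the opposite type (X vs Y). Call these rows "pair candidates".
If the current row is X, the smallest v among the pair candidates becomes its v_paired. If the current row is Y, the largest v among the pair candidates becomes its v_paired. If there are no pair candidates, v_paired is nan.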
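For reference, the two rules can be applied literally, one row at a time. This is only a minimal brute-force sketch to pin down the expected behaviour, not an efficient solution. The helper name `paired_value` is made up for illustration, and the sample frame is rebuilt by hand with row 0 placed in bucket 1, as the desired output shows.

```python
import numpy as np
import pandas as pd

# Sample data from the question (row 0 assumed to be in bucket 1,
# matching the desired output).
df = pd.DataFrame({
    "bucket": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "type":   list("XXYXXYYYXYX"),
    "v":      [14, 10, 11, 15, 16, 9, 10, 20, 18, 15, 14],
})

def paired_value(frame, pos):
    """Hypothetical helper: apply the rules to the row at position `pos`."""
    row = frame.iloc[pos]
    earlier = frame.iloc[:pos]  # only rows that come before the current one
    # Pair candidates: same bucket, opposite type.
    mask = (earlier["bucket"] == row["bucket"]) & (earlier["type"] != row["type"])
    candidates = earlier.loc[mask, "v"]
    if candidates.empty:
        return np.nan
    # X rows take the smallest candidate, Y rows the largest.
    return candidates.min() if row["type"] == "X" else candidates.max()

df["v_paired"] = [paired_value(df, i) for i in range(len(df))]
print(df)
```

This scans all earlier rows for every row, so it is quadratic in the number of rows. It is mainly useful as a check for faster approaches.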
Thanks in advance.
Answer 0 (score: 0)
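I believe this has to be done sequentially. First, group by bucket:
import pandas as pd
groups = df.groupby('bucket', group_keys=False)
The following function is then applied to each bucket group: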
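def func(group):
    x_value = None  # highest X seen so far in this bucket
    y_value = None  # lowest Y seen so far in this bucket
    result = []
    for value_type, value in zip(group['type'], group['v']):
        if value_type == 'X':
            # an X row pairs with the lowest Y seen before it
            result.append(y_value)
            x_value = value if x_value is None else max(x_value, value)
        elif value_type == 'Y':
            # a Y row pairs with the highest X seen before it
            result.append(x_value)
            y_value = value if y_value is None else min(y_value, value)
    # keep the group's index so the values line up with the rows of df
    return pd.DataFrame({'v_paired': result}, index=group.index, dtype=float)
df['v_paired'] = groups[['type', 'v']].apply(func)['v_paired']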
Hopefully this does the job.
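As a possible alternative to the row-by-row loop (this is a different technique, not the one in the answer above), the same result can probably be reached with per-bucket running maxima and minima. The sketch below masks out the other type with `where`, takes `cummax` / `cummin` inside each bucket, and forward-fills so that every row sees the latest running value. The variable names are made up for illustration, and the sample frame assumes row 0 is in bucket 1, as in the desired output.

```python
import numpy as np
import pandas as pd

# Sample data from the question (row 0 assumed to be in bucket 1).
df = pd.DataFrame({
    "bucket": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "type":   list("XXYXXYYYXYX"),
    "v":      [14, 10, 11, 15, 16, 9, 10, 20, 18, 15, 14],
})

is_x = df["type"] == "X"

# Keep only the X values (NaN on Y rows), and the other way round.
x_only = df["v"].where(is_x)
y_only = df["v"].where(~is_x)

# Running max of X / running min of Y inside each bucket. cummax and cummin
# leave NaN at the masked positions, so ffill carries the last value forward.
run_max_x = x_only.groupby(df["bucket"]).transform(lambda s: s.cummax().ffill())
run_min_y = y_only.groupby(df["bucket"]).transform(lambda s: s.cummin().ffill())

# An X row never changes the running Y minimum (and vice versa), so the value
# on the current row already reflects only the rows that came before it.
df["v_paired"] = np.where(is_x, run_min_y, run_max_x)
print(df)
```

This relies on the frame containing only the two types X and Y, and on the row order within each bucket being the order that matters.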