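I need to select rows from groups in pandas based on conditions.
Condition 1: For a given group (R1, R2, W), if amount2 of the TYPE A row equals amount2 of the TYPE B row, the complete TYPE A row should be returned as output.
Condition 2: For a given group (R1, R2, W), if amount2 of the TYPE A row is not equal to amount2 of the TYPE B row, amount1 and amount2 should each be summed across the TYPE A and TYPE B rows, and the remaining columns should be taken from the TYPE A row.
Input DataFrame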
R1 R2 W TYPE amount1 amount2 Status Exchange
0 123 12 1 A 111 222 C 1.5
1 123 12 1 B 111 222 D 2.5
2 123 12 2 A 222 222 A 1.5
3 123 12 2 B 333 333 D 2.5
4 123 12 3 A 444 444 D 2.5
5 123 12 3 B 333 333 E 3.5
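To make the example reproducible, here is a minimal sketch that builds the input frame from the table above. The dtypes (integer amounts, float Exchange) are an assumption read off the printed values, and the name `df` is chosen to match the answers.

```python
import pandas as pd

# Sample data copied from the input table; dtypes are assumed from the printed values.
df = pd.DataFrame({
    'R1': [123] * 6,
    'R2': [12] * 6,
    'W': [1, 1, 2, 2, 3, 3],
    'TYPE': ['A', 'B', 'A', 'B', 'A', 'B'],
    'amount1': [111, 111, 222, 333, 444, 333],
    'amount2': [222, 222, 222, 333, 444, 333],
    'Status': ['C', 'D', 'A', 'D', 'D', 'E'],
    'Exchange': [1.5, 2.5, 1.5, 2.5, 2.5, 3.5],
})
print(df)
```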
Expected output
R1 R2 W TYPE amount1 amount2 Status Exchange
0 123 12 1 A 111 222 C 1.5
1 123 12 2 A 555 555 A 1.5
2 123 12 3 A 777 777 D 2.5
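Answer 0 (score: 0)
First, find the groups in which amount2 is equal for the A and B rows. Reshape with DataFrame.set_index and Series.unstack so that each TYPE becomes its own column, compare column A with column B using Series.eq, and finally use DataFrame.join to map the result back onto the rows so the mask has the same length as the original: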
df1 = df.set_index(['R1','R2','W','TYPE'])['amount2'].unstack()
m = df1['A'].eq(df1['B']).rename('m')
m = df.join(m, on=['R1','R2','W'])['m']
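As a self-contained illustration of the mask step, the sketch below rebuilds the sample frame and prints `m`. The cross-check with `groupby(...).transform('nunique')` is not part of the answer; it is an alternative that should give the same mask only under the assumption that every group has exactly one A row and one B row.

```python
import pandas as pd

# Sample frame rebuilt from the question (dtypes assumed).
df = pd.DataFrame({
    'R1': [123] * 6, 'R2': [12] * 6, 'W': [1, 1, 2, 2, 3, 3],
    'TYPE': ['A', 'B', 'A', 'B', 'A', 'B'],
    'amount1': [111, 111, 222, 333, 444, 333],
    'amount2': [222, 222, 222, 333, 444, 333],
    'Status': ['C', 'D', 'A', 'D', 'D', 'E'],
    'Exchange': [1.5, 2.5, 1.5, 2.5, 2.5, 3.5],
})

keys = ['R1', 'R2', 'W']

# One row per group, one column per TYPE, holding amount2.
df1 = df.set_index(keys + ['TYPE'])['amount2'].unstack()

# True for groups where A and B have the same amount2.
group_mask = df1['A'].eq(df1['B']).rename('m')

# Broadcast the per-group result back to the original rows.
m = df.join(group_mask, on=keys)['m']
print(m.tolist())

# Alternative (assumes exactly one A and one B row per group):
# a group matches when it has a single distinct amount2 value.
m_alt = df.groupby(keys)['amount2'].transform('nunique').eq(1)
print(m_alt.tolist())
```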
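Then, for the matching rows (here the first group), filter by boolean indexing, keeping only the A rows; the two conditions are chained with & for bitwise AND: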
df2 = df[m & df['TYPE'].eq('A')]
print (df2)
R1 R2 W TYPE amount1 amount2 Status Exchange
0 123 12 1 A 111 222 C 1.5
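Then filter all other groups with the mask inverted by ~, and aggregate them with GroupBy.agg: sum for the amount columns and GroupBy.first for all other columns. Since first takes the values from the first row of each group, this relies on the A row coming before the B row: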
cols = df.columns.difference(['R1','R2','W','amount1','amount2'])
d1 = dict.fromkeys(['amount1','amount2'], 'sum')
d2 = dict.fromkeys(cols, 'first')
df3 = df[~m].groupby(['R1','R2','W'], as_index=False).agg({**d1, **d2}).assign(TYPE='A')
print (df3)
R1 R2 W amount1 amount2 Exchange Status TYPE
0 123 12 2 555 555 1.5 A A
1 123 12 3 777 777 2.5 D A
Finally, join the two parts together with concat and, if necessary, sort with DataFrame.sort_values:
df4 = pd.concat([df2, df3], ignore_index=True, sort=False).sort_values(['R1','R2','W'])
print (df4)
R1 R2 W TYPE amount1 amount2 Status Exchange
0 123 12 1 A 111 222 C 1.5
1 123 12 2 A 555 555 A 1.5
2 123 12 3 A 777 777 D 2.5
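The steps above can be combined into one function. This is only a sketch: `collapse_groups` is a hypothetical helper name, the sample frame is rebuilt from the question, and it assumes one A row and one B row per group with the A row first. It also assumes pandas 1.0 or later for the `ignore_index` argument of `sort_values`.

```python
import pandas as pd

def collapse_groups(df, keys=('R1', 'R2', 'W')):
    """Hypothetical helper: keep the A row where amount2 matches, else sum A and B."""
    keys = list(keys)
    amounts = ['amount1', 'amount2']

    # Per-row mask: True where the group's A and B rows share the same amount2.
    wide = df.set_index(keys + ['TYPE'])['amount2'].unstack()
    m = df.join(wide['A'].eq(wide['B']).rename('m'), on=keys)['m']

    # Condition 1: matching groups keep their A row unchanged.
    keep_a = df[m & df['TYPE'].eq('A')]

    # Condition 2: remaining groups sum the amounts and take other columns from the first row.
    other = df.columns.difference(keys + amounts)
    agg = {**dict.fromkeys(amounts, 'sum'), **dict.fromkeys(other, 'first')}
    summed = df[~m].groupby(keys, as_index=False).agg(agg).assign(TYPE='A')

    return (pd.concat([keep_a, summed], ignore_index=True, sort=False)
              .sort_values(keys, ignore_index=True))

df = pd.DataFrame({
    'R1': [123] * 6, 'R2': [12] * 6, 'W': [1, 1, 2, 2, 3, 3],
    'TYPE': ['A', 'B', 'A', 'B', 'A', 'B'],
    'amount1': [111, 111, 222, 333, 444, 333],
    'amount2': [222, 222, 222, 333, 444, 333],
    'Status': ['C', 'D', 'A', 'D', 'D', 'E'],
    'Exchange': [1.5, 2.5, 1.5, 2.5, 2.5, 3.5],
})

result = collapse_groups(df)
print(result)
```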
Answer 1 (score: 0)
Another solution:
#get the rows for A for each grouping
#assumption is TYPE is already sorted with A always ahead of B
core = ['R1','R2','W']
A = df.groupby(core).first()
#get rows for B for each grouping
B = df.groupby(core).last()
#first condition (checks both amount1 and amount2, which is stricter than the question's amount2-only condition)
cond1 = (A.amount1.eq(B.amount1)) & (A.amount2.eq(B.amount2))
#extract outcome from A to get the first part
part1 = A.loc[cond1]
#second condition
cond2 = A.amount2.ne(B.amount2)
#add the 'amount1' and 'amount2' columns based on the second condition
#(parentheses let the expression continue across lines)
part2 = (B.loc[cond2].filter(['amount1','amount2']) +
         A.loc[cond2].filter(['amount1','amount2']))
#merge with A to get the remaining columns
part2 = part2.join(A[['TYPE','Status','Exchange']])
#merge part1 and 2 to get final result
pd.concat([part1,part2]).reset_index()
R1 R2 W TYPE amount1 amount2 Status Exchange
0 123 12 1 A 111 222 C 1.5
1 123 12 2 A 555 555 A 1.5
2 123 12 3 A 777 777 D 2.5
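The first/last approach depends on the A row preceding the B row in every group. A possible variant that does not rely on row order is sketched below: it selects the A and B rows by their TYPE value instead. It is not the answerer's code, it follows the question's amount2-only condition, and it assumes exactly one A row and one B row per group; the sample frame is rebuilt from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'R1': [123] * 6, 'R2': [12] * 6, 'W': [1, 1, 2, 2, 3, 3],
    'TYPE': ['A', 'B', 'A', 'B', 'A', 'B'],
    'amount1': [111, 111, 222, 333, 444, 333],
    'amount2': [222, 222, 222, 333, 444, 333],
    'Status': ['C', 'D', 'A', 'D', 'D', 'E'],
    'Exchange': [1.5, 2.5, 1.5, 2.5, 2.5, 3.5],
})

core = ['R1', 'R2', 'W']

# Select rows by TYPE rather than by position within the group.
A = df[df['TYPE'].eq('A')].set_index(core)
B = df[df['TYPE'].eq('B')].set_index(core)

# Condition 1 holds where amount2 matches between the A and B rows (aligned on the group keys).
same = A['amount2'].eq(B['amount2'])

# Start from the A rows; where amount2 differs, replace each amount with the A + B sum.
out = A.copy()
for col in ['amount1', 'amount2']:
    out[col] = A[col].where(same, A[col] + B[col])

out = out.reset_index()
print(out)
```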