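My DataFrame has many rows, but some values occur only rarely. I need to count the values in each column and then replace any value whose frequency is less than 3.
DF - input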
Col1 Col2 Col3 Col4
1 apple tomato apple
1 apple potato nan
1 apple tomato banana
1 apple tomato banana
1 apple tomato banana
1 apple tomato banana
1 grape tomato banana
1 pear tomato banana
1 lemon tomato burger
DF - output
Col1 Col2 Col3 Col4
1 apple tomato Other
1 apple Other nan
1 apple tomato banana
1 apple tomato banana
1 apple tomato banana
1 apple tomato banana
1 Other tomato banana
1 Other tomato banana
1 Other tomato Other
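Answer 0 (score: 5)
You can use where together with a per-column value count (computed here with groupby and transform('count'), which gives the same numbers as value_counts):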
df.where(df.apply(lambda x: x.groupby(x).transform('count')>2), 'Other')
Output:
Col2 Col3 Col4
Col1
1 apple tomato Other
1 apple Other banana
1 apple tomato banana
1 apple tomato banana
1 apple tomato banana
1 apple tomato banana
1 Other tomato banana
1 Other tomato banana
1 Other tomato Other
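To see why the one-liner works, it can help to look at the intermediate count for a single column. The following is a minimal sketch, assuming the question's Col2 values are rebuilt by hand as a Series; the names col2, counts and result are illustrative and not part of the original answer.

```python
import pandas as pd

# Col2 from the question, rebuilt by hand for illustration
col2 = pd.Series(["apple"] * 6 + ["grape", "pear", "lemon"], name="Col2")

# Grouping the column by its own values and counting gives every row
# the number of times its value occurs in the column
counts = col2.groupby(col2).transform("count")

# Rows whose value occurs more than twice keep it; the rest become 'Other'
result = col2.where(counts > 2, "Other")
print(result.tolist())
```

df.apply runs the same count once per column, and df.where then applies the resulting boolean mask to the whole frame.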
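If missing values should stay NaN instead of being replaced, compute the counts first and keep the positions where the count is null:
d = df.apply(lambda x: x.groupby(x).transform('count'))
df.where(d.gt(2.0).where(d.notnull()).astype(bool), 'Other')
Output: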
Col2 Col3 Col4
Col1
1 apple tomato Other
1 apple Other NaN
1 apple tomato banana
1 apple tomato banana
1 apple tomato banana
1 apple tomato banana
1 Other tomato banana
1 Other tomato banana
1 Other tomato Other
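The same idea can also be written with value_counts directly. This is only a sketch under stated assumptions: replace_rare is a hypothetical helper, not a pandas function, its min_count and other parameters are illustrative, and the frame is the question's data rebuilt by hand with a default integer index instead of Col1.

```python
import numpy as np
import pandas as pd

def replace_rare(df, min_count=3, other="Other"):
    """Hypothetical helper: replace values seen fewer than min_count times per column."""
    # Map each cell to the frequency of its value within its own column;
    # value_counts skips NaN, so missing cells map to NaN
    counts = df.apply(lambda col: col.map(col.value_counts()))
    # Keep frequent values, and keep missing cells so they stay NaN
    keep = counts.ge(min_count) | df.isna()
    return df.where(keep, other)

# The question's data, rebuilt by hand (Col1 omitted)
df = pd.DataFrame({
    "Col2": ["apple"] * 6 + ["grape", "pear", "lemon"],
    "Col3": ["tomato", "potato"] + ["tomato"] * 7,
    "Col4": ["apple", np.nan] + ["banana"] * 6 + ["burger"],
})

out = replace_rare(df)
print(out)
```

Combining the count test with df.isna() states the "leave NaN alone" rule explicitly, which may be easier to read than the where/astype(bool) chain.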