This is my DataFrame:
import pandas as pd
df = pd.DataFrame([['@1','A',40],['@2','A',60],['@3','A',47],['@4','B',33],['@5','B',69],['@6','B',22],['@7','B',90],
                   ['@8','C',31],['@9','C',78],['@10','C',12],['@11','C',89],['@12','C',88],['@13','C',99]],
                  columns=['id','channel','score'])
     id channel  score
0    @1       A     40
1    @2       A     60
2    @3       A     47
3    @4       B     33
4    @5       B     69
5    @6       B     22
6    @7       B     90
7    @8       C     31
8    @9       C     78
9   @10       C     12
10  @11       C     89
11  @12       C     88
12  @13       C     99
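Each channel has its own row count, and I set the percentage to 80%.
In each channel I want to keep the int(channel row count * 0.8) largest scores (nlargest), so it would be:
channel A: take int(3 * 0.8) = 2
channel B: take int(4 * 0.8) = 3
channel C: take int(6 * 0.8) = 4
Expected result: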
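The per-channel quotas above can be computed directly. A minimal sketch, assuming the `df` from the question and the 0.8 fraction stored in a variable `a` (the name is only for illustration): count rows per channel with `value_counts`, multiply by `a`, and truncate with `astype(int)`, which, like `int()`, drops the fractional part of these positive values.

```python
import pandas as pd

df = pd.DataFrame([['@1','A',40],['@2','A',60],['@3','A',47],['@4','B',33],['@5','B',69],['@6','B',22],['@7','B',90],
                   ['@8','C',31],['@9','C',78],['@10','C',12],['@11','C',89],['@12','C',88],['@13','C',99]],
                  columns=['id','channel','score'])
a = 0.8  # fraction of each channel to keep (hypothetical variable name)

# Number of rows in each channel, ordered A, B, C
counts = df['channel'].value_counts().sort_index()

# int(count * a) per channel; astype(int) truncates like int() for positive values
quota = (counts * a).astype(int)  # quota: A=2, B=3, C=4
print(quota)
```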
     id channel  score
1    @2       A     60
2    @3       A     47
3    @4       B     33
4    @5       B     69
6    @7       B     90
8    @9       C     78
10  @11       C     89
11  @12       C     88
12  @13       C     99
How can I do this? Thanks.
Answer (score: 6)
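a = 0.8  # fraction of each channel to keep
# group_keys=False keeps the original row index instead of prefixing it with the channel
df1 = (df.groupby('channel', group_keys=False)
         .apply(lambda x: x.nlargest(int(len(x) * a), 'score')))
print(df1)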
     id channel  score
1    @2       A     60
2    @3       A     47
6    @7       B     90
4    @5       B     69
3    @4       B     33
12  @13       C     99
10  @11       C     89
11  @12       C     88
8    @9       C     78
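The same selection can also be written without `groupby(...).apply`, which recent pandas releases (2.2 and later) warn about when the applied function receives the grouping column. A minimal sketch, not taken from the answer above: rank each score within its channel (highest = 1) and keep rows whose rank is within that channel's quota. `rank(method='first')` gives tied scores distinct ranks in order of appearance, so each channel keeps exactly `int(n * a)` rows, and the result keeps the original row order and index.

```python
import pandas as pd

df = pd.DataFrame([['@1','A',40],['@2','A',60],['@3','A',47],['@4','B',33],['@5','B',69],['@6','B',22],['@7','B',90],
                   ['@8','C',31],['@9','C',78],['@10','C',12],['@11','C',89],['@12','C',88],['@13','C',99]],
                  columns=['id','channel','score'])
a = 0.8  # fraction of each channel to keep

# Size of each row's channel, broadcast back to every row
counts = df['channel'].map(df['channel'].value_counts())
quota = (counts * a).astype(int)

# Rank of each score inside its channel: 1 = highest; 'first' breaks ties by position
rank = df.groupby('channel')['score'].rank(method='first', ascending=False)

# Keep rows within their channel's quota; original index and order are preserved
df1 = df[rank <= quota]
print(df1)
```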
Another solution with sort_values + groupby + head:
# sort all rows by score once, then take the first int(len * a) rows of each channel
df1 = (df.sort_values('score', ascending=False)
         .groupby('channel', group_keys=False)
         .apply(lambda x: x.head(int(len(x) * a)))
         .reset_index(drop=True))
print(df1)
    id channel  score
0   @2       A     60
1   @3       A     47
2   @7       B     90
3   @5       B     69
4   @4       B     33
5  @13       C     99
6  @11       C     89
7  @12       C     88
8   @9       C     78
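The `sort_values` + `head` idea can likewise be done without `apply`. A sketch under the same assumptions: after sorting by score, `cumcount()` numbers the rows within each channel starting at 0, so a row is among the top `int(n * a)` of its channel exactly when its position is below the quota. The final sort by `channel` only reproduces the grouped layout printed above; drop it if the order does not matter.

```python
import pandas as pd

df = pd.DataFrame([['@1','A',40],['@2','A',60],['@3','A',47],['@4','B',33],['@5','B',69],['@6','B',22],['@7','B',90],
                   ['@8','C',31],['@9','C',78],['@10','C',12],['@11','C',89],['@12','C',88],['@13','C',99]],
                  columns=['id','channel','score'])
a = 0.8  # fraction of each channel to keep

# Highest scores first across the whole frame
s = df.sort_values('score', ascending=False)

# 0-based position of each row inside its channel after sorting
pos = s.groupby('channel').cumcount()

# Per-row quota int(channel size * a), aligned on the same index as s
quota = (s['channel'].map(s['channel'].value_counts()) * a).astype(int)

# Keep rows below the quota, then lay them out channel by channel, best score first
df1 = (s[pos < quota]
       .sort_values(['channel', 'score'], ascending=[True, False])
       .reset_index(drop=True))
print(df1)
```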