Group by one column and count items above 5 in another column (pandas)

Time: 2018-06-06 14:11:18

Tags: python pandas counting

So I have a df like this:

NAME    TRY SCORE  
Bob   1st   3  
Sue   1st   7  
Tom   1st   3  
Max   1st   8  
Jay   1st   4  
Mel   1st   7  
Bob   2nd   4  
Sue   2nd   2  
Tom   2nd   6  
Max   2nd   4  
Jay   2nd   7  
Mel   2nd   8  
Bob   3rd   3  
Sue   3rd   5  
Tom   3rd   6  
Max   3rd   3  
Jay   3rd   4  
Mel   3rd   6 

I want to count how many times each person scored above 5, and put the result into a new df2 that looks like this:

NAME    COUNT  
Bob     0  
Sue     1  
Tom     2  
Max     1  
Jay     1  
Mel     3  

I have made many attempts; this is the latest:

df2 = df.groupby('NAME')[['SCORE'] > 5].count().reset_index(name="count")

5 answers:

Answer 0 (score: 3)

First create a boolean mask, then aggregate it with sum. True values are treated as 1, so the sum per group is the number of scores above 5:

df2 = (df['SCORE'] > 5).groupby(df['NAME']).sum().astype(int).reset_index(name="count")
print (df2)
  NAME  count
0  Bob      0
1  Jay      1
2  Max      1
3  Mel      3
4  Sue      1
5  Tom      2
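To make the approach above reproducible, here is a minimal, self-contained sketch. It rebuilds the question's data by hand (the `names` and `scores` lists are transcribed from the table in the question, and the `COUNT` column name is chosen to match the requested df2), then applies the same mask-then-sum idea.

```python
import pandas as pd

# Data transcribed from the question: six people, three tries each.
names = ["Bob", "Sue", "Tom", "Max", "Jay", "Mel"]
scores = [3, 7, 3, 8, 4, 7,   # 1st try
          4, 2, 6, 4, 7, 8,   # 2nd try
          3, 5, 6, 3, 4, 6]   # 3rd try
df = pd.DataFrame({
    "NAME": names * 3,
    "TRY": ["1st"] * 6 + ["2nd"] * 6 + ["3rd"] * 6,
    "SCORE": scores,
})

# Boolean mask: True where the score is strictly greater than 5.
mask = df["SCORE"] > 5

# Summing booleans per NAME counts the True values; people with no
# score above 5 (Bob) still appear, with a count of 0.
df2 = mask.groupby(df["NAME"]).sum().astype(int).reset_index(name="COUNT")
print(df2)
```

Because the mask is grouped rather than the filtered frame, no group is lost, which is why no reindexing step is needed here.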

Answer 1 (score: 3)

Just use groupby and sum:

df.assign(SCORE=df.SCORE.gt(5)).groupby('NAME')['SCORE'].sum().astype(int).reset_index()
Out[524]: 
  NAME  SCORE
0  Bob      0
1  Jay      1
2  Max      1
3  Mel      3
4  Sue      1
5  Tom      2

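Or we can use set_index together with sum:

# Note: sum(level=0) only works on pandas versions before 2.0;
# on newer versions use .groupby(level=0).sum() instead.
df.set_index('NAME').SCORE.gt(5).sum(level=0).astype(int)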
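Since the `level` argument of `sum` was removed in pandas 2.0, the set_index variant can be written with `groupby(level=0)` instead. The following is a sketch under that assumption, with the question's data rebuilt by hand; it is a substitute for the one-liner above, not the answerer's original code.

```python
import pandas as pd

# Data transcribed from the question.
names = ["Bob", "Sue", "Tom", "Max", "Jay", "Mel"]
scores = [3, 7, 3, 8, 4, 7, 4, 2, 6, 4, 7, 8, 3, 5, 6, 3, 4, 6]
df = pd.DataFrame({"NAME": names * 3, "SCORE": scores})

# Move NAME into the index, build the boolean mask, then group on the
# index level and sum the booleans to count scores above 5 per person.
counts = (
    df.set_index("NAME")["SCORE"]
      .gt(5)
      .groupby(level=0)
      .sum()
      .astype(int)
)
print(counts)
```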

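Answer 2 (score: 1)

One way to do this is to write a custom groupby aggregation that takes each group's scores and counts those greater than 5 by summing the boolean comparison: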

df.groupby('NAME')['SCORE'].agg(lambda x: (x > 5).sum())


NAME
Bob    0
Jay    1
Max    1
Mel    3
Sue    1
Tom    2
Name: SCORE, dtype: int64
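If the result should look like the df2 requested in the question (a NAME column, a count column, and names in their original order), the same lambda can be combined with `sort=False` and `reset_index`. This is a sketch; the `COUNT` column name is taken from the question's desired output, and the data is rebuilt by hand.

```python
import pandas as pd

# Data transcribed from the question.
names = ["Bob", "Sue", "Tom", "Max", "Jay", "Mel"]
scores = [3, 7, 3, 8, 4, 7, 4, 2, 6, 4, 7, 8, 3, 5, 6, 3, 4, 6]
df = pd.DataFrame({"NAME": names * 3, "SCORE": scores})

# sort=False keeps groups in order of first appearance instead of
# sorting them alphabetically.
df2 = (
    df.groupby("NAME", sort=False)["SCORE"]
      .agg(lambda s: (s > 5).sum())   # count of scores above 5 in the group
      .reset_index(name="COUNT")
)
print(df2)
```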

Answer 3 (score: 0)

If you want the counts as a dictionary, you can use collections.Counter:

from collections import Counter

c = Counter(df.loc[df['SCORE'] > 5, 'NAME'])

For a data frame, you can map the counts onto the unique names.
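The step of mapping the counts back onto the unique names is not shown in the text above, so the following is only one possible sketch of it. The `res` frame and the `COUNT` column name are assumptions made for illustration; the lookup relies on `Counter` returning 0 for keys it has never seen.

```python
from collections import Counter

import pandas as pd

# Data transcribed from the question.
names = ["Bob", "Sue", "Tom", "Max", "Jay", "Mel"]
scores = [3, 7, 3, 8, 4, 7, 4, 2, 6, 4, 7, 8, 3, 5, 6, 3, 4, 6]
df = pd.DataFrame({"NAME": names * 3, "SCORE": scores})

# Count how often each name appears among the rows with SCORE > 5.
c = Counter(df.loc[df["SCORE"] > 5, "NAME"])

# One row per unique name, in order of first appearance.
res = pd.DataFrame({"NAME": df["NAME"].unique()})

# Counter returns 0 for missing keys, so names that never scored
# above 5 (Bob) get a count of 0 rather than a missing value.
res["COUNT"] = res["NAME"].map(lambda name: c[name])
print(res)
```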

Answer 4 (score: 0)

First filter the data frame, then group by with an aggregation and reindex to fill in the missing values.

df[df['SCORE'] > 5].groupby('NAME')['SCORE'].size()\
                   .reindex(df['NAME'].unique(), fill_value=0)

Output:

NAME
Bob    0
Sue    1
Tom    2
Max    1
Jay    1
Mel    3
Name: SCORE, dtype: int64
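The reindex step matters because filtering first removes every row for anyone who never scored above 5. The sketch below, with the question's data rebuilt by hand, shows Bob disappearing after the filter and coming back with a count of 0 after `reindex`.

```python
import pandas as pd

# Data transcribed from the question.
names = ["Bob", "Sue", "Tom", "Max", "Jay", "Mel"]
scores = [3, 7, 3, 8, 4, 7, 4, 2, 6, 4, 7, 8, 3, 5, 6, 3, 4, 6]
df = pd.DataFrame({"NAME": names * 3, "SCORE": scores})

# Filter first, then count rows per name. Bob has no rows left after
# the filter, so he is absent from this intermediate result.
filtered = df[df["SCORE"] > 5].groupby("NAME").size()
print("Bob" in filtered.index)

# Reindex against all unique names; missing names are filled with 0.
counts = filtered.reindex(df["NAME"].unique(), fill_value=0)
print(counts)
```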