Conditional count with pandas groupby

Time: 2020-06-20 18:05:37

Tags: python pandas dataframe pandas-groupby data-science

I have an IPL dataset with two columns, toss_winner and winner.

I want to group the data by team and count how many times each team won the toss, and how many of those times it also went on to win the match. The desired output has one row per team with those two counts.

The first ten rows of the data are:

df.head(10):

                  toss_winner                       winner
0    Royal Challengers Bangalore          Sunrisers Hyderabad
1         Rising Pune Supergiant       Rising Pune Supergiant
2          Kolkata Knight Riders        Kolkata Knight Riders
3                Kings XI Punjab              Kings XI Punjab
4    Royal Challengers Bangalore  Royal Challengers Bangalore
5          Sunrisers Hyderabad          Sunrisers Hyderabad
6               Mumbai Indians               Mumbai Indians
7  Royal Challengers Bangalore              Kings XI Punjab
8       Rising Pune Supergiant             Delhi Daredevils
9               Mumbai Indians               Mumbai Indians
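
To try the answers locally, the ten rows shown above can be rebuilt as a small DataFrame. This is only a sketch of the sample, not the full IPL dataset, and the variable name df is assumed to match the one used in the question.

```python
import pandas as pd

# Rows copied from the df.head(10) output above (sample only, not the full dataset).
df = pd.DataFrame({
    "toss_winner": [
        "Royal Challengers Bangalore", "Rising Pune Supergiant",
        "Kolkata Knight Riders", "Kings XI Punjab",
        "Royal Challengers Bangalore", "Sunrisers Hyderabad",
        "Mumbai Indians", "Royal Challengers Bangalore",
        "Rising Pune Supergiant", "Mumbai Indians",
    ],
    "winner": [
        "Sunrisers Hyderabad", "Rising Pune Supergiant",
        "Kolkata Knight Riders", "Kings XI Punjab",
        "Royal Challengers Bangalore", "Sunrisers Hyderabad",
        "Mumbai Indians", "Kings XI Punjab",
        "Delhi Daredevils", "Mumbai Indians",
    ],
})

# Boolean mask: True where the toss winner also won the match.
same = df["toss_winner"] == df["winner"]
print(df.shape)
print(same.sum())
```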

I tried variations of groupby and aggregation, but nothing seems to work.

2 answers:

Answer 0: (score: 0)

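Try melt first, then groupby with value_counts and unstack. This counts how many times each team appears in toss_winner and in winner: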

s = pd.melt(df).groupby('value')['variable'].value_counts().unstack('variable')\
                .fillna(0)

print(s)

variable                     toss_winner  winner
value                                           
Delhi Daredevils                     0.0     1.0
Kings XI Punjab                      1.0     2.0
Kolkata Knight Riders                1.0     1.0
Mumbai Indians                       2.0     2.0
Rising Pune Supergiant               2.0     1.0
Royal Challengers Bangalore          3.0     1.0
Sunrisers Hyderabad                  1.0     2.0
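
A self-contained variant of the melt approach, run on the ten sample rows from the question, might look like the sketch below. It passes fill_value=0 to unstack so the counts stay integers instead of becoming floats through fillna(0). Note that the winner column here counts all match wins, not only wins that followed a toss win. The name counts is chosen for illustration.

```python
import pandas as pd

# Sample rows from the question (assumed to be representative of the real data).
df = pd.DataFrame({
    "toss_winner": [
        "Royal Challengers Bangalore", "Rising Pune Supergiant",
        "Kolkata Knight Riders", "Kings XI Punjab",
        "Royal Challengers Bangalore", "Sunrisers Hyderabad",
        "Mumbai Indians", "Royal Challengers Bangalore",
        "Rising Pune Supergiant", "Mumbai Indians",
    ],
    "winner": [
        "Sunrisers Hyderabad", "Rising Pune Supergiant",
        "Kolkata Knight Riders", "Kings XI Punjab",
        "Royal Challengers Bangalore", "Sunrisers Hyderabad",
        "Mumbai Indians", "Kings XI Punjab",
        "Delhi Daredevils", "Mumbai Indians",
    ],
})

# melt() stacks both columns into (variable, value) pairs;
# value_counts() per team then counts appearances in each original column.
counts = (
    df.melt()
      .groupby("value")["variable"]
      .value_counts()
      .unstack("variable", fill_value=0)  # missing combinations become integer 0
)
print(counts)
```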

Answer 1: (score: 0)

Here is a simple approach that makes each step easy to follow:

# number of times each team won the toss
a = df.groupby("toss_winner").size()

# number of times each team won the match after winning the toss
b = df.query("toss_winner == winner").groupby(["toss_winner"]).size()

# output
f = pd.concat([a, b], axis=1).reset_index().rename(columns={0: 'total_toss_win', 1: 'win_on_toss_win'})

print(f)

                   toss_winner  total_toss_win  win_on_toss_win
0              Kings XI Punjab               1                1
1        Kolkata Knight Riders               1                1
2               Mumbai Indians               2                2
3       Rising Pune Supergiant               2                1
4  Royal Challengers Bangalore               3                1
5          Sunrisers Hyderabad               1                1
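
The same two counts can also be computed in a single groupby with named aggregation, which avoids the concat step and yields 0 rather than NaN for a team that never won after winning the toss. This is a sketch on the ten sample rows from the question. The helper column won_after_toss is a hypothetical name introduced here, while the output column names follow the answer above.

```python
import pandas as pd

# Sample rows from the question (not the full dataset).
df = pd.DataFrame({
    "toss_winner": [
        "Royal Challengers Bangalore", "Rising Pune Supergiant",
        "Kolkata Knight Riders", "Kings XI Punjab",
        "Royal Challengers Bangalore", "Sunrisers Hyderabad",
        "Mumbai Indians", "Royal Challengers Bangalore",
        "Rising Pune Supergiant", "Mumbai Indians",
    ],
    "winner": [
        "Sunrisers Hyderabad", "Rising Pune Supergiant",
        "Kolkata Knight Riders", "Kings XI Punjab",
        "Royal Challengers Bangalore", "Sunrisers Hyderabad",
        "Mumbai Indians", "Kings XI Punjab",
        "Delhi Daredevils", "Mumbai Indians",
    ],
})

result = (
    # Flag rows where the toss winner also won the match.
    df.assign(won_after_toss=df["toss_winner"] == df["winner"])
      .groupby("toss_winner")
      .agg(
          total_toss_win=("won_after_toss", "size"),   # all tosses won
          win_on_toss_win=("won_after_toss", "sum"),   # True values count as 1
      )
      .reset_index()
)
print(result)
```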