How do I use groupby() in this case?

Time: 2019-12-16 16:20:26

Tags: python pandas pandas-groupby

Say there is a DataFrame:

country       edition  sports       Athletes               Medals
Germany          1990    Aquatics  HAJOS, Alfred           silver
Germany          1990    Aquatics  HIRSCHMANN, Otto        silver
Germany          1990    Aquatics  DRIVAS, Dimitrios       silver
US               2008    Athletics MALOKINIS, Ioannis      silver
US               2008    Athletics HAJOS, Alfred           silver
US               2009    Athletics CHASAPIS, Spiridon      gold
France           2010    Athletics CHOROPHAS, Efstathios   gold
France           2010    golf      HAJOS, Alfred           silver
France           2011    golf      ANDREOU, Joannis        silver

I want to find out which edition awarded the most silver medals. I am trying to solve it with the groupby function like this:

df.groupby('Edition')[df['Medal']=='Silver'].count().idxmax() 

But it gives me

Key error = 'Columns not found: False, True'

Can anyone tell me what the problem is?

4 answers:

Answer 0: (score: 3)

Here is your pandas DataFrame:

import pandas as pd

data = [
    ['Germany', 1990, 'Aquatics', 'HAJOS, Alfred', 'silver'],
    ['Germany', 1990, 'Aquatics', 'HIRSCHMANN, Otto', 'silver'],
    ['Germany', 1990, 'Aquatics', 'DRIVAS, Dimitrios', 'silver'],
    ['US', 2008, 'Athletics', 'MALOKINIS, Ioannis', 'silver'],
    ['US', 2008, 'Athletics', 'HAJOS, Alfred', 'silver'],
    ['US', 2009, 'Athletics', 'CHASAPIS, Spiridon', 'gold'],
    ['France', 2010, 'Athletics', 'CHOROPHAS, Efstathios', 'gold'],
    ['France', 2010, 'golf', 'HAJOS, Alfred', 'silver'],
    ['France', 2011, 'golf', 'ANDREOU, Joannis', 'silver']
]
df = pd.DataFrame(data, columns=['country', 'edition', 'sports', 'Athletes', 'Medals'])
print(df)

   country  edition     sports               Athletes  Medals
0  Germany     1990   Aquatics          HAJOS, Alfred  silver
1  Germany     1990   Aquatics       HIRSCHMANN, Otto  silver
2  Germany     1990   Aquatics      DRIVAS, Dimitrios  silver
3       US     2008  Athletics     MALOKINIS, Ioannis  silver
4       US     2008  Athletics          HAJOS, Alfred  silver
5       US     2009  Athletics     CHASAPIS, Spiridon    gold
6   France     2010  Athletics  CHOROPHAS, Efstathios    gold
7   France     2010       golf          HAJOS, Alfred  silver
8   France     2011       golf       ANDREOU, Joannis  silver

Now you just filter for the silver medals, then groupby by edition (note that 'Edition' will throw a KeyError, whereas 'edition' will not) and finally get the counts.
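The filter, group, count sequence described in this answer can be sketched roughly as follows. This is a minimal illustration rather than the answerer's exact code: it assumes the column names 'edition' and 'Medals' from the table in the question, and the variable names are made up for the example.

```python
import pandas as pd

# Only the two columns needed for the count; values copied from the question's table.
df = pd.DataFrame({
    "edition": [1990, 1990, 1990, 2008, 2008, 2009, 2010, 2010, 2011],
    "Medals": ["silver", "silver", "silver", "silver", "silver",
               "gold", "gold", "silver", "silver"],
})

# 1) Filter the rows first, so that groupby only ever sees silver medals.
silver = df[df["Medals"] == "silver"]

# 2) Group by edition and count the rows in each group.
silver_per_edition = silver.groupby("edition")["Medals"].count()

# 3) idxmax() returns the index label (here, the edition) with the largest count.
best_edition = silver_per_edition.idxmax()

print(silver_per_edition)
print(best_edition)
```

The boolean mask is applied to the DataFrame before grouping. In the question it was passed to the GroupBy object instead, where it is treated as a list of column names, which is why pandas reports that the columns False and True were not found.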

Answer 1: (score: 1)

You can group by two columns to solve this:

df[df['Medals'] == 'silver'].groupby(['edition','Medals'],as_index=True)['Athletes'].count().idxmax()

# Outcome:
(1990, 'silver')
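Because the grouping uses two keys, idxmax() returns a tuple label rather than a bare edition. A small sketch of unpacking it, assuming the same 'edition' and 'Medals' columns as the question and using size() in place of counting the 'Athletes' column (the variable names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    "edition": [1990, 1990, 1990, 2008, 2008, 2009, 2010, 2010, 2011],
    "Medals": ["silver", "silver", "silver", "silver", "silver",
               "gold", "gold", "silver", "silver"],
})

# Grouping by two keys produces a Series indexed by (edition, medal) pairs.
pair_counts = df[df["Medals"] == "silver"].groupby(["edition", "Medals"]).size()

# idxmax() therefore returns a tuple; unpack it to get the edition on its own.
label = pair_counts.idxmax()
edition, medal = label

print(label)
print(edition)
```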

Answer 2: (score: 0)

df[df['Medal'] == 'silver'].groupby('edition').size().idxmax()

I tried this and it worked! I just used size() instead of count().
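One likely reason the swap matters: on a grouped DataFrame, size() returns a single Series of row counts, while count() returns a DataFrame of non-null counts for each remaining column, so idxmax() gives one label in the first case and one label per column in the second. The sketch below uses a hypothetical toy frame with a missing value to make the difference visible; it is not the asker's data.

```python
import pandas as pd

# Hypothetical toy frame: one athlete name is missing in the 1990 group.
toy = pd.DataFrame({
    "edition": [1990, 1990, 2008],
    "Athletes": ["HAJOS, Alfred", None, "MALOKINIS, Ioannis"],
})

grouped = toy.groupby("edition")

sizes = grouped.size()    # Series: number of rows per group, missing values included
counts = grouped.count()  # DataFrame: non-null values per remaining column

print(sizes)
print(counts)
print(sizes.idxmax())     # a single edition label
```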

Answer 3: (score: 0)

You should count each medal for each edition:

>>> df = pd.DataFrame({'edition':[1990,1990,1990,2008,2008,2009,2010,2010,2011],'Medals':['silver','silver','silver','silver','silver','gold','gold','silver','silver']})
>>> df['count'] = ''
>>> df['count'] = df.groupby(['edition','Medals']).transform('count')

Then filter on max():

>>> df = df[df['Medals'].isin(['silver'])]
>>> df
   edition  Medals  count
0     1990  silver      3
1     1990  silver      3
2     1990  silver      3
3     2008  silver      2
4     2008  silver      2
7     2010  silver      1
8     2011  silver      1
>>> df = df[df['count'].isin([df['count'].max()])]
>>> df
   edition  Medals  count
0     1990  silver      3
1     1990  silver      3
2     1990  silver      3

>>> df[df['count'].isin([df['count'].max()])]['Medals'].unique()[0]

'silver'
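Since this approach keeps every row whose count equals the maximum, it can also report ties, and reading the 'edition' column instead of 'Medals' gives the edition(s) asked about in the question. A minimal sketch under those assumptions, using hypothetical data in which two editions tie:

```python
import pandas as pd

# Hypothetical data with a tie: 1990 and 2008 each have two silver medals.
df = pd.DataFrame({
    "edition": [1990, 1990, 2008, 2008, 2009],
    "Medals": ["silver", "silver", "silver", "silver", "gold"],
})

# Attach to every row the size of its (edition, medal) group.
df["count"] = df.groupby(["edition", "Medals"])["Medals"].transform("count")

# Keep silver rows only, then keep the rows whose count equals the maximum.
silver = df[df["Medals"] == "silver"]
is_top = silver["count"] == silver["count"].max()
top_editions = silver.loc[is_top, "edition"].unique().tolist()

print(top_editions)
```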