Say I have a dataframe:
country  edition  sports     Athletes               Medals
Germany  1990     Aquatics   HAJOS, Alfred          silver
Germany  1990     Aquatics   HIRSCHMANN, Otto       silver
Germany  1990     Aquatics   DRIVAS, Dimitrios      silver
US       2008     Athletics  MALOKINIS, Ioannis     silver
US       2008     Athletics  HAJOS, Alfred          silver
US       2009     Athletics  CHASAPIS, Spiridon     gold
France   2010     Athletics  CHOROPHAS, Efstathios  gold
France   2010     golf       HAJOS, Alfred          silver
France   2011     golf       ANDREOU, Joannis       silver
I want to find out which edition awarded the most silver medals, so I'm trying to solve it with groupby like this:
df.groupby('Edition')[df['Medal']=='Silver'].count().idxmax()
But it gives me:
KeyError: 'Columns not found: False, True'
Can anyone tell me what the problem is?
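One way to see where this message comes from is to reproduce it on the sample data. The sketch below assumes the lowercase column names 'edition' and 'Medals' and the lowercase value 'silver' from the sample table (the question's code uses 'Edition', 'Medal' and 'Silver'). Square brackets on a GroupBy object select columns, so a boolean mask placed there appears to have its values, False and True, looked up as column names, which raises the KeyError. Filtering the rows before grouping avoids it.

```python
import pandas as pd

# Sample data from the question (only the columns needed here)
df = pd.DataFrame({
    "edition": [1990, 1990, 1990, 2008, 2008, 2009, 2010, 2010, 2011],
    "Medals": ["silver", "silver", "silver", "silver", "silver",
               "gold", "gold", "silver", "silver"],
})

mask = df["Medals"] == "silver"

# Wrong: [] on a GroupBy selects columns, so the mask's values
# (False/True) are treated as column names and are not found
try:
    df.groupby("edition")[mask]
    error_message = None
except KeyError as exc:
    error_message = str(exc)

# Right: filter the rows first, then group and count
best_edition = df[mask].groupby("edition").size().idxmax()

print(error_message)
print(best_edition)
```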
Answer 0 (score: 3)
Here is your pandas dataframe:
import pandas as pd

data = [
    ['Germany', 1990, 'Aquatics', 'HAJOS, Alfred', 'silver'],
    ['Germany', 1990, 'Aquatics', 'HIRSCHMANN, Otto', 'silver'],
    ['Germany', 1990, 'Aquatics', 'DRIVAS, Dimitrios', 'silver'],
    ['US', 2008, 'Athletics', 'MALOKINIS, Ioannis', 'silver'],
    ['US', 2008, 'Athletics', 'HAJOS, Alfred', 'silver'],
    ['US', 2009, 'Athletics', 'CHASAPIS, Spiridon', 'gold'],
    ['France', 2010, 'Athletics', 'CHOROPHAS, Efstathios', 'gold'],
    ['France', 2010, 'golf', 'HAJOS, Alfred', 'silver'],
    ['France', 2011, 'golf', 'ANDREOU, Joannis', 'silver']
]
df = pd.DataFrame(data, columns=['country', 'edition', 'sports', 'Athletes', 'Medals'])
print(df)

   country  edition     sports               Athletes  Medals
0  Germany     1990   Aquatics          HAJOS, Alfred  silver
1  Germany     1990   Aquatics       HIRSCHMANN, Otto  silver
2  Germany     1990   Aquatics      DRIVAS, Dimitrios  silver
3       US     2008  Athletics     MALOKINIS, Ioannis  silver
4       US     2008  Athletics          HAJOS, Alfred  silver
5       US     2009  Athletics     CHASAPIS, Spiridon    gold
6   France     2010  Athletics  CHOROPHAS, Efstathios    gold
7   France     2010       golf          HAJOS, Alfred  silver
8   France     2011       golf       ANDREOU, Joannis  silver

Now you can simply filter for the silver medals, then group by edition (note that 'Edition' will raise a KeyError, because the column is named 'edition'), and finally get the count.
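The filter, group and count steps described in this answer could look like the minimal sketch below. It assumes the sample data from the question and counts the non-null 'Athletes' entries per edition; selecting that single column is a choice made here so that count() returns a Series and idxmax() returns one label.

```python
import pandas as pd

# Sample data from the question (columns used by this sketch)
df = pd.DataFrame({
    "edition": [1990, 1990, 1990, 2008, 2008, 2009, 2010, 2010, 2011],
    "Athletes": ["HAJOS, Alfred", "HIRSCHMANN, Otto", "DRIVAS, Dimitrios",
                 "MALOKINIS, Ioannis", "HAJOS, Alfred", "CHASAPIS, Spiridon",
                 "CHOROPHAS, Efstathios", "HAJOS, Alfred", "ANDREOU, Joannis"],
    "Medals": ["silver", "silver", "silver", "silver", "silver",
               "gold", "gold", "silver", "silver"],
})

# 1. keep only the silver medals
silver = df[df["Medals"] == "silver"]
# 2. group by the lowercase column name ('Edition' would raise KeyError)
# 3. count athletes per edition -> a Series indexed by edition
silver_counts = silver.groupby("edition")["Athletes"].count()
# 4. the index label with the largest count
top_edition = silver_counts.idxmax()

print(silver_counts)
print(top_edition)
```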
Answer 1 (score: 1)
You can solve it by grouping on two columns:
df[df['Medals'] == 'silver'].groupby(['edition','Medals'],as_index=True)['Athletes'].count().idxmax()
# Outcome:
(1990, 'silver')
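Because this groups on two columns, the counts carry a two-level index and idxmax() returns an (edition, medal) tuple rather than the edition alone. A small sketch, assuming the question's sample data, that unpacks the tuple to get just the edition:

```python
import pandas as pd

# Sample data from the question (columns used by this sketch)
df = pd.DataFrame({
    "edition": [1990, 1990, 1990, 2008, 2008, 2009, 2010, 2010, 2011],
    "Athletes": ["HAJOS, Alfred", "HIRSCHMANN, Otto", "DRIVAS, Dimitrios",
                 "MALOKINIS, Ioannis", "HAJOS, Alfred", "CHASAPIS, Spiridon",
                 "CHOROPHAS, Efstathios", "HAJOS, Alfred", "ANDREOU, Joannis"],
    "Medals": ["silver", "silver", "silver", "silver", "silver",
               "gold", "gold", "silver", "silver"],
})

counts = (
    df[df["Medals"] == "silver"]
    .groupby(["edition", "Medals"])["Athletes"]
    .count()
)

# counts is indexed by (edition, Medals), so idxmax() returns a tuple;
# unpacking it separates the edition from the medal type
edition, medal = counts.idxmax()

print(counts)
print(edition, medal)
```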
Answer 2 (score: 0)
df[df['Medals'] == 'silver'].groupby('edition').size().idxmax()
I tried this and it worked! I just used size() instead of count().
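The swap matters because of what each method returns: size() gives one row count per group (a Series), while count() gives the non-null count for every remaining column (a DataFrame), so idxmax() on the latter returns one label per column instead of a single edition. A short sketch on the question's sample data showing the difference:

```python
import pandas as pd

# Sample data from the question (columns used by this sketch)
df = pd.DataFrame({
    "edition": [1990, 1990, 1990, 2008, 2008, 2009, 2010, 2010, 2011],
    "Athletes": ["HAJOS, Alfred", "HIRSCHMANN, Otto", "DRIVAS, Dimitrios",
                 "MALOKINIS, Ioannis", "HAJOS, Alfred", "CHASAPIS, Spiridon",
                 "CHOROPHAS, Efstathios", "HAJOS, Alfred", "ANDREOU, Joannis"],
    "Medals": ["silver", "silver", "silver", "silver", "silver",
               "gold", "gold", "silver", "silver"],
})

grouped = df[df["Medals"] == "silver"].groupby("edition")

# size(): number of rows in each group -> a Series
sizes = grouped.size()
# count(): non-null values per remaining column -> a DataFrame
counts = grouped.count()

print(type(sizes).__name__)   # Series
print(type(counts).__name__)  # DataFrame

# On a Series, idxmax() returns a single label...
print(sizes.idxmax())
# ...on a DataFrame it returns one label per column (Athletes, Medals)
print(counts.idxmax())
```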
Answer 3 (score: 0)
You should count the rows for each edition and medal:
>>> import pandas as pd
>>> df = pd.DataFrame({'edition':[1990,1990,1990,2008,2008,2009,2010,2010,2011],'Medals':['silver','silver','silver','silver','silver','gold','gold','silver','silver']})
>>> df['count'] = ''
>>> df['count'] = df.groupby(['edition','Medals']).transform('count')
Then keep only the silver medals and filter on the max() count:
>>> df = df[df['Medals'].isin(['silver'])]
>>> df
   edition  Medals  count
0     1990  silver      3
1     1990  silver      3
2     1990  silver      3
3     2008  silver      2
4     2008  silver      2
7     2010  silver      1
8     2011  silver      1
>>> df = df[df['count'].isin([df['count'].max()])]
>>> df
   edition  Medals  count
0     1990  silver      3
1     1990  silver      3
2     1990  silver      3
Or, to get the edition directly:
>>> int(df[df['count'].isin([df['count'].max()])]['edition'].unique()[0])
1990
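The same transform-based idea can be written without the empty placeholder column. This is a sketch, not the answer's exact code: it assumes the question's sample data, swaps in transform('size') on the 'Athletes' column in place of the answer's '' column plus 'count', and uses == instead of isin() since only one maximum value is compared.

```python
import pandas as pd

# Sample data from the question (columns used by this sketch)
df = pd.DataFrame({
    "edition": [1990, 1990, 1990, 2008, 2008, 2009, 2010, 2010, 2011],
    "Athletes": ["HAJOS, Alfred", "HIRSCHMANN, Otto", "DRIVAS, Dimitrios",
                 "MALOKINIS, Ioannis", "HAJOS, Alfred", "CHASAPIS, Spiridon",
                 "CHOROPHAS, Efstathios", "HAJOS, Alfred", "ANDREOU, Joannis"],
    "Medals": ["silver", "silver", "silver", "silver", "silver",
               "gold", "gold", "silver", "silver"],
})

# Broadcast each (edition, Medals) group's row count back onto its rows
df["count"] = df.groupby(["edition", "Medals"])["Athletes"].transform("size")

# Keep the silver rows, then the rows whose count equals the maximum
silver = df[df["Medals"] == "silver"]
top_rows = silver[silver["count"] == silver["count"].max()]
top_editions = top_rows["edition"].unique()

print(top_rows)
print(top_editions)
```

If several editions tie for the maximum, top_editions lists all of them, whereas idxmax() in the other answers returns only the first.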