I need to get the top 1 and top 2 ratings watched by 'ma' and 'young' viewers. I want to specify those values directly rather than pass columns to group by.
Data:
gender age rating
ma young PG
fe young PG
ma adult PG
fe adult PG
ma young PG
fe young PG
ma adult R
fe adult R
ma young R
fe young R
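For anyone who wants to reproduce this, the sample above could be built as a DataFrame along these lines. This is only a sketch: the variable name `df` matches the code in the thread, and the rows are copied from the table in the order shown.

```python
import pandas as pd

# Sample data from the question, one list entry per row of the table.
df = pd.DataFrame({
    "gender": ["ma", "fe", "ma", "fe", "ma", "fe", "ma", "fe", "ma", "fe"],
    "age": ["young", "young", "adult", "adult", "young",
            "young", "adult", "adult", "young", "young"],
    "rating": ["PG", "PG", "PG", "PG", "PG", "PG", "R", "R", "R", "R"],
})
print(df)
```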
Code:
top1 = df.groupby(['ma','young'])['rating'].apply(lambda x: x.value_counts().index[0])
top2 = df.groupby(['ma','young'])['rating'].apply(lambda x: x.value_counts().index[1])
Please let me know how I can do this.
Answer 0: (score: 2)
Filter first and then get the tops, but note that in general a second top may not exist:
df1 = df.query("gender == 'ma' & age == 'young'")
# alternative: boolean indexing
# df1 = df[(df['gender'] == 'ma') & (df['age'] == 'young')]
tops = df1.groupby(['gender','age'])['rating'].value_counts()
print (tops)
gender  age    rating
ma      young  PG        2
               R         1
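To read the top 1 and top 2 labels out of `tops`, one option is to take them from the last index level, which holds the ratings ordered by descending count within the group. The sketch below rebuilds the sample data so it runs on its own; the `None` fallback for a missing second top is an assumption about what you want in that case.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "gender": ["ma", "fe", "ma", "fe", "ma", "fe", "ma", "fe", "ma", "fe"],
    "age": ["young", "young", "adult", "adult", "young",
            "young", "adult", "adult", "young", "young"],
    "rating": ["PG", "PG", "PG", "PG", "PG", "PG", "R", "R", "R", "R"],
})

df1 = df[(df["gender"] == "ma") & (df["age"] == "young")]
tops = df1.groupby(["gender", "age"])["rating"].value_counts()

# Last index level = rating labels, most frequent first.
labels = tops.index.get_level_values(-1)

# Guard against fewer than two distinct ratings (None is an assumed fallback).
top1 = labels[0] if len(labels) > 0 else None
top2 = labels[1] if len(labels) > 1 else None
print(top1, top2)  # PG R
```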
# df.iloc selects rows of the original DataFrame by position, not entries of tops
print (df.iloc[[0]])
  gender    age rating
0     ma  young     PG
print (df.iloc[[1]])
  gender    age rating
1     fe  young     PG
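Since the question asks to specify the values directly rather than group by columns, a variant without `groupby` may be closer to what is wanted: build a boolean mask for the chosen values and call `value_counts` on the filtered `rating` column. `top_ratings` is a hypothetical helper name, not part of pandas, and the sketch assumes the column names from the sample data.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "gender": ["ma", "fe", "ma", "fe", "ma", "fe", "ma", "fe", "ma", "fe"],
    "age": ["young", "young", "adult", "adult", "young",
            "young", "adult", "adult", "young", "young"],
    "rating": ["PG", "PG", "PG", "PG", "PG", "PG", "R", "R", "R", "R"],
})


def top_ratings(frame, gender, age, n=2):
    """Return up to n most frequent ratings for one gender/age pair.

    Hypothetical helper: the list is shorter than n (possibly empty)
    when fewer distinct ratings exist for that pair.
    """
    mask = (frame["gender"] == gender) & (frame["age"] == age)
    counts = frame.loc[mask, "rating"].value_counts()  # sorted by count, descending
    return counts.index[:n].tolist()


print(top_ratings(df, "ma", "young"))   # ['PG', 'R']
print(top_ratings(df, "ma", "senior"))  # [] because no rows match
```

Slicing with `[:n]` instead of indexing `[0]` and `[1]` avoids an `IndexError` when a second top does not exist. When two ratings have the same count, their relative order should not be relied on.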