I have a dataset of actors and directors, along with the popularity of the movies they worked on together.
print(actors_director_df.head(3))
                 actor         director  popularity  counter
0          Chris Pratt  Colin Trevorrow   32.985763        0
1  Bryce Dallas Howard  Colin Trevorrow   32.985763        0
2          Irrfan Khan  Colin Trevorrow   32.985763        0
I want to group by actor and director, since a pair can work together on more than one movie. I managed to do that with the following query.
actor_director_grouped = actors_director_df.groupby(['actor','director']) \
    .size() \
    .reset_index(name='count') \
    .sort_values(['count'], ascending=False) \
    .head(10)
print(actor_director_grouped)
                actor        director  count
3619   Clint Eastwood  Clint Eastwood     14
19272     Woody Allen     Woody Allen     12
9606      Johnny Depp      Tim Burton      8
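But the popularity column is missing from this DF.
What I want is an averaged popularity column after the groupby, shown alongside the actor, the director, and the number of movies they made together.
I.e., my ideal output would look like this.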
                actor        director  popularity  count
3619   Clint Eastwood  Clint Eastwood   32.985763     14
19272     Woody Allen     Woody Allen   5.1231231     12
9606      Johnny Depp      Tim Burton   3.1231231      8
Answer 0 (score: 3)
Looking at your dataframe, the counter column seems unnecessary. Let's instead use the popularity column to make a mean and a count column:
import pandas as pd
import numpy as np
np.random.seed(444)
names = [
    'Robert Baratheon',
    'Jon Snow',
    'Daenerys Targaryen',
    'Theon Greyjoy',
    'Tyrion Lannister'
]
df = pd.DataFrame({
    'actor': np.random.choice(names, size=10, p=[0.2, 0.2, 0.2, 0.1, 0.3]),
    'director': np.random.choice(names, size=10, p=[0.4, 0.1, 0.1, 0.1, 0.3]),
    'popularity': np.random.randint(0, 100, size=10),
    'counter': 0
})
# Count the movies and average the popularity for each actor/director pair
df2 = df.groupby(['actor', 'director'])['popularity']\
    .agg(['count', 'mean'])\
    .reset_index()\
    .sort_values(by='mean', ascending=False)
print(df2)
Returns:
              actor          director  count  mean
0          Jon Snow  Robert Baratheon      2  53.5
5  Tyrion Lannister  Tyrion Lannister      2  49.0
2  Robert Baratheon  Tyrion Lannister      2  48.5
1  Robert Baratheon          Jon Snow      2  40.5
4     Theon Greyjoy  Tyrion Lannister      1  13.0
3     Theon Greyjoy  Robert Baratheon      1   7.0
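If the column names from the question's ideal output (popularity and count) are wanted directly, pandas named aggregation is one way to get them. The following is only a minimal sketch: the small DataFrame is made-up illustration data, not the asker's dataset, and it sorts by count as the question's own query does.

```python
import pandas as pd

# Made-up sample data for illustration only (not the asker's dataset)
df = pd.DataFrame({
    "actor": ["Johnny Depp", "Johnny Depp", "Woody Allen", "Chris Pratt"],
    "director": ["Tim Burton", "Tim Burton", "Woody Allen", "Colin Trevorrow"],
    "popularity": [4.0, 2.0, 5.0, 33.0],
})

result = (
    df.groupby(["actor", "director"])
      # Named aggregation: output_column=(input_column, aggregation)
      .agg(
          popularity=("popularity", "mean"),  # average popularity per pair
          count=("popularity", "count"),      # number of non-null popularity rows per pair
      )
      .reset_index()
      .sort_values("count", ascending=False)
      .head(10)
)
print(result)
```

Note that "count" only counts rows where popularity is not null, so it equals the number of movies only when no popularity values are missing.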
Answer 1 (score: 2)
I took the liberty of adding some dummy data that helps in understanding the groupby clause better.
print(df)
Output:
                 actor         director  popularity  counter
0          Chris Pratt  Colin Trevorrow   32.985763        0
1  Bryce Dallas Howard  Colin Trevorrow   32.985763        0
2          Irrfan Khan  Colin Trevorrow   32.985763        0
3          Irrfan Khan  Colin Trevorrow   60.000000       12
4          Irrfan Khan     John Markson   10.000000       10
5          Irrfan Khan     Mark Johnson  100.000000        4
Then you need to groupby on actor and director, and then find the mean of popularity and the sum of count.
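The grouping described in this answer can be sketched roughly as follows. This is a reconstruction under assumptions, not the answer's original code: the DataFrame is rebuilt by hand from the dummy data printed above, and the sum is assumed to be taken over the counter column.

```python
import pandas as pd

# Dummy data rebuilt by hand from the table printed above
df = pd.DataFrame({
    "actor": ["Chris Pratt", "Bryce Dallas Howard", "Irrfan Khan",
              "Irrfan Khan", "Irrfan Khan", "Irrfan Khan"],
    "director": ["Colin Trevorrow", "Colin Trevorrow", "Colin Trevorrow",
                 "Colin Trevorrow", "John Markson", "Mark Johnson"],
    "popularity": [32.985763, 32.985763, 32.985763, 60.0, 10.0, 100.0],
    "counter": [0, 0, 0, 12, 10, 4],
})

# Assumption: average popularity and sum the counter column for each pair
grouped = (
    df.groupby(["actor", "director"])
      .agg({"popularity": "mean", "counter": "sum"})
      .reset_index()
)
print(grouped)
```

With this data, the only pair that appears more than once is Irrfan Khan with Colin Trevorrow, so that is the only row where the mean and sum combine several input rows.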