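I want to plot the mean value for each genre. A single movie can belong to several genres, so I split them into individual genres, but I'm not sure what is wrong with my code. It doesn't do what I want. I need some help.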
Answer 0 (score: 1)
I think that if you only need to aggregate with a single function, the more common approach is groupby + mean:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'genres':['Comedy|Crime|Drama|Thriller','Comedy|Crime|Drama',
'Comedy|Crime','Drama|Thriller','Drama','Comedy|Crime'],
'gross':[10,20,30,40,50,60],
'budget':[3,4,5,3,2,5]})
df = df.dropna(subset=['genres']).reset_index(drop=True)
splitted = df['genres'].str.split('|')
l = splitted.str.len()
x = df['gross'] / df['budget']
# a new column name (divided) is needed, and `x` is used in place of `df[]`;
# to_numpy() drops the repeated index so the new frame is numbered 0..13
df = pd.DataFrame({'divided': np.repeat(x.to_numpy(), l), 'genres': np.concatenate(splitted)})
print(df)
      divided    genres
0    3.333333    Comedy
1    3.333333     Crime
2    3.333333     Drama
3    3.333333  Thriller
4    5.000000    Comedy
5    5.000000     Crime
6    5.000000     Drama
7    6.000000    Comedy
8    6.000000     Crime
9   13.333333     Drama
10  13.333333  Thriller
11  25.000000     Drama
12  12.000000    Comedy
13  12.000000     Crime
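As a cross-check on the reshaping step, the same one-row-per-genre table can probably also be built with DataFrame.explode (available since pandas 0.25) instead of np.repeat and np.concatenate. This is only a minimal sketch on the same sample data; the names movies, exploded and genre_means are chosen here for illustration and are not part of the original answer.

```python
import pandas as pd

# Same sample data as in the answer; `movies` is an illustrative name.
movies = pd.DataFrame({
    'genres': ['Comedy|Crime|Drama|Thriller', 'Comedy|Crime|Drama',
               'Comedy|Crime', 'Drama|Thriller', 'Drama', 'Comedy|Crime'],
    'gross': [10, 20, 30, 40, 50, 60],
    'budget': [3, 4, 5, 3, 2, 5],
})

# Compute the gross/budget ratio per movie, split the genre string into a list,
# then explode so that each (movie, genre) pair gets its own row.
exploded = (
    movies.assign(divided=movies['gross'] / movies['budget'],
                  genres=movies['genres'].str.split('|'))
          .explode('genres')
          .reset_index(drop=True)[['divided', 'genres']]
)

# Mean ratio per genre, indexed by genre name.
genre_means = exploded.groupby('genres')['divided'].mean()
print(genre_means)
```

Both routes should give the same 14 rows; explode simply keeps the ratio and its genre together in one step, so there is no need to compute the list lengths separately.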
# aggregate the new column (divided), not x, because df is now the frame built by repeat
# mean() returns a Series, so the column is named via reset_index(name=...) rather than rename(columns=...)
df1 = df.groupby('genres')['divided'].mean().reset_index(name='return')
df1.plot.bar(x='genres', y='return')
plt.yscale("log")
plt.xlabel("Genre")