Plot the mean for each genre; pandas

Date: 2017-06-25 04:10:28

Tags: pandas matplotlib


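I want to plot the average for each genre. Because a single movie can belong to several genres, I split them into individual genres, but I'm not sure what is wrong with my code: it doesn't do what I want. I need some help.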
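A minimal sketch of the splitting step described above, assuming pandas 0.25 or later (for `DataFrame.explode`) and a frame with `genres`, `gross` and `budget` columns like the one used in the answer. Averaging the gross/budget ratio is also an assumption taken from the answer. This swaps in `explode` for the `np.repeat` technique the answer uses:

```python
import pandas as pd

# Toy data with the column names used in the answer (assumed schema)
df = pd.DataFrame({
    "genres": ["Comedy|Crime|Drama|Thriller", "Comedy|Crime", "Drama", None],
    "gross": [10, 30, 50, 70],
    "budget": [2, 5, 5, 7],
})

# Drop movies with no genre, then compute the ratio once per movie
df = df.dropna(subset=["genres"])
df = df.assign(ratio=df["gross"] / df["budget"])

# One row per (movie, genre): split the string into a list, then explode it
long_df = df.assign(genre=df["genres"].str.split("|")).explode("genre")

# Mean ratio per individual genre
per_genre = long_df.groupby("genre")["ratio"].mean()
print(per_genre)
```

Each movie contributes its ratio once to every genre it lists, which is the same weighting the answer's repeat-based approach produces.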

Here's the error message

Jezrael's output

1 answer:

Answer 0 (score: 1)

I think that if you need to aggregate, it is simpler to use the more common groupby + mean.
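As an alternative sketch (not the answerer's method), the per-genre mean can also be computed without repeating rows: `Series.str.get_dummies(sep="|")` turns the genre strings into 0/1 indicator columns, and dividing the summed ratios by the number of movies per genre gives the mean. The data below is the answer's toy frame:

```python
import pandas as pd

# Toy data copied from the answer
df = pd.DataFrame({'genres': ['Comedy|Crime|Drama|Thriller', 'Comedy|Crime|Drama',
                              'Comedy|Crime', 'Drama|Thriller', 'Drama', 'Comedy|Crime'],
                   'gross': [10, 20, 30, 40, 50, 60],
                   'budget': [3, 4, 5, 3, 2, 5]})
ratio = df['gross'] / df['budget']

# One 0/1 indicator column per genre, split on the '|' separator
dummies = df['genres'].str.get_dummies(sep='|')

# Sum of ratios per genre divided by the number of movies in that genre
means = dummies.mul(ratio, axis=0).sum() / dummies.sum()
print(means)
```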

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'genres':['Comedy|Crime|Drama|Thriller','Comedy|Crime|Drama',
                   'Comedy|Crime','Drama|Thriller','Drama','Comedy|Crime'],
                   'gross':[10,20,30,40,50,60],
                   'budget':[3,4,5,3,2,5]})


df = df.dropna(subset=['genres']).reset_index(drop=True) 

splitted = df['genres'].str.split('|')  
l = splitted.str.len()

x = df['gross'] / df['budget']

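# build a new DataFrame with a named column (divided); repeat x (not df[]) once per genre
# x.to_numpy() drops the old index, so the new frame gets a clean 0..n-1 index
df = pd.DataFrame({'divided': np.repeat(x.to_numpy(), l), 'genres': np.concatenate(splitted.tolist())})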
print (df)
      divided    genres
0    3.333333    Comedy
1    3.333333     Crime
2    3.333333     Drama
3    3.333333  Thriller
4    5.000000    Comedy
5    5.000000     Crime
6    5.000000     Drama
7    6.000000    Comedy
8    6.000000     Crime
9   13.333333     Drama
10  13.333333  Thriller
11  25.000000     Drama
12  12.000000    Comedy
13  12.000000     Crime
# aggregate the 'divided' column (not x), because we now work with the new df created by np.repeat
# a Series has no columns= argument for rename, so name the result directly in reset_index
df1 = df.groupby('genres')['divided'].mean().reset_index(name='return')


df1.plot.bar(x='genres', y='return')

plt.yscale("log")
plt.xlabel("Genre")
plt.show()
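A sketch of the same plot that also runs without a display, assuming the non-interactive Agg backend and a hypothetical output file `genre_return.png`. The `df1` values are the per-genre means of the toy data (Comedy and Crime 79/12, Drama 35/3, Thriller 25/3). It uses the Axes object returned by `DataFrame.plot.bar` instead of the global `plt` calls:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Per-genre means of the toy data, as produced by the groupby above
df1 = pd.DataFrame({"genres": ["Comedy", "Crime", "Drama", "Thriller"],
                    "return": [79/12, 79/12, 35/3, 25/3]})

# Sort bars so the genre with the highest mean comes first
df1 = df1.sort_values("return", ascending=False)

ax = df1.plot.bar(x="genres", y="return", legend=False)
ax.set_yscale("log")
ax.set_xlabel("Genre")
ax.set_ylabel("Average gross / budget")
plt.tight_layout()
plt.savefig("genre_return.png")  # hypothetical output filename
```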

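If it also helps to know how many movies each mean is based on, a sketch that computes both in one `groupby`, assuming the long-format frame printed in the answer (its values are rebuilt by hand here). It also shows `rename(columns=...)` applied to a DataFrame, where that keyword is valid:

```python
import pandas as pd

# Long-format frame matching the one printed in the answer (one row per movie-genre pair)
long_df = pd.DataFrame({
    "divided": [10/3]*4 + [5.0]*3 + [6.0]*2 + [40/3]*2 + [25.0] + [12.0]*2,
    "genres": ["Comedy", "Crime", "Drama", "Thriller", "Comedy", "Crime", "Drama",
               "Comedy", "Crime", "Drama", "Thriller", "Drama", "Comedy", "Crime"],
})

# Mean and number of movies per genre in a single aggregation
summary = (long_df.groupby("genres")["divided"]
           .agg(["mean", "count"])
           .rename(columns={"mean": "return", "count": "n_movies"})
           .reset_index())
print(summary)
```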