Question

I want to use the 'percent of number of releases in a year' as a popularity measure for a genre in the MovieLens dataset. The sample data looks like this:

I can set the index to the year:

   df1 = df.set_index('year')

Then I can compute the total for each row and divide the individual cells by it to get percentages:

df1= df.set_index('year')
df1['total'] = df1.iloc[:,1:4].sum(axis=1)
df2 = df1.drop('movie',axis=1)
df2 = df2.div(df2['total'], axis= 0) * 100
df2.head()
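The row-wise division above produces one row per movie, not per year. A minimal sketch of the same idea applied at the year level, assuming the three-movie sample used in the answer (0/1 genre flags, NaN meaning "not tagged"), groups by year first and then divides each row by its total, so every year's genres add up to 100%:

```python
import numpy as np
import pandas as pd

# Sample data from the answer: one row per movie, genre columns are 0/1 flags
df = pd.DataFrame({
    "movie": ["Movie1", "Movie2", "Movie3"],
    "action": [1, 0, 0],
    "com": [np.nan, np.nan, 1],
    "drama": [1, 1, np.nan],
    "year": [1994, 1994, 1995],
})

# Count genre tags per year (drop the text 'movie' column, treat NaN as 0)
counts = df.drop(columns="movie").fillna(0).groupby("year").sum()

# Divide each cell by its row total so each year's genre shares sum to 100%
share = counts.div(counts.sum(axis=1), axis=0) * 100
print(share.round(2))
```

This answers "what fraction of a year's genre tags belong to each genre", which is a different question from "what fraction of a year's movies have this genre" when movies carry several genres.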

Now, what is the best way to get the percentage of the number of releases in a year? I believe I should use groupby first and then a heatmap?

Answer 1

You can certainly use the groupby method:

import pandas as pd
import numpy as np

df = pd.DataFrame({'movie':['Movie1','Movie2','Movie3'], 'action':[1,0,0], 'com':[np.nan,np.nan,1], 'drama':[1,1,np.nan], 'year':[1994,1994,1995]})

df.fillna(0, inplace=True)
# numeric_only=True excludes the text 'movie' column from the sum
print((df.groupby(['year']).sum(numeric_only=True)/len(df))*100)

Output:

         action        com      drama
year                                 
1994  33.333333   0.000000  66.666667
1995   0.000000  33.333333   0.000000
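Dividing by `len(df)` expresses each count as a share of all movies in the dataset, not of the movies released in that year. If the goal is the percentage of a given year's releases that carry each genre, one option (a sketch on the same sample data) is to average the 0/1 flags within each year, since the mean of a 0/1 column is the fraction of rows set to 1:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "movie": ["Movie1", "Movie2", "Movie3"],
    "action": [1, 0, 0],
    "com": [np.nan, np.nan, 1],
    "drama": [1, 1, np.nan],
    "year": [1994, 1994, 1995],
})

# Keep only the year and the 0/1 genre flags; NaN means "not tagged"
flags = df.drop(columns="movie").fillna(0)

# Mean of 0/1 flags per year = fraction of that year's releases with the genre
pct_of_year = flags.groupby("year").mean() * 100

# Number of releases per year, useful for judging how reliable each percentage is
releases = flags.groupby("year").size()

print(pct_of_year)
print(releases)
```

Because a movie can have several genres, the rows of `pct_of_year` need not sum to 100.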

In addition, you can use pandas' built-in Styler for a color-coded view of the DataFrame (or just use seaborn):

df = df.groupby(['year']).sum(numeric_only=True)/len(df)*100
# background_gradient needs matplotlib installed
df.style.background_gradient(cmap='viridis')

Output: (the color-graded table was shown as an image, which is not available)
