Question

我有一个熊猫数据框，如下所示：

genres.head()

   Drama   Comedy  Action  Crime   Romance Thriller    Adventure   Horror  Mystery Fantasy ... History Music   War Documentary Sport   Musical Western Film-Noir   News    number_of_genres
tconst                                                                                  
tt0111161   1   0   0   0   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   1
tt0468569   1   0   1   1   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   3
tt1375666   0   0   1   0   0   0   1   0   0   0   ... 0   0   0   0   0   0   0   0   0   3
tt0137523   1   0   0   0   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   1
tt0110912   1   0   0   1   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   2

我希望能够得到一个表，其中行是流派，列是给定电影的标签数，值是计数。换句话说，我想要这样：

number_of_genres    1   2   3   totals
Drama   451 1481    3574    5506
Comedy  333 1108    2248    3689
Action  9   230 1971    2210
Crime   1   284 1687    1972
Romance 1   646 1156    1803
Thriller    22  449 1153    1624
Adventure   1   98  1454    1553
Horror  137 324 765 1226
Mystery 0   108 792 900
Fantasy 1   74  642 717
Sci-Fi  0   129 551 680
Biography   0   95  532 627
Family  0   60  452 512
Animation   0   6   431 437
History 0   32  314 346
Music   1   87  223 311
War 0   90  162 252
Documentary 70  82  78  230
Sport   0   78  142 220
Musical 0   13  131 144
Western 19  44  57  120
Film-Noir   0   11  50  61
News    0   1   2   3
Total   1046    5530    18567   25143

以Python方式获取该表格的最佳方法是什么？我通过以下代码解决了问题，但想知道是否有更好的方法：

genres['number_of_genres'] = genres.sum(axis=1)
pivots = []
for column in genres.columns[0:-1]:
    column = pd.DataFrame(genres[column])
    columns = column.join(genres.number_of_genres)
    pivot = pd.pivot_table(columns, values=columns.columns[0], columns='number_of_genres', aggfunc=np.sum)
    pivots.append(pivot)

pivots_df = pd.concat(pivots)
pivots_df['totals'] = pivots_df.sum(axis=1)
pivots_df.loc['Total'] = pivots_df.sum()

[EDIT]：添加了应该与pd.read_clipboard（）兼容的jupyter输出。如果我可以更好地格式化输出，请告诉我该怎么做。

Answer 1

也许我遗漏了一些东西，但这对您不起作用吗？

agg = df.groupby('number_of_genres').agg('sum').T
agg['totals'] = agg.sum(axis=1)

编辑：通过pivot_table

解决方案

agg = df.pivot_table(columns='number_of_genres', aggfunc='sum')
agg['total'] = agg.sum(axis=1)

旋转一键编码数据帧

1 个答案: