Question

If I have a DataFrame like this:

 type   value   group
    a      10     one
    b      45     one
    a     224     two
    b     119     two
    a      33   three
    b      44   three

How can I turn it into this:

 type     one     two   three
    a      10     224      33
    b      45     119      44

I think it's pivot_table, but that just gives me a regrouped list.

Answer 1

I think you need pivot, followed by rename_axis (new in pandas 0.18.0) and reset_index:

print(df.pivot(index='type', columns='group', values='value')
        .rename_axis(None, axis=1)
        .reset_index())

  type  one  three  two
0    a   10     33  224
1    b   45     44  119
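
For a version that runs on its own, here is a minimal sketch that rebuilds the question's frame by hand and applies the same chain. The construction of df is an assumption, since the post does not show how the frame was created, and the name wide is only illustrative.

```python
import pandas as pd

# Rebuild the frame from the question (assumed construction, not shown in the post).
df = pd.DataFrame({
    'type':  ['a', 'b', 'a', 'b', 'a', 'b'],
    'value': [10, 45, 224, 119, 33, 44],
    'group': ['one', 'one', 'two', 'two', 'three', 'three'],
})

wide = (df.pivot(index='type', columns='group', values='value')
          .rename_axis(None, axis=1)   # drop the 'group' label from the columns axis
          .reset_index())              # turn 'type' back into a regular column

print(wide)
```

The columns come out in alphabetical order (one, three, two) because pivot sorts the new column labels.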

If the order of the columns matters:

df = df.pivot(index='type', columns='group', values='value').rename_axis(None, axis=1)

print(df[['one','two','three']].reset_index())
  type  one  two  three
0    a   10  224     33
1    b   45  119     44
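
If you would rather not type the column names by hand, one option is to take the order in which the groups first appear in the data. This is only a sketch: it assumes that "first appearance" is the order you want, and the names order and ordered are hypothetical.

```python
import pandas as pd

# Same frame as in the question (assumed construction).
df = pd.DataFrame({
    'type':  ['a', 'b', 'a', 'b', 'a', 'b'],
    'value': [10, 45, 224, 119, 33, 44],
    'group': ['one', 'one', 'two', 'two', 'three', 'three'],
})

# unique() keeps the order of first appearance, unlike pivot's sorted columns.
order = list(df['group'].unique())

ordered = (df.pivot(index='type', columns='group', values='value')
             .reindex(columns=order)      # put the columns back in data order
             .rename_axis(None, axis=1)
             .reset_index())

print(ordered)
```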

EDIT:

With your real data you may get an error:

print(df.pivot(index='type', columns='group', values='value')
        .rename_axis(None, axis=1)
        .reset_index())

ValueError: Index contains duplicate entries, cannot reshape

print(df)
  type  value  group
0    a     10    one
1    a     20    one
2    b     45    one
3    a    224    two
4    b    119    two
5    a     33  three
6    b     44  three
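
To find out which rows cause this before pivoting, you can look for repeated (type, group) pairs. A minimal sketch, assuming the frame above is built as shown here; dupes is a hypothetical name:

```python
import pandas as pd

# The frame with one duplicated (type, group) pair (assumed construction).
df = pd.DataFrame({
    'type':  ['a', 'a', 'b', 'a', 'b', 'a', 'b'],
    'value': [10, 20, 45, 224, 119, 33, 44],
    'group': ['one', 'one', 'one', 'two', 'two', 'three', 'three'],
})

# keep=False marks every row of a duplicated pair, not just the later ones.
dupes = df[df.duplicated(subset=['type', 'group'], keep=False)]
print(dupes)

# pivot cannot place two values into a single cell, so it raises.
try:
    df.pivot(index='type', columns='group', values='value')
except ValueError as exc:
    print('pivot failed:', exc)
```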

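The problem is the second row: for index value a and column one you get two values, 10 and 20. In this case the function pivot_table aggregates the data. The default aggregation function is the mean (np.mean, or the string 'mean'), but you can change it with the aggfunc parameter: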

print(df.pivot_table(index='type', columns='group', values='value', aggfunc='mean')
        .rename_axis(None, axis=1)
        .reset_index())

  type  one  three  two
0    a   15     33  224
1    b   45     44  119

print(df.pivot_table(index='type', columns='group', values='value', aggfunc='first')
        .rename_axis(None, axis=1)
        .reset_index())

  type  one  three  two
0    a   10     33  224
1    b   45     44  119

print(df.pivot_table(index='type', columns='group', values='value', aggfunc='sum')
        .rename_axis(None, axis=1)
        .reset_index())

  type  one  three  two
0    a   30     33  224
1    b   45     44  119
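
The three variants differ only in the (a, one) cell, the one that received two values. The sketch below wraps the chain in a small helper and prints that cell for each aggregation. The helper name widen and the construction of df are assumptions made for illustration only.

```python
import pandas as pd

# The frame with one duplicated (type, group) pair (assumed construction).
df = pd.DataFrame({
    'type':  ['a', 'a', 'b', 'a', 'b', 'a', 'b'],
    'value': [10, 20, 45, 224, 119, 33, 44],
    'group': ['one', 'one', 'one', 'two', 'two', 'three', 'three'],
})

def widen(frame, aggfunc):
    """Pivot with aggregation; `widen` is a hypothetical helper, not a pandas API."""
    return (frame.pivot_table(index='type', columns='group',
                              values='value', aggfunc=aggfunc)
                 .rename_axis(None, axis=1)
                 .reset_index())

# Compare how each aggregation resolves the duplicated (a, one) cell.
for name in ['mean', 'first', 'sum']:
    cell = widen(df, name).loc[0, 'one']
    print(name, cell)
```

Which aggregation is right depends on what the duplicates mean in your data: 'first' silently discards the later value, while 'mean' and 'sum' combine them.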
