pandas - Transform multiple rows into columns based on the row's status

Asked: 2018-01-11 13:06:32

Tags: pandas data-manipulation

What is the best way to transform the following data frame and also add a sum over the "status" values?

Before:

plan type  hour status total
A    cont   0    ok      10
A    cont   0    notok    3
A    cont   0    other    1
A    vend   1    ok       7
A    vend   1    notok    2
A    vend   1    other    0
B    test   5    ok      20
B    test   5    notok    6
B    test   5    other   13

After:

plan type  hour  ok   notok other sum
A    cont   0    10   3      1    14
A    vend   1     7   2      0     9
B    test   5    20   6     13    39 

Thanks in advance!

2 answers:

Answer 0 (score: 0)

You can use pivot_table, then add the row sum:

In [9]: dff = df.pivot_table(index=['plan', 'type', 'hour'], columns='status', 
                             values='total')

In [10]: dff['sum'] = dff.sum(axis=1)

In [11]: dff.reset_index()
Out[11]:
status plan  type  hour  notok  ok  other  sum
0         A  cont     0      3  10      1   14
1         A  vend     1      2   7      0    9
2         B  test     5      6  20     13   39
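
For anyone who wants to reproduce this without an existing df, here is a minimal self-contained sketch. The frame is rebuilt from the question's table. The names wide and aggfunc="sum" are choices made for this illustration, not part of the answer above. pivot_table averages by default, which gives the same numbers here only because each plan/type/hour/status combination occurs once.

```python
import pandas as pd

# Sample data copied from the question's "Before" table.
df = pd.DataFrame({
    "plan":   ["A"] * 6 + ["B"] * 3,
    "type":   ["cont"] * 3 + ["vend"] * 3 + ["test"] * 3,
    "hour":   [0] * 3 + [1] * 3 + [5] * 3,
    "status": ["ok", "notok", "other"] * 3,
    "total":  [10, 3, 1, 7, 2, 0, 20, 6, 13],
})

# aggfunc="sum" is an assumption for this sketch; the answer relies on
# the default aggregation (mean).
wide = df.pivot_table(index=["plan", "type", "hour"], columns="status",
                      values="total", aggfunc="sum")
wide["sum"] = wide.sum(axis=1)   # row-wise total of notok, ok, other
wide = wide.reset_index()
wide.columns.name = None         # drop the leftover "status" label
print(wide)
```

Setting columns.name to None only removes the "status" label that otherwise shows up in the header row of the printed output.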

Answer 1 (score: 0)

Use set_index + unstack to reshape, add the new column with assign, then finish with reset_index and rename_axis:

df = (df.set_index(['plan', 'type', 'hour', 'status'])['total']
        .unstack()                               # status values become columns
        .assign(sum=lambda x: x.sum(axis=1))     # row-wise total
        .reset_index()
        .rename_axis(None, axis=1))              # drop the 'status' columns label
print(df)
  plan  type  hour  notok  ok  other  sum
0    A  cont     0      3  10      1   14
1    A  vend     1      2   7      0    9
2    B  test     5      6  20     13   39

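If the combinations of plan, type, hour (together with status) are not unique, unstack cannot reshape the data, so aggregate first with groupby and a function such as mean, or use the pivot_table approach from the other answer: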

print (df)
  plan  type  hour status  total
0    A  cont     0     ok     10 <- duplicate 10 for plan, type, hour
1    A  cont     0     ok    100 <- duplicate 100 for plan, type, hour
2    A  cont     0  notok      3
3    A  cont     0  other      1
4    A  vend     1     ok      7
5    A  vend     1  notok      2
6    A  vend     1  other      0
7    B  test     5     ok     20
8    B  test     5  notok      6
9    B  test     5  other     13

df = (df.groupby(['plan', 'type', 'hour', 'status'])['total'].mean()
        .unstack()                               # status values become columns
        .assign(sum=lambda x: x.sum(axis=1))     # row-wise total
        .reset_index()
        .rename_axis(None, axis=1))              # drop the 'status' columns label
print(df)
  plan  type  hour  notok  ok  other  sum
0    A  cont     0      3  55      1   59 <- 55 = (100 + 10) / 2
1    A  vend     1      2   7      0    9
2    B  test     5      6  20     13   39
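
If duplicated rows should be added up rather than averaged, which seems plausible for a column named total but is an assumption about the data, the same chain works with sum. The sketch below rebuilds the duplicated sample frame. It first checks that plain set_index + unstack raises a ValueError on it, then aggregates with groupby(...).sum(). The names keys, wide and unstack_failed are introduced here for illustration only.

```python
import pandas as pd

# Sample data with a duplicated (plan, type, hour, status) key,
# copied from the example above.
df = pd.DataFrame({
    "plan":   ["A"] * 7 + ["B"] * 3,
    "type":   ["cont"] * 4 + ["vend"] * 3 + ["test"] * 3,
    "hour":   [0] * 4 + [1] * 3 + [5] * 3,
    "status": ["ok", "ok", "notok", "other"] + ["ok", "notok", "other"] * 2,
    "total":  [10, 100, 3, 1, 7, 2, 0, 20, 6, 13],
})

keys = ["plan", "type", "hour", "status"]

# Plain set_index + unstack cannot reshape an index with duplicate entries.
try:
    df.set_index(keys)["total"].unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# Summing the duplicates first makes every key unique again.
wide = (df.groupby(keys)["total"].sum()
          .unstack(fill_value=0)                 # missing statuses count as 0
          .assign(sum=lambda x: x.sum(axis=1))   # row-wise total
          .reset_index()
          .rename_axis(None, axis=1))
print(wide)
```

With sum, the duplicated ok rows for A/cont/0 combine to 110 instead of the mean 55, so the row total becomes 114.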