Question

I have a pandas DataFrame that looks like this:

id  c1   c2       c3
1   5    text1    -4
2   8    text2    -1
1   4    text1     0
2   7    text2    -8
3   2    text3    -5
1   2    text1    -8
...

And this is my desired output:

id    c2       c3   c4  c5
1     text1    -4   0   -8
2     text2    -1  -8
3     text3    -5
...

I tried this code:

  df.groupby(['id','c2']).cumcount().add(1).astype(str)

but it didn't work.
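The attempt only numbers each row inside its (id, c2) group; it returns a Series aligned with the original rows and does not reshape anything. The sketch below, which rebuilds df from just the six rows shown above (the "..." rows are left out as an assumption), prints what that expression actually produces. Those labels still have to be used as the column key of a reshape step to get the desired wide layout.

```python
import pandas as pd

# Sample data: only the rows shown in the question ("..." rows omitted)
df = pd.DataFrame({
    "id": [1, 2, 1, 2, 3, 1],
    "c1": [5, 8, 4, 7, 2, 2],
    "c2": ["text1", "text2", "text1", "text2", "text3", "text1"],
    "c3": [-4, -1, 0, -8, -5, -8],
})

# cumcount() gives each row's 0-based position within its (id, c2) group;
# add(1) makes it 1-based and astype(str) turns the positions into strings.
labels = df.groupby(["id", "c2"]).cumcount().add(1).astype(str)

# One label per original row, in the original row order
print(labels.tolist())
```

The result is one label per row, for example the third row (id 1, text1) gets "2" because it is the second row of that group.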

Answer 1

Use GroupBy.cumcount together with DataFrame.set_index, then reshape with Series.unstack and rename the new columns with DataFrame.add_prefix:

g = df.groupby(['id','c2']).cumcount().add(3)
out = df.set_index(['id','c2', g])['c3'].unstack().add_prefix('c').reset_index()
print(out)
   id     c2   c3   c4   c5
0   1  text1 -4.0  0.0 -8.0
1   2  text2 -1.0 -8.0  NaN
2   3  text3 -5.0  NaN  NaN
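The steps above can be run end to end as in the following sketch, with a comment on what each stage does. It rebuilds df from only the six rows shown in the question (an assumption; the "..." rows are left out) and stores the result under a new name, out, so the original df stays available for further experiments.

```python
import pandas as pd

# Sample data: only the rows shown in the question
df = pd.DataFrame({
    "id": [1, 2, 1, 2, 3, 1],
    "c1": [5, 8, 4, 7, 2, 2],
    "c2": ["text1", "text2", "text1", "text2", "text3", "text1"],
    "c3": [-4, -1, 0, -8, -5, -8],
})

# Position of each row inside its (id, c2) group, offset by 3 so the
# first value goes to column 3, the second to column 4, and so on.
g = df.groupby(["id", "c2"]).cumcount().add(3)

out = (
    df.set_index(["id", "c2", g])["c3"]  # index = (id, c2, position), values = c3
      .unstack()                         # move the position level into the columns
      .add_prefix("c")                   # 3, 4, 5 -> c3, c4, c5
      .reset_index()                     # turn id and c2 back into ordinary columns
)
print(out)
```

Because unstack has to mark missing cells with NaN, the value columns become float64, which is why the output above shows -4.0 instead of -4.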

If needed, replace NaN with 0 by passing fill_value=0 to unstack:

g = df.groupby(['id','c2']).cumcount().add(3)
out = df.set_index(['id','c2', g])['c3'].unstack(fill_value=0).add_prefix('c').reset_index()
print(out)
   id     c2  c3  c4  c5
0   1  text1  -4   0  -8
1   2  text2  -1  -8   0
2   3  text3  -5   0   0
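One thing to watch with fill_value=0: id 1 has a genuine 0 in c4, so after filling, real zeros and filled gaps look identical. If that distinction matters, a possible alternative (not part of the original answer) is to keep the gaps and cast to pandas' nullable "Int64" dtype, which keeps integer values and shows missing cells as <NA>. This sketch assumes a pandas version with nullable integer support and again uses only the rows shown in the question.

```python
import pandas as pd

# Sample data: only the rows shown in the question
df = pd.DataFrame({
    "id": [1, 2, 1, 2, 3, 1],
    "c1": [5, 8, 4, 7, 2, 2],
    "c2": ["text1", "text2", "text1", "text2", "text3", "text1"],
    "c3": [-4, -1, 0, -8, -5, -8],
})

g = df.groupby(["id", "c2"]).cumcount().add(3)

out = (
    df.set_index(["id", "c2", g])["c3"]
      .unstack()          # gaps become NaN, so the columns are float64 here
      .astype("Int64")    # nullable integers: -4 instead of -4.0, gaps as <NA>
      .add_prefix("c")
      .reset_index()
)
print(out)
```

With this variant, id 1's c4 stays a real 0 while the empty cells for ids 2 and 3 remain visibly missing.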
