Question

I have two DataFrames in pandas, shown below.

   a  b  c    d
0  1  1  1  0.1
1  1  1  2  0.4
2  1  2  1  0.2
3  1  2  2  0.5


   a  b   c1   c2
0  1  1  0.1  0.4
1  1  2  0.2  0.5

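How can I convert the first DataFrame into the second? I tried pivot_table, but beyond creating new columns from the values in c, I'm not sure how to specify that columns a and b should be kept. I also tried groupby and unstack, but that gave me a hierarchical column index.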

Answer 1

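What matters is whether the value combinations in the columns passed to set_index (a, b, c) are unique.

If they are unique, use set_index followed by unstack on column c, then add_prefix, and finally reset_index with rename_axis: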

df = (df.set_index(['a','b','c'])['d']
        .unstack()
        .add_prefix('c')
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
   a  b   c1   c2
0  1  1  0.1  0.4
1  1  2  0.2  0.5
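
For anyone who wants to run this from scratch, here is a minimal self-contained sketch of the same chain. The DataFrame is rebuilt from the question's sample data, and the variable name wide is my own choice for illustration, not something from the original post.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "a": [1, 1, 1, 1],
    "b": [1, 1, 2, 2],
    "c": [1, 2, 1, 2],
    "d": [0.1, 0.4, 0.2, 0.5],
})

wide = (
    df.set_index(["a", "b", "c"])["d"]  # Series indexed by (a, b, c)
      .unstack()                        # innermost level c becomes the columns
      .add_prefix("c")                  # column labels 1, 2 -> "c1", "c2"
      .reset_index()                    # a and b back to ordinary columns
      .rename_axis(None, axis=1)        # drop the leftover columns name "c"
)
print(wide)
```

Selecting the single column d before unstack is what avoids the hierarchical column index mentioned in the question: unstacking a Series yields flat columns, whereas unstacking a DataFrame adds a level.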

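If the first three columns contain duplicate combinations, you need to aggregate with groupby, using an aggregation function such as mean or sum; after that, the solution is the same as before. Alternatively, use pivot_table, which aggregates with mean by default: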

print (df)
   a  b  c    d
0  1  1  1  0.1 <- 1,1,1
1  1  1  2  0.4
2  1  2  1  0.2
3  1  2  2  0.5
4  1  1  1  0.7 <- 1,1,1

df = (df.groupby(['a','b','c'])['d']
        .mean()
        .unstack()
        .add_prefix('c')
        .reset_index()
        .rename_axis(None, axis=1))

Or:

df = (df.pivot_table(index=['a','b'], columns='c', values='d')
        .add_prefix('c')
        .reset_index()
        .rename_axis(None, axis=1))

print (df)
   a  b   c1   c2
0  1  1  0.4  0.4
1  1  2  0.2  0.5
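
The duplicate case can be sketched end to end as follows. This is only an illustration built on the sample data above; the names keys, has_duplicates, by_groupby and by_pivot are mine. It first checks for repeated (a, b, c) combinations with duplicated, shows that a plain unstack refuses to reshape them, and then runs both aggregating variants.

```python
import pandas as pd

# Sample data from the answer, including the repeated (1, 1, 1) row.
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 1],
    "b": [1, 1, 2, 2, 1],
    "c": [1, 2, 1, 2, 1],
    "d": [0.1, 0.4, 0.2, 0.5, 0.7],
})

keys = ["a", "b", "c"]

# True if any (a, b, c) combination occurs more than once.
has_duplicates = df.duplicated(subset=keys).any()
print(has_duplicates)

# Without aggregation, unstack cannot reshape duplicate index entries.
try:
    df.set_index(keys)["d"].unstack()
except ValueError as exc:
    print("unstack failed:", exc)

# Variant 1: aggregate with groupby, then reshape as before.
by_groupby = (
    df.groupby(keys)["d"]
      .mean()
      .unstack()
      .add_prefix("c")
      .reset_index()
      .rename_axis(None, axis=1)
)

# Variant 2: pivot_table aggregates and reshapes in one step.
by_pivot = (
    df.pivot_table(index=["a", "b"], columns="c", values="d", aggfunc="mean")
      .add_prefix("c")
      .reset_index()
      .rename_axis(None, axis=1)
)

print(by_groupby)
print(by_pivot)
```

Swapping mean for sum, first, or another aggregation changes how the repeated rows are combined, so the choice depends on what the duplicates mean in your data.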
