Question

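Does anyone know how to reshape a pandas DataFrame so that no two columns hold non-zero values on the same index row? That is, how to convert this DataFrame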

   col1  col2  col3  col4
0     1     0     0     0
1     2     0     0     0
2     3     1     0     0
3     4     2     1     0
4     0     3     2     0
5     0     4     3     0
6     0     5     4     0
7     0     0     5     1
8     0     0     6     2
9     0     0     0     3

into the following:

   new_col1  new_col2  new_col3  new_col4  new_col5  new_col6
0         1         0         0         0         0         0
1         2         0         0         0         0         0
2         0         4         0         0         0         0
3         0         0         7         0         0         0
4         0         0         0         5         0         0
5         0         0         0         7         0         0
6         0         0         0         9         0         0
7         0         0         0         0         6         0
8         0         0         0         0         8         0
9         0         0         0         0         0         3

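in pandas, without any for loops?

The idea is to combine the values of all columns that share an index row into a new column, so that no two of the new columns share an index row.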
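For anyone who wants to reproduce this, the input frame shown above can be built as in the sketch below. The variable name `df` is only an assumption, chosen to match the name the answer uses.

```python
import pandas as pd

# Input frame from the question; each row has non-zero values
# in one or more adjacent columns.
df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 0, 0, 0, 0, 0, 0],
    "col2": [0, 0, 1, 2, 3, 4, 5, 0, 0, 0],
    "col3": [0, 0, 0, 1, 2, 3, 4, 5, 6, 0],
    "col4": [0, 0, 0, 0, 0, 0, 0, 1, 2, 3],
})

# Row totals: these are the values that should end up in the new columns.
row_totals = df.sum(axis=1)
print(df)
print(row_totals.tolist())
```

The row totals are the numbers that appear in the desired output; the open question is only which new column each total belongs in.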

Answer 1

Edit, rewritten:

(df.sum(1).to_frame()
  .set_index(df.groupby((df.ne(0) * df.columns).sum(1)).cumcount().eq(0).cumsum(),
              append=True)[0]
  .unstack(fill_value=0).add_prefix('new_col'))

Output:

   new_col1  new_col2  new_col3  new_col4  new_col5  new_col6
0         1         0         0         0         0         0
1         2         0         0         0         0         0
2         0         4         0         0         0         0
3         0         0         7         0         0         0
4         0         0         0         5         0         0
5         0         0         0         7         0         0
6         0         0         0         9         0         0
7         0         0         0         0         6         0
8         0         0         0         0         8         0
9         0         0         0         0         0         3

One way to do this:

s = df.groupby(df.ne(0)\
     .apply(lambda x: ','.join(df.columns[x].tolist()), axis=1))\
     .cumcount().eq(0).cumsum()

df_out = df.sum(1).to_frame().set_index(s, append=True)[0]\
  .unstack(fill_value=0).add_prefix('new_col')

df_out

Output:

   new_col1  new_col2  new_col3  new_col4  new_col5  new_col6
0         1         0         0         0         0         0
1         2         0         0         0         0         0
2         0         4         0         0         0         0
3         0         0         7         0         0         0
4         0         0         0         5         0         0
5         0         0         0         7         0         0
6         0         0         0         9         0         0
7         0         0         0         0         6         0
8         0         0         0         0         8         0
9         0         0         0         0         0         3

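Pseudo logic:

For each row, find the list of all columns that hold a non-zero value. Group the rows by that list, and use cumcount and cumsum to create an incrementing value. Add this incrementing value to the index with set_index(..., append=True), then unstack it to create the new columns.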
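The steps above can be traced one at a time. This is only a sketch: the intermediate names (`mask`, `pattern`, `group_id`, `totals`) are introduced here for illustration, and the frame is rebuilt so the snippet runs on its own. Like the code above, it assumes that rows with the same non-zero pattern are adjacent.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 0, 0, 0, 0, 0, 0],
    "col2": [0, 0, 1, 2, 3, 4, 5, 0, 0, 0],
    "col3": [0, 0, 0, 1, 2, 3, 4, 5, 6, 0],
    "col4": [0, 0, 0, 0, 0, 0, 0, 1, 2, 3],
})

# Step 1: for each row, the comma-joined names of its non-zero columns,
# e.g. "col1" or "col2,col3".
mask = df.ne(0)
pattern = mask.apply(lambda row: ",".join(df.columns[row.to_numpy()]), axis=1)

# Step 2: cumcount() is 0 on the first row of each pattern; flagging those
# rows and taking the cumulative sum yields an increasing group number.
group_id = df.groupby(pattern).cumcount().eq(0).cumsum()

# Step 3: row totals, with the group number appended as a second index level.
totals = df.sum(axis=1).to_frame().set_index(group_id, append=True)[0]

# Step 4: pivot the group level into columns, filling the gaps with 0.
df_out = totals.unstack(fill_value=0).add_prefix("new_col")

print(pattern.tolist())
print(group_id.tolist())
print(df_out)
```

Printing `pattern` and `group_id` makes it easy to check which rows are treated as belonging together before the final unstack.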
