Question

My DataFrame looks like this:

import pandas as pd

df = pd.DataFrame({"B.count": [0, 0, 1, 0, 0],
                   "B.score": [0, 0, 87, 0, 0],
                   "C.count": [0, 1, 0, 1, 0],
                   "C.score": [0, 91, 0, 14, 0],
                   "D.count": [1, 0, 10, 0, 11],
                   "D.score": [93, 0, 3, 0, 4]},
                  index=[1, 2, 3, 4, 5])

I would like to convert it to a long, compact sparse format (only the nonzero entries are kept):

pd.DataFrame({"id": [1, 2, 3, 3, 4, 5],
              "taste": ["D", "C", "B", "D", "C", "D"],
              "count": [1, 1, 1, 10, 1, 11],
              "score": [93, 91, 87, 3, 14, 4]})

It seems the solution has to go through the wide_to_long function, but unfortunately I could not get it to work.

Answer 1

Let's build a custom wide_to_long:

# mask the 0 with nan, stack to get rid of the nan's
s = df.where(df>0).stack().reset_index()

# output dataframe
(pd.concat((s.rename(columns={'level_0':'id'}),
            s.level_1.str.extract(r'(?P<taste>.+)\.(?P<type>count|score)$')
           ), axis=1
          )
  .pivot_table(index=['id','taste'], columns='type',values=0 )
  .reset_index()
)

Output:

type  id taste  count  score
0      1     D    1.0   93.0
1      2     C    1.0   91.0
2      3     B    1.0   87.0
3      3     D   10.0    3.0
4      4     C    1.0   14.0
5      5     D   11.0    4.0
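
Note that the values above come back as floats, because masking the zeros with NaN forces a float dtype. If integer output matters, one possible variation is to melt first and filter the zero rows afterwards. This is only a sketch of that idea, not the answerer's method; the names `melted`, `pivoted` and `result` are made up for illustration, and it assumes every column name has the form `<taste>.<type>`:

```python
import pandas as pd

df = pd.DataFrame({"B.count": [0, 0, 1, 0, 0],
                   "B.score": [0, 0, 87, 0, 0],
                   "C.count": [0, 1, 0, 1, 0],
                   "C.score": [0, 91, 0, 14, 0],
                   "D.count": [1, 0, 10, 0, 11],
                   "D.score": [93, 0, 3, 0, 4]},
                  index=[1, 2, 3, 4, 5])

# Turn the index into an "id" column and melt every other column into rows.
melted = (df.rename_axis("id")
            .reset_index()
            .melt(id_vars="id", var_name="column", value_name="value"))

# Split "B.count" into taste "B" and type "count" (assumed naming scheme).
melted[["taste", "type"]] = melted["column"].str.split(".", n=1, expand=True)

# One row per (id, taste), with count and score side by side.
pivoted = melted.pivot(index=["id", "taste"], columns="type", values="value")

# Drop the rows whose count is zero only now, so no NaN is ever introduced
# and the integer dtype is preserved.
result = pivoted[pivoted["count"] != 0].reset_index()
result.columns.name = None  # remove the leftover "type" label on the columns

print(result)
```

Passing a list to `index` in `pivot` needs a reasonably recent pandas (1.1 or later, as far as I know).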

Answer 2

Yes, you can use pandas wide_to_long; however, you first have to rearrange the column names a bit:

# swap the two parts of each column name so the stub (count/score) comes first:
# "B.count" becomes "count.B", which is the layout pd.wide_to_long expects
# (the slicing below assumes a single-character prefix before the dot)
df.columns = [f'{i[2:]}.{i[0]}' for i in df.columns]

#create id column
df = df.assign(id=df.index)

#convert from wide to long
(pd.wide_to_long(df,
                 stubnames=['count','score'],
                 sep='.',
                 i='id',
                 j='taste', 
                 suffix='[A-Z]')
 #remove 0 values
 .query('count != 0')
 .sort_index()
 .reset_index()
)

    id  taste   count   score
0   1   D       1       93
1   2   C       1       91
2   3   B       1       87
3   3   D       10      3
4   4   C       1       14
5   5   D       11      4
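
If the prefixes are not always a single letter, the same idea can probably be written without the hard-coded slices by splitting each name on the dot. The following is a minimal sketch under that assumption; `renamed`, `long_df` and `result` are illustrative names, and the `suffix=r"\w+"` pattern is my choice, not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame({"B.count": [0, 0, 1, 0, 0],
                   "B.score": [0, 0, 87, 0, 0],
                   "C.count": [0, 1, 0, 1, 0],
                   "C.score": [0, 91, 0, 14, 0],
                   "D.count": [1, 0, 10, 0, 11],
                   "D.score": [93, 0, 3, 0, 4]},
                  index=[1, 2, 3, 4, 5])

# "B.count" -> "count.B": split on the first dot and swap the two halves.
# rename() returns a new frame, so the original df is left untouched.
renamed = df.rename(columns=lambda c: ".".join(reversed(c.split(".", 1))))

# wide_to_long needs the identifier as a real column, not as the index.
renamed = renamed.rename_axis("id").reset_index()

long_df = pd.wide_to_long(renamed,
                          stubnames=["count", "score"],
                          i="id",
                          j="taste",
                          sep=".",
                          suffix=r"\w+")  # accepts prefixes of any length

# Keep only the nonzero entries and flatten the (id, taste) index.
result = long_df[long_df["count"] != 0].sort_index().reset_index()

print(result)
```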

Convert a wide matrix to a compressed sparse format