Question

I have a DataFrame that I want to melt. Here is the input:

col1    col2    col3    col4    col5
file1  text_0  text_1  text_2        
file2  text_0  text_1  text_2  text_3
file3  text_0
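The blank cells in the table could be either empty strings or NaN, depending on how the data was loaded. The following is a minimal reconstruction of the input for experimenting. It assumes the blanks are empty strings (''), which is also what Answer 1 assumes.

```python
import pandas as pd

# Reconstruction of the input table; blank cells are assumed to be '' (not NaN)
df = pd.DataFrame({
    'col1': ['file1', 'file2', 'file3'],
    'col2': ['text_0', 'text_0', 'text_0'],
    'col3': ['text_1', 'text_1', ''],
    'col4': ['text_2', 'text_2', ''],
    'col5': ['', 'text_3', ''],
})
print(df)
```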

Here is the desired output:

col1  col2
file1 text_0
file1 text_1
file1 text_2
file2 text_0
file2 text_1
file2 text_2
file2 text_3
file3 text_0

Answer 1

First reshape with DataFrame.melt, then filter out the empty strings with query, and finally drop the variable column:

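df1 = (df.melt('col1')                        # columns: col1, variable, value
         .query("value != ''")                # keep only non-empty cells
         .sort_values('col1')
         .drop('variable', axis=1)            # the original column names are not needed
         .rename(columns={'value': 'col2'}))  # match the requested column name

print(df1)
     col1    col2
0   file1  text_0
3   file1  text_1
6   file1  text_2
1   file2  text_0
4   file2  text_1
7   file2  text_2
10  file2  text_3
2   file3  text_0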
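To show why the filter and the drop are needed, here is a step-by-step sketch that prints the intermediate result of melt. It rebuilds the sample data with '' for blanks (an assumption), and it uses `kind='stable'` in the sort. That sort option is an addition that guarantees the col2..col5 order is preserved within each file; the default quicksort does not promise that in general.

```python
import pandas as pd

# Sample data from the question; blank cells assumed to be ''
df = pd.DataFrame({
    'col1': ['file1', 'file2', 'file3'],
    'col2': ['text_0', 'text_0', 'text_0'],
    'col3': ['text_1', 'text_1', ''],
    'col4': ['text_2', 'text_2', ''],
    'col5': ['', 'text_3', ''],
})

# Step 1: melt turns every (row, value column) cell into its own row
long = df.melt('col1')
print(long.columns.tolist())  # ['col1', 'variable', 'value']
print(len(long))              # 12 (3 rows x 4 value columns)

# Step 2: drop the empty cells and the helper column, then rename 'value'
result = (long[long['value'] != '']
              .drop(columns='variable')
              .rename(columns={'value': 'col2'})
              # a stable sort keeps the col2..col5 order within each file
              .sort_values('col1', kind='stable')
              .reset_index(drop=True))
print(result)
```

The `reset_index(drop=True)` at the end only renumbers the rows 0..7. Leave it out if the original melt positions are useful.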

Answer 2

We could do:

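# '' -> NaN before melting, so dropna() removes both kinds of blank.
# value_name='col2' is avoided: recent pandas versions reject a value_name
# that matches an existing column (col2), so rename afterwards instead.
new_df = (df[df.ne('')].melt('col1')
                       .dropna()
                       .drop('variable', axis=1)
                       .rename(columns={'value': 'col2'})
                       .sort_values('col1')
                       .reset_index(drop=True))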
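The mask `df[df.ne('')]` is what makes this approach work whether the blanks are empty strings or NaN (for example, `pd.read_csv` produces NaN for empty fields). Boolean-mask indexing sets every cell where the mask is False to NaN. Cells that were already NaN stay NaN, because `NaN != ''` evaluates to True. The sketch below assumes hypothetical data that mixes both kinds of blank. It melts with the default value name and renames afterwards to avoid clashing with the existing col2 column.

```python
import numpy as np
import pandas as pd

# Hypothetical data mixing both kinds of blank cell: '' and NaN
df = pd.DataFrame({
    'col1': ['file1', 'file2', 'file3'],
    'col2': ['text_0', 'text_0', 'text_0'],
    'col3': ['text_1', 'text_1', np.nan],
    'col4': ['text_2', 'text_2', ''],
    'col5': ['', 'text_3', np.nan],
})

# Cells where the mask is False ('') become NaN; existing NaN cells stay NaN
masked = df[df.ne('')]
print(int(masked.isna().sum().sum()))  # 4

new_df = (masked.melt('col1')
                .dropna()                            # removes every former blank
                .drop(columns='variable')
                .rename(columns={'value': 'col2'})
                .sort_values('col1', kind='stable')  # keep column order per file
                .reset_index(drop=True))
print(new_df)
```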

We could also use DataFrame.stack, after getting rid of the '' values by converting them to NaN:

new_df = (df[df.notnull() & df.ne('')].set_index('col1')
                                      .stack()
                                      .dropna()      # newer stack() implementations keep NaN
                                      .droplevel(1)  # discard the original column-name level
                                      .rename('col2')
                                      .reset_index())
print(new_df)

Output:

    col1    col2
0  file1  text_0
1  file1  text_1
2  file1  text_2
3  file2  text_0
4  file2  text_1
5  file2  text_2
6  file2  text_3
7  file3  text_0
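To see why one index level has to be discarded after stacking, here is a minimal sketch that inspects the Series produced by `stack()`. It assumes the sample data with '' for blanks. The extra `dropna()` is a precaution for pandas versions whose `stack()` no longer drops NaN. Because stack walks the frame row by row, the values come out already grouped by file, so no sort is needed.

```python
import pandas as pd

# Sample data from the question; blank cells assumed to be ''
df = pd.DataFrame({
    'col1': ['file1', 'file2', 'file3'],
    'col2': ['text_0', 'text_0', 'text_0'],
    'col3': ['text_1', 'text_1', ''],
    'col4': ['text_2', 'text_2', ''],
    'col5': ['', 'text_3', ''],
})

# stack() moves the column names into a second index level, so every value
# is labelled by (file name, original column name)
stacked = df[df.ne('')].set_index('col1').stack().dropna()
print(list(stacked.index.names))  # ['col1', None]
print(stacked.index[0])           # ('file1', 'col2')

# The original column name is not wanted in the result: drop that level,
# name the values 'col2', and turn the remaining 'col1' level into a column
new_df = stacked.droplevel(1).rename('col2').reset_index()
print(new_df)
```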

How to melt or unstack a DataFrame in Python?
