Question

I have a dataframe with the following structure:

ID Material Description color size dim color size dim Tech
1  xcv456    Rubber       101   s   32  102    m   34  elastic

I want to convert it to:

ID Material Description color size dim tech
1  xcv456   Rubber       101   s    32  elastic
1  xcv456   Rubber       102   m    34  elastic

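My file has 5 rows and 5414 columns, so I am trying to automate this: the program should detect the repeated columns and convert them to the desired output format. Any help is greatly appreciated.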
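
For anyone who wants to reproduce the problem, here is a minimal sketch of how such a frame could be built and how the repeated labels can be detected. It uses only the sample row shown above; the variable names `columns` and `dupe_mask` are illustrative.

```python
import pandas as pd

# Column labels repeat on purpose: color/size/dim each appear twice.
columns = ["ID", "Material", "Description",
           "color", "size", "dim",
           "color", "size", "dim",
           "Tech"]
df = pd.DataFrame(
    [[1, "xcv456", "Rubber", 101, "s", 32, 102, "m", 34, "elastic"]],
    columns=columns,
)

# True for every label that occurs more than once (all occurrences flagged).
dupe_mask = df.columns.duplicated(keep=False)
print(df.columns[dupe_mask].unique().tolist())  # repeated labels
print(df.columns[~dupe_mask].tolist())          # labels that occur once
```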

Answer 1

Use:

# boolean mask: True for every column whose name occurs more than once
m = df.columns.duplicated(keep=False)
# move the non-duplicated columns into the index
df = df.set_index(df.columns[~m].tolist())
# number each repeated name (0, 1, ...) and use it as a second column level
s = df.columns.to_series()
df.columns = [df.columns, s.groupby(s).cumcount()]
# stack the counter level into rows, then drop it; it is index level 4
# because the 4 non-duplicated columns occupy levels 0-3
df = df.stack().reset_index(level=4, drop=True).reset_index()
print(df)
   ID Material Description     Tech  color  dim size
0   1   xcv456      Rubber  elastic    101   32    s
1   1   xcv456      Rubber  elastic    102   34    m

The same approach with two input rows:

print(df)
   ID Material Description  color size  dim  color size  dim      Tech
0   1   xcv456      Rubber    101    s   32    102    m   34   elastic
1   2   xcv457     Rubber1    101    s   37    108    m   55  elastic2

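# boolean mask: True for every column whose name occurs more than once
m = df.columns.duplicated(keep=False)
# move the non-duplicated columns into the index
df = df.set_index(df.columns[~m].tolist())
# number each repeated name (0, 1, ...) and use it as a second column level
s = df.columns.to_series()
df.columns = [df.columns, s.groupby(s).cumcount()]
# stack the counter level into rows, then drop it (index level 4)
df = df.stack().reset_index(level=4, drop=True).reset_index()
print(df)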
   ID Material Description      Tech  color  dim size
0   1   xcv456      Rubber   elastic    101   32    s
1   1   xcv456      Rubber   elastic    102   34    m
2   2   xcv457     Rubber1  elastic2    101   37    s
3   2   xcv457     Rubber1  elastic2    108   55    m
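
The hard-coded `level=4` ties the snippet to exactly four non-duplicated columns. As a sketch under that observation, the same steps can be wrapped in a function that drops the innermost index level (`level=-1`) instead, so the number of identifier columns does not matter. The name `stack_duplicate_columns` is hypothetical, and the column order of the result may differ between pandas versions.

```python
import pandas as pd


def stack_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one output row per occurrence of the repeated columns."""
    # Labels that occur exactly once become the index.
    dupe_mask = df.columns.duplicated(keep=False)
    out = df.set_index(df.columns[~dupe_mask].tolist())
    # Number each repeated label 0, 1, 2, ... in order of appearance.
    labels = out.columns.to_series()
    counter = labels.groupby(labels).cumcount().to_numpy()
    out.columns = pd.MultiIndex.from_arrays([out.columns, counter])
    # Stack the counter level into rows, then drop it; -1 is the innermost level.
    return out.stack().reset_index(level=-1, drop=True).reset_index()


wide = pd.DataFrame(
    [[1, "xcv456", "Rubber", 101, "s", 32, 102, "m", 34, "elastic"],
     [2, "xcv457", "Rubber1", 101, "s", 37, 108, "m", 55, "elastic2"]],
    columns=["ID", "Material", "Description",
             "color", "size", "dim", "color", "size", "dim", "Tech"],
)
result = stack_duplicate_columns(wide)
print(result)
```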

Answer 2

A little preprocessing is needed before using pd.wide_to_long:

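import pandas as pd

# append an occurrence counter to every column name: ID1, color1, color2, ...
hh = pd.Series(df.columns)
df.columns = hh + hh.groupby(hh).cumcount().add(1).astype(str)
# color1/color2, size1/size2, dim1/dim2 become rows; drop the helper column afterwards
pd.wide_to_long(df, ['color', 'size', 'dim'],
                i=['ID1', 'Material1', 'Description1', 'Tech1'],
                j='drop').reset_index().drop(columns='drop')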
Out[556]: 
   ID1 Material1 Description1    Tech1  color size  dim
0    1    xcv456       Rubber  elastic    101    s   32
1    1    xcv456       Rubber  elastic    102    m   34
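
A side effect of the snippet above is that the identifier columns end up renamed (`ID1`, `Tech1`, ...). One possible variation, sketched here as an assumption rather than as part of the original answer, is to suffix only the repeated names so the identifier columns keep their labels. The names `names`, `stubs`, `id_cols`, and `occurrence` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, "xcv456", "Rubber", 101, "s", 32, 102, "m", 34, "elastic"]],
    columns=["ID", "Material", "Description",
             "color", "size", "dim", "color", "size", "dim", "Tech"],
)

names = pd.Series(df.columns)
dupe_mask = names.duplicated(keep=False)
# Occurrence counter per name, starting at 1, as a string suffix.
suffix = names.groupby(names).cumcount().add(1).astype(str)
# Suffix only the repeated labels: color1, size1, dim1, color2, ...
df.columns = names.where(~dupe_mask, names + suffix).tolist()

stubs = names[dupe_mask].unique().tolist()  # ['color', 'size', 'dim']
id_cols = names[~dupe_mask].tolist()        # ['ID', 'Material', 'Description', 'Tech']

long_df = (
    pd.wide_to_long(df, stubnames=stubs, i=id_cols, j="occurrence")
    .reset_index()
    .drop(columns="occurrence")
)
print(long_df)
```

Note that `pd.wide_to_long` expects the `i` columns to identify each row uniquely, which holds for the sample data here.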

Rearranging columns with the same name into a single column by pivoting the data in a dataframe
