I have a dataframe with the following structure:
ID Material Description color size dim color size dim Tech
1 xcv456 Rubber 101 s 32 102 m 34 elastic
I want to convert it to:
ID Material Description color size dim Tech
1 xcv456 Rubber 101 s 32 elastic
1 xcv456 Rubber 102 m 34 elastic
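The file has 5 rows and 5414 columns, so I am trying to automate the process: detect the redundant (repeated) columns and convert them to the desired output format. Any help is greatly appreciated.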
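A minimal sketch of the "detect the redundant columns" step, assuming the sample frame above is rebuilt by hand with its duplicate column names intact. It counts how often each label occurs with Index.value_counts. The names counts, repeated and fixed are illustrative only.

```python
import pandas as pd

# Sample frame from the question; pandas allows duplicate column labels
# when the frame is built from rows plus an explicit column list.
cols = ["ID", "Material", "Description", "color", "size", "dim",
        "color", "size", "dim", "Tech"]
df = pd.DataFrame([[1, "xcv456", "Rubber", 101, "s", 32, 102, "m", 34, "elastic"]],
                  columns=cols)

# How often does each column label occur?
counts = df.columns.value_counts()
repeated = counts[counts > 1]        # the redundant (repeated) labels
fixed = counts[counts == 1].index    # labels that occur exactly once

print(sorted(repeated.index), sorted(fixed))
# Every repeated label should occur the same number of times,
# otherwise the blocks cannot be lined up into rows.
print(repeated.nunique() == 1)
```

If the data is read with pd.read_csv, duplicate headers are typically renamed to color.1, size.1, dim.1, so every count above would be 1 and the ".N" suffix would need stripping first.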
Answer 0 (score: 2)
Use:
#mask for all duplicates columns
m = df.columns.duplicated(keep=False)
#set index with not dupe columns
df = df.set_index(df.columns[~m].tolist())
#count dupes for MultiIndex
s = df.columns.to_series()
df.columns = [df.columns, s.groupby(s).cumcount()]
#reshape, then drop index level 4 (the counter), because the 4 non-dupe columns occupy levels 0-3
df = df.stack().reset_index(level=4, drop=True).reset_index()
print (df)
ID Material Description Tech color dim size
0 1 xcv456 Rubber elastic 101 32 s
1 1 xcv456 Rubber elastic 102 34 m
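A self-contained sketch of the approach above, assuming the sample frame from the question. It differs from the answer in two small ways: it builds the column MultiIndex explicitly with pd.MultiIndex.from_arrays, and it drops the counter with level=-1 (the last index level), so the number of non-repeated columns does not have to be hard-coded as 4.

```python
import pandas as pd

cols = ["ID", "Material", "Description", "color", "size", "dim",
        "color", "size", "dim", "Tech"]
df = pd.DataFrame([[1, "xcv456", "Rubber", 101, "s", 32, 102, "m", 34, "elastic"]],
                  columns=cols)

# True for every occurrence of a repeated label (color, size, dim)
m = df.columns.duplicated(keep=False)
# Move the non-repeated columns (ID, Material, Description, Tech) into the index
out = df.set_index(df.columns[~m].tolist())

# Number each repeat: color -> 0, 1; size -> 0, 1; dim -> 0, 1
labels = pd.Series(out.columns)      # RangeIndex avoids duplicate-index alignment
counter = labels.groupby(labels).cumcount()
out.columns = pd.MultiIndex.from_arrays([labels, counter])

# Stack the counter level into the rows, drop it (last index level), flatten
out = out.stack().reset_index(level=-1, drop=True).reset_index()
print(out)
```

Depending on the pandas version, stack() may emit a FutureWarning and the order of the color/size/dim columns may differ, but the values per row are the same.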
Test with more rows:
print (df)
ID Material Description color size dim color size dim Tech
0 1 xcv456 Rubber 101 s 32 102 m 34 elastic
1 2 xcv457 Rubber1 101 s 37 108 m 55 elastic2
#mask for all duplicates columns
m = df.columns.duplicated(keep=False)
#set index with not dupe columns
df = df.set_index(df.columns[~m].tolist())
#count dupes for MultiIndex
s = df.columns.to_series()
df.columns = [df.columns, s.groupby(s).cumcount()]
df = df.stack().reset_index(level=4, drop=True).reset_index()
print (df)
ID Material Description Tech color dim size
0 1 xcv456 Rubber elastic 101 32 s
1 1 xcv456 Rubber elastic 102 34 m
2 2 xcv457 Rubber1 elastic2 101 37 s
3 2 xcv457 Rubber1 elastic2 108 55 m
Answer 1 (score: 1)
Using pd.wide_to_long:
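import pandas as pd
hh=pd.Series(df.columns)
#append a running counter to every column name: ID1, Material1, Description1, color1, size1, dim1, color2, size2, dim2, Tech1
df.columns=hh+hh.groupby(hh).cumcount().add(1).astype(str)
pd.wide_to_long(df,['color','size','dim'],i=['ID1','Material1','Description1','Tech1'],j='drop').reset_index().drop(columns='drop')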
Out[556]:
ID1 Material1 Description1 Tech1 color size dim
0 1 xcv456 Rubber elastic 101 s 32
1 1 xcv456 Rubber elastic 102 m 34
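The wide_to_long answer leaves the id columns renamed to ID1, Material1 and so on. A possible variant, sketched here under the assumption that only the repeated labels need a numeric suffix, keeps the original names and derives the stub and id lists automatically instead of typing them. The helper names wide, stubs, ids and the j column "n" are illustrative.

```python
import pandas as pd

cols = ["ID", "Material", "Description", "color", "size", "dim",
        "color", "size", "dim", "Tech"]
df = pd.DataFrame([[1, "xcv456", "Rubber", 101, "s", 32, 102, "m", 34, "elastic"]],
                  columns=cols)

labels = pd.Series(df.columns)
dup = labels.duplicated(keep=False)
# Suffix only the repeated labels: color1, size1, dim1, color2, size2, dim2
suffixed = labels + labels.groupby(labels).cumcount().add(1).astype(str)
wide = df.copy()
wide.columns = labels.where(~dup, suffixed)

stubs = labels[dup].unique().tolist()   # repeated labels become the stub names
ids = labels[~dup].tolist()             # the remaining labels identify a row

# wide_to_long needs the ids to be unique per input row
out = (pd.wide_to_long(wide, stubnames=stubs, i=ids, j="n")
         .reset_index()
         .drop(columns="n"))
print(out)
```

Like the answer, this relies on wide_to_long matching stub + digits (its default sep="" and suffix r"\d+"), so it assumes no other column name already ends in a number after one of the stubs.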