This is the input I get when reading my CSV file:
Sample Info D3S1358 1 D3S1358 2 TH01 1 TH01 2 D21S11 1 D21S11 2 D21S11 3
TEST_646 17 17 9 9.3 28 28 nan
TEST_647 18 18 7 7 29 30 30.2
TEST_648 16 16 9 9 31.2 31.2 nan
I would like to convert it into this form:
Sample_name Marker mrk value
TEST_646 D3S1358 1 17
TEST_646 D3S1358 2 17
TEST_646 TH01 1 9
TEST_646 TH01 2 9.3
TEST_646 D21S11 1 28.0
TEST_646 D21S11 2 28.0
TEST_646 D21S11 3 nan
P.S. Here are the same values in comma-separated form, for your convenience:
Sample Info, D3S1358 1, D3S1358 2, TH01 1, TH01 2, D21S11 1, D21S11 2, D21S11 3
TEST_646, 17, 17, 9, 9.3, 28, 28, nan
TEST_647, 18, 18, 7, 7, 29, 30, 30.2
TEST_648, 16, 16, 9, 9, 31.2, 31.2, nan
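For anyone who wants to reproduce this, the comma-separated values above can be loaded roughly as follows. This is only a sketch: the name `CSV_TEXT` is made up for illustration, and `df0` is chosen to match the variable the answer below starts from.

```python
import io

import pandas as pd

# The comma-separated sample from the question, verbatim.
CSV_TEXT = """Sample Info, D3S1358 1, D3S1358 2, TH01 1, TH01 2, D21S11 1, D21S11 2, D21S11 3
TEST_646, 17, 17, 9, 9.3, 28, 28, nan
TEST_647, 18, 18, 7, 7, 29, 30, 30.2
TEST_648, 16, 16, 9, 9, 31.2, 31.2, nan
"""

# skipinitialspace=True drops the blank after each comma, so the headers
# become 'D3S1358 1' rather than ' D3S1358 1' and 'nan' is read as missing.
df0 = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)
print(df0)
```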
My solution so far is:
samples = xls.parse(sheet).set_index('Sample Info')
cols = list(set(filter(None, [i[:-2] if i!="Sample Info" else None for i in samples.columns])))
sample_df_d= {'1' : pd.Series( len(cols)*[''], index=cols), '2' : pd.Series( len(cols)*[''], index=cols), '3' : pd.Series( len(cols)*[''], index=cols)}
sample_df_ = pd.DataFrame(sample_df_d)
sample_ser = sample_df_.stack()
sample_df = pd.DataFrame(sample_ser, columns=['value'])
#print sample_df
for i,j in samples.iterrows():
    for i2,j2 in j.iteritems():
        print j[0], i2[:-2], "\t", i2[-2:],"\t", j2
which produces something like this:
17 D3S1358 1 17
17 D3S1358 2 17
17 TH01 1 9
17 TH01 2 9.3
17 D21S11 1 28.0
Answer 0 (score: 5)
Here is a way to do it with stack. First, clean up the columns into a MultiIndex:
In [11]: df_1 = df0.set_index('Sample Info')
In [12]: df_1.columns = pd.MultiIndex.from_arrays(zip(*df_1.columns.map(str.split)),
                                                  names=['Marker', 'mrk'])
In [13]: df_1
Out[13]:
Marker       D3S1358      TH01       D21S11
mrk                1   2     1    2       1     2     3
Sample Info
TEST_646          17  17     9  9.3    28.0  28.0   NaN
TEST_647          18  18     7  7.0    29.0  30.0  30.2
TEST_648          16  16     9  9.0    31.2  31.2   NaN
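As an aside, the same MultiIndex can probably be built a little more directly with the string accessor on the column index. The sketch below uses a small stand-in frame rather than the full data, and assumes every header has exactly the 'Marker number' shape, separated by whitespace.

```python
import pandas as pd

# Tiny stand-in frame using the same 'Marker number' header pattern.
df_1 = pd.DataFrame(
    {"D3S1358 1": [17], "D3S1358 2": [17], "TH01 1": [9], "TH01 2": [9.3]},
    index=pd.Index(["TEST_646"], name="Sample Info"),
)

# On an Index, str.split(expand=True) returns a MultiIndex with one level
# per piece; set_names then labels the two levels.
df_1.columns = df_1.columns.str.split(expand=True).set_names(["Marker", "mrk"])
print(df_1)
```

Note that the 'mrk' level holds strings ('1', '2') here, just as it does with the str.split approach above.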
Then you can stack (first 'Marker', then 'mrk'):
In [14]: df_2 = df_1.stack(level=['Marker', 'mrk'])
In [15]: df_2
Out[15]:
Sample Info  Marker   mrk
TEST_646     D21S11   1      28.0
                      2      28.0
             D3S1358  1      17.0
                      2      17.0
             TH01     1       9.0
                      2       9.3
TEST_647     D21S11   1      29.0
                      2      30.0
                      3      30.2
             D3S1358  1      18.0
                      2      18.0
             TH01     1       7.0
                      2       7.0
TEST_648     D21S11   1      31.2
                      2      31.2
             D3S1358  1      16.0
                      2      16.0
             TH01     1       9.0
                      2       9.0
dtype: float64
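Note that the stacked output above has no row for TEST_646 / D21S11 / 3, whereas the desired output in the question lists it with nan. If the missing values need to be kept, one option is melt, which keeps every cell. This is a sketch on a cut-down stand-in for df0, and the intermediate column name 'column' is just a placeholder.

```python
import pandas as pd

# Cut-down stand-in for df0 with one missing value.
df0 = pd.DataFrame({
    "Sample Info": ["TEST_646", "TEST_647"],
    "TH01 1": [9, 7],
    "D21S11 3": [float("nan"), 30.2],
})

# melt turns each non-id column into rows and does not drop NaN cells.
long_df = df0.melt(id_vars="Sample Info", var_name="column", value_name="value")

# Split e.g. 'D21S11 3' into Marker 'D21S11' and mrk '3'.
long_df[["Marker", "mrk"]] = long_df["column"].str.split(expand=True)
long_df = long_df.drop(columns="column")
print(long_df)
```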
If you want these back as columns, you can reset_index:
df_2.reset_index()
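To get the exact headers asked for in the question (Sample_name, Marker, mrk, value), something along these lines should work. It is a sketch on a two-row stand-in for df_2, and `out` is just an illustrative name.

```python
import pandas as pd

# Small stand-in for df_2: a Series indexed by (Sample Info, Marker, mrk).
idx = pd.MultiIndex.from_tuples(
    [("TEST_646", "TH01", "1"), ("TEST_646", "TH01", "2")],
    names=["Sample Info", "Marker", "mrk"],
)
df_2 = pd.Series([9.0, 9.3], index=idx)

# name= labels the values column; rename fixes the sample column header.
out = (
    df_2.reset_index(name="value")
        .rename(columns={"Sample Info": "Sample_name"})
)
print(out)
```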