My DataFrame is as follows:
import pandas as pd
df = pd.DataFrame(columns=['New Category', 'Sample1', 'Sample2'],
                  data=[
                      ['Pathogenic/Likely Pathogenic', '0/0:240', '1/0:100'],
                      ['Likely Benign', '1/1:0,237', '1/0:700'],
                      ['Likely Benign', '0/0:239', '0/0:234'],
                      ['Likely Benign', '1/1:1,238', '0/1:890'],
                      ['Likely Benign', '0/1:156,79', '1/1:767'],
                      ['VUS', '1/1:0,241', '0/1:21']
                  ])
It looks like this:
                   New Category  Sample1  Sample2
0  Pathogenic/Likely Pathogenic  0/0:240  1/0:100
1                 Likely Benign  1/1:237  1/0:700
2                 Likely Benign  0/0:239  0/0:234
3                 Likely Benign  1/1:238  0/1:890
4                 Likely Benign  0/1:156  1/1:767
5                           VUS  1/1:241   0/1:21
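I'd like to use a column MultiIndex so that the Sample1 and Sample2 values are split at the colon and the two parts are placed underneath as sub-columns. However, I don't want these sub-column names applied to the "New Category" column. Basically, I want it to look like this: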
                   New Category  Sample1       Sample2
                                 GT     GQ     GT     GQ
0  Pathogenic/Likely Pathogenic  0/0    240    1/0    100
1                 Likely Benign  1/1    237    1/0    700
2                 Likely Benign  0/0    239    0/0    234
3                 Likely Benign  1/1    238    0/1    890
4                 Likely Benign  0/1    156    1/1    767
5                           VUS  1/1    241    0/1    21
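I'm really stumped on how to do this. The MultiIndex page of the pandas docs doesn't include an example of applying a MultiIndex to only selected columns, which makes me wonder whether this is even possible.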
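A pandas column MultiIndex does not require every top-level label to carry a meaningful sub-label: a column can use an empty string as its second level. The sketch below is a minimal illustration, assuming the sub-column names GT and GQ from the desired output and using only the first data row of the question.

```python
import pandas as pd

# Target column labels: 'New Category' gets an empty-string second level,
# while each sample column gets GT and GQ sub-columns.
columns = pd.MultiIndex.from_tuples([
    ('New Category', ''),
    ('Sample1', 'GT'), ('Sample1', 'GQ'),
    ('Sample2', 'GT'), ('Sample2', 'GQ'),
])

# One-row frame built with those labels (values taken from the first row above).
demo = pd.DataFrame(
    [['Pathogenic/Likely Pathogenic', '0/0', '240', '1/0', '100']],
    columns=columns,
)
print(demo)

# Selecting a top-level label returns its sub-columns.
print(demo['Sample1'])
```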
Answer 0 (score: 1)
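This is not really a question of "indexing" but of manipulating the data, specifically splitting columns. The following should do it: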
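The core operation is splitting one string column into two. A minimal sketch using `Series.str.split` on a few Sample1 values from the question; `n=1` (split only at the first colon) and the GT/GQ names are assumptions taken from the desired output.

```python
import pandas as pd

sample1 = pd.Series(['0/0:240', '1/1:0,237', '0/1:156,79'], name='Sample1')

# n=1 splits only at the first colon; expand=True returns a DataFrame
# with integer column labels 0 and 1 instead of a Series of lists.
parts = sample1.str.split(':', n=1, expand=True)
parts.columns = ['GT', 'GQ']
print(parts)
```

Note that values such as `1/1:0,237` in the question's constructor keep the comma in their second part (`0,237`); parsing that field further is outside this sketch.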
# 'New Category' gets an empty-string label on the second column level
df_new_category = pd.DataFrame(
    df[['New Category']].values,
    columns=pd.MultiIndex.from_tuples([('New Category', '')])
)
# One (GT, GQ) sub-frame per sample column, split at the colon
sample_data_dfs = [
    pd.DataFrame(list(df[col].str.split(':')),
                 columns=pd.MultiIndex.from_product([[col], ['GT', 'GQ']]))
    for col in ['Sample1', 'Sample2']
]
pd.concat([df_new_category] + sample_data_dfs, axis=1)
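An alternative to building each MultiIndex by hand (a swapped-in technique, not the answer's own code) is to pass a dict to `pd.concat`: its keys become the outer column level. This sketch assumes the GT/GQ names from the question and that the first colon separates the two fields.

```python
import pandas as pd

df = pd.DataFrame(columns=['New Category', 'Sample1', 'Sample2'],
                  data=[
                      ['Pathogenic/Likely Pathogenic', '0/0:240', '1/0:100'],
                      ['Likely Benign', '1/1:0,237', '1/0:700'],
                      ['Likely Benign', '0/0:239', '0/0:234'],
                      ['Likely Benign', '1/1:1,238', '0/1:890'],
                      ['Likely Benign', '0/1:156,79', '1/1:767'],
                      ['VUS', '1/1:0,241', '0/1:21'],
                  ])

# 'New Category' becomes a one-column block whose inner label is ''.
pieces = {'New Category': df[['New Category']].set_axis([''], axis=1)}

# Each sample column becomes a two-column (GT, GQ) block.
for col in ['Sample1', 'Sample2']:
    pieces[col] = (df[col].str.split(':', n=1, expand=True)
                   .set_axis(['GT', 'GQ'], axis=1))

# The dict keys become the outer level of the column MultiIndex.
result = pd.concat(pieces, axis=1)
print(result)
```

Because every block keeps `df`'s row index, this also preserves a non-default index, whereas `df[['New Category']].values` discards it and the result falls back to a fresh RangeIndex.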
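Note that the splitting can also be done in one go (i.e. without looping over the columns), like this:
df[['Sample1', 'Sample2']].applymap(lambda s: s.split(':'))  # pandas >= 2.1: DataFrame.map replaces the deprecated applymap
...but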
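Building on the idea of splitting every sample cell in a single call, the sketch below shows one possible way (not taken from the original answer) to reach the two-level layout without a per-column loop: stack the sample columns into one Series, split once, then unstack. The GT/GQ names and `n=1` are assumptions from the question.

```python
import pandas as pd

df = pd.DataFrame(columns=['New Category', 'Sample1', 'Sample2'],
                  data=[
                      ['Pathogenic/Likely Pathogenic', '0/0:240', '1/0:100'],
                      ['Likely Benign', '1/1:0,237', '1/0:700'],
                      ['Likely Benign', '0/0:239', '0/0:234'],
                      ['Likely Benign', '1/1:1,238', '0/1:890'],
                      ['Likely Benign', '0/1:156,79', '1/1:767'],
                      ['VUS', '1/1:0,241', '0/1:21'],
                  ])
samples = ['Sample1', 'Sample2']

# stack() turns the sample columns into one Series indexed by (row, sample),
# so a single str.split call handles every cell.
stacked = df[samples].stack()
pairs = pd.DataFrame(stacked.str.split(':', n=1).tolist(),
                     index=stacked.index, columns=['GT', 'GQ'])

# unstack() moves the sample names back into the columns as the inner level
# (GT/GQ outer); swap the levels, then put them in Sample1 GT, GQ, Sample2 GT, GQ order.
wide = pairs.unstack().swaplevel(0, 1, axis=1)
wide = wide.reindex(columns=pd.MultiIndex.from_product([samples, ['GT', 'GQ']]))

# Inserting with a tuple key gives 'New Category' an empty second level.
wide.insert(0, ('New Category', ''), df['New Category'])
print(wide)
```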