I have a DataFrame:
import pandas as pd

data = {'Wavelength Band1': [410, 411, 412], 'Band1': [0, 0, 0],
'Wavelength Band2': [500, 501, 502], 'Band2': [0, 0.0007, 0.0021],
'Wavelength Band3': [730, 745, 750], 'Band3': [0.0023, 0.0046, 0.007]}
df = pd.DataFrame(data=data)
I want to merge all of the Wavelength BandX columns into a single Wavelength column. Where BandX has no value at a given wavelength, it should be NaN.
Desired output:
from numpy import nan as NaN  # NaN is not a built-in name

output = {'Wavelength': [410, 411, 412, 500, 501, 502, 730, 745, 750],
'Band1': [0, 0, 0, NaN, NaN, NaN, NaN, NaN, NaN],
'Band2': [NaN, NaN, NaN, 0, 0.0007, 0.0021, NaN, NaN, NaN],
'Band3': [NaN, NaN, NaN, NaN, NaN, NaN, 0.0023, 0.0046, 0.007]}
df = pd.DataFrame(data=output)
Answer 0 (score: 2)
This is a bit tricky, but it is a job for wide_to_long:
u = pd.wide_to_long(
df.reset_index(), stubnames=['Wavelength'], i='index', j='id', sep=' ', suffix=r'Band\d+')  # raw string so \d is not treated as a string escape
d = u.filter(like='Band')
i = u.index.get_level_values('id').to_numpy()
j = d.columns.to_numpy()
m = i[:, None] != j
d.mask(m).assign(Wavelength=u['Wavelength']).reset_index(1, drop=True)
Band1 Band2 Band3 Wavelength
index
0 0.0 NaN NaN 410
1 0.0 NaN NaN 411
2 0.0 NaN NaN 412
0 NaN 0.0000 NaN 500
1 NaN 0.0007 NaN 501
2 NaN 0.0021 NaN 502
0 NaN NaN 0.0023 730
1 NaN NaN 0.0046 745
2 NaN NaN 0.0070 750
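As a cross-check, the same table can probably also be built without wide_to_long by stacking one small frame per band. This is not the method used in this answer, just a minimal sketch that assumes the columns always follow the 'Wavelength BandN' / 'BandN' naming pattern; the names bands, parts and stacked are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Wavelength Band1': [410, 411, 412], 'Band1': [0, 0, 0],
    'Wavelength Band2': [500, 501, 502], 'Band2': [0, 0.0007, 0.0021],
    'Wavelength Band3': [730, 745, 750], 'Band3': [0.0023, 0.0046, 0.007],
})

# Assumption: the band names are known up front (hypothetical helper list).
bands = ['Band1', 'Band2', 'Band3']

# One two-column frame per band, with its wavelength column renamed
# to the shared name 'Wavelength'.
parts = [
    df[[f'Wavelength {b}', b]].rename(columns={f'Wavelength {b}': 'Wavelength'})
    for b in bands
]

# Stacking the frames aligns on column names; a band column that is
# missing from a given part is filled with NaN for those rows.
stacked = pd.concat(parts, ignore_index=True)
print(stacked)
```

Unlike the wide_to_long result, this version puts Wavelength first and uses a fresh 0..8 index, which is closer to the layout asked for in the question.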
Explanation
The first step gets us 90% of the way there:
>>> u
Band1 Band2 Band3 Wavelength
index id
0 Band1 0 0.0000 0.0023 410
1 Band1 0 0.0007 0.0046 411
2 Band1 0 0.0021 0.0070 412
0 Band2 0 0.0000 0.0023 500
1 Band2 0 0.0007 0.0046 501
2 Band2 0 0.0021 0.0070 502
0 Band3 0 0.0000 0.0023 730
1 Band3 0 0.0007 0.0046 745
2 Band3 0 0.0021 0.0070 750
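We just need to mask the band values whose column name does not match the id level value produced by the wide_to_long operation, and we can do that with a broadcasted comparison in numpy: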
>>> m
array([[False, True, True],
[False, True, True],
[False, True, True],
[ True, False, True],
[ True, False, True],
[ True, False, True],
[ True, True, False],
[ True, True, False],
[ True, True, False]])
The False values in the mask mark the cells where the column name matches the index value; those are the values we want to keep.
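To see the broadcasting step in isolation, here is a small sketch with made-up labels and values (row_ids, cols and values are hypothetical and are not taken from the data above). It only illustrates how comparing a column vector of row labels against a row of column names yields a 2-D mask, and how DataFrame.mask uses it.

```python
import numpy as np
import pandas as pd

# Hypothetical row labels (one per row) and column names.
row_ids = np.array(['Band1', 'Band1', 'Band2', 'Band3'])
cols = np.array(['Band1', 'Band2', 'Band3'])

# row_ids[:, None] has shape (4, 1) and cols has shape (3,), so the
# comparison broadcasts to shape (4, 3): every row label is compared
# with every column name. True means "label and column differ".
mask = row_ids[:, None] != cols

# Made-up values, one row per label and one column per band.
values = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3), columns=cols)

# DataFrame.mask replaces the cells where the condition is True with NaN,
# so only the cell whose column matches the row label survives.
kept = values.mask(mask)
print(mask)
print(kept)
```

Each row of kept retains exactly one value, the one in the column named by that row's label, which is the same pattern as the final result of the answer.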