How do I merge the values of multiple columns in pandas?

Time: 2020-02-05 14:23:33

Tags: python pandas

I have a DataFrame:

import pandas as pd

data = {'Wavelength Band1': [410, 411, 412], 'Band1': [0, 0, 0],
        'Wavelength Band2': [500, 501, 502], 'Band2': [0, 0.0007, 0.0021],
        'Wavelength Band3': [730, 745, 750], 'Band3': [0.0023, 0.0046, 0.007]}

df = pd.DataFrame(data=data)

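I want to merge all of the Wavelength BandX columns into a single Wavelength column. Where a BandX column has no value for a given wavelength, the result should be NaN.

Desired output: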

import numpy as np

NaN = np.nan  # lets the lists below be written with a bare NaN

output = {'Wavelength': [410, 411, 412, 500, 501, 502, 730, 745, 750],
          'Band1': [0, 0, 0, NaN, NaN, NaN, NaN, NaN, NaN],
          'Band2': [NaN, NaN, NaN, 0, 0.0007, 0.0021, NaN, NaN, NaN],
          'Band3': [NaN, NaN, NaN, NaN, NaN, NaN, 0.0023, 0.0046, 0.007]}

df = pd.DataFrame(data=output)

1 Answer:

Answer 0 (score: 2)

This is a little tricky, but it is a job for wide_to_long:

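# Melt the 'Wavelength BandX' columns into a single 'Wavelength' column;
# the 'BandX' suffix of each melted column becomes the 'id' index level.
u = pd.wide_to_long(
      df.reset_index(), stubnames=['Wavelength'], i='index', j='id', sep=' ', suffix=r'Band\d+')

# Keep only the band value columns (Band1, Band2, Band3).
d = u.filter(like='Band')

i = u.index.get_level_values('id').to_numpy()  # band label of each row
j = d.columns.to_numpy()                       # band label of each column

# True wherever a row's band label differs from the column name.
m = i[:, None] != j

# Blank out the mismatched cells, add Wavelength back, drop the 'id' level.
d.mask(m).assign(Wavelength=u['Wavelength']).reset_index(1, drop=True)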

       Band1   Band2   Band3  Wavelength
index
0        0.0     NaN     NaN         410
1        0.0     NaN     NaN         411
2        0.0     NaN     NaN         412
0        NaN  0.0000     NaN         500
1        NaN  0.0007     NaN         501
2        NaN  0.0021     NaN         502
0        NaN     NaN  0.0023         730
1        NaN     NaN  0.0046         745
2        NaN     NaN  0.0070         750
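
If the layout from the question is wanted exactly (Wavelength as the first column and a fresh 0..8 index), the steps above can be wrapped in a small helper. This is only a sketch: the function name stack_wavelengths is made up for illustration, and it assumes the column names follow the 'Wavelength BandX' / 'BandX' pattern from the question.

```python
import pandas as pd


def stack_wavelengths(df):
    """Stack every 'Wavelength BandX' column into one 'Wavelength' column.

    Hypothetical helper; assumes paired 'Wavelength BandX' / 'BandX' columns.
    """
    u = pd.wide_to_long(
        df.reset_index(), stubnames=['Wavelength'],
        i='index', j='id', sep=' ', suffix=r'Band\d+')
    d = u.filter(like='Band')

    row_band = u.index.get_level_values('id').to_numpy()
    col_band = d.columns.to_numpy()

    # Keep a band value only on the rows that came from its own wavelength column.
    out = d.mask(row_band[:, None] != col_band).assign(Wavelength=u['Wavelength'])

    # Wavelength first, bands in sorted order, and a plain 0..n-1 index.
    return out[['Wavelength', *sorted(col_band)]].reset_index(drop=True)


data = {'Wavelength Band1': [410, 411, 412], 'Band1': [0, 0, 0],
        'Wavelength Band2': [500, 501, 502], 'Band2': [0, 0.0007, 0.0021],
        'Wavelength Band3': [730, 745, 750], 'Band3': [0.0023, 0.0046, 0.007]}

result = stack_wavelengths(pd.DataFrame(data))
print(result)
```

The band columns come back as floats, because NaN cannot be stored in an integer column.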

Explanation

The first step gets us 90% of the way there:

>>> u
             Band1   Band2   Band3  Wavelength
index id
0     Band1      0  0.0000  0.0023         410
1     Band1      0  0.0007  0.0046         411
2     Band1      0  0.0021  0.0070         412
0     Band2      0  0.0000  0.0023         500
1     Band2      0  0.0007  0.0046         501
2     Band2      0  0.0021  0.0070         502
0     Band3      0  0.0000  0.0023         730
1     Band3      0  0.0007  0.0046         745
2     Band3      0  0.0021  0.0070         750
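
To see why only the 'Wavelength BandX' columns are melted while Band1, Band2 and Band3 are carried along on every row, it can help to mimic the column selection with a regular expression built from the stub name, the separator and the suffix. The pattern below is an illustration of the matching rule under that assumption, not pandas' own source code.

```python
import re

columns = ['index', 'Wavelength Band1', 'Band1',
           'Wavelength Band2', 'Band2', 'Wavelength Band3', 'Band3']

stub, sep, suffix = 'Wavelength', ' ', r'Band\d+'

# Illustrative pattern: stub + separator + suffix, anchored at both ends.
pattern = re.compile(rf'^{re.escape(stub)}{re.escape(sep)}({suffix})$')

# Columns that match are melted; the captured suffix becomes the 'id' value.
melted = {c: pattern.match(c).group(1) for c in columns if pattern.match(c)}

# Everything else is treated as an identifier column and repeated on each row.
carried = [c for c in columns if not pattern.match(c)]

print(melted)
print(carried)
```

Because the plain BandX columns do not start with the stub name, they end up among the carried columns, which is why every row of u still holds all three band values.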

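We just need to mask the band values whose column name does not match the id level produced by the wide_to_long operation, and we can do that with a broadcasted comparison in NumPy: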

>>> m
array([[False,  True,  True],
       [False,  True,  True],
       [False,  True,  True],
       [ True, False,  True],
       [ True, False,  True],
       [ True, False,  True],
       [ True,  True, False],
       [ True,  True, False],
       [ True,  True, False]])

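The False values in the mask mark the cells where the column name matches the id level of the index; those are the values we want to keep.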
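
The same comparison on a smaller, made-up example may make the shapes easier to follow. The labels and values here are invented for illustration; the arrays use dtype=object to mirror what to_numpy() returns for string labels.

```python
import numpy as np
import pandas as pd

row_ids = np.array(['Band1', 'Band2', 'Band2'], dtype=object)  # one label per row
col_names = np.array(['Band1', 'Band2'], dtype=object)         # one label per column

# row_ids[:, None] has shape (3, 1); comparing it with shape (2,)
# broadcasts to a (3, 2) boolean array.
mismatch = row_ids[:, None] != col_names

values = pd.DataFrame([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], columns=col_names)

# DataFrame.mask replaces the cells where the condition is True with NaN.
masked = values.mask(mismatch)

print(mismatch)
print(masked)
```

Only 1.0, 4.0 and 6.0 survive, one value per row, in the column whose name equals that row's label.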