pandas: map more than two columns to one column

Time: 2017-04-21 14:15:46

Tags: python pandas

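This is an updated version of this question, which only dealt with mapping two columns to a new column.

Now I have three columns that I want to map to a single new column using the same dictionary (returning 0 if no key in the dictionary matches).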

>> codes = {'2':1,
            '31':1,
            '88':9,
            '99':9}

>> df[['driver_action1','driver_action2','driver_action3']].to_dict()    
{'driver_action1': {0: '1',
  1: '1',
  2: '77',
  3: '77',
  4: '1',
  5: '4',
  6: '2',
  7: '1',
  8: '77',
  9: '99'},
 'driver_action2': {0: '4',
  1: '99',
  2: '99',
  3: '99',
  4: '1',
  5: '2',
  6: '2',
  7: '99',
  8: '99',
  9: '99'},
 'driver_action3': {0: '4',
  1: '99',
  2: '99',
  3: '99',
  4: '1',
  5: '99',
  6: '99',
  7: '99',
  8: '31',
  9: '31'}}

Expected output:

  driver_action1 driver_action2 driver_action3  newcolumn
0              1              4              4          0
1              1             99             99          9
2             77             99             99          9
3             77             99             99          9
4              1              1              1          9
5              4              2             99          1
6              2              2             99          1
7              1             99             99          9
8             77             99             31          1
9             99             99             31          1

I don't know how to do this with .applymap() or combine_first().

1 answer:

Answer 0 (score: 1)

Try this:

In [174]: df['new'] = df.stack(dropna=False).map(codes).unstack() \
     ...:               .iloc[:, ::-1].ffill(axis=1) \
     ...:               .iloc[:, -1].fillna(0)
     ...:

In [175]: df
Out[175]:
  driver_action1 driver_action2 driver_action3  new
0              1              4              4  0.0
1              1             99             99  9.0
2             77             99             99  9.0
3             77             99             99  9.0
4              1              1              1  0.0
5              4              2             99  1.0
6              2              2             99  1.0
7              1             99             99  9.0
8             77             99             31  9.0
9             99             99             31  9.0
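
For anyone who wants to run this end to end, here is a minimal self-contained sketch built from the data in the question. Note that it is a variation, not the exact chain above: it maps column by column with Series.map and uses bfill instead of stack/unstack plus a reversed ffill, because I believe the `dropna` keyword of `stack` is being phased out in newer pandas releases. The name `cols` is just an illustrative helper.

```python
import pandas as pd

codes = {'2': 1, '31': 1, '88': 9, '99': 9}

# Data copied from the question's to_dict() output.
df = pd.DataFrame({
    'driver_action1': ['1', '1', '77', '77', '1', '4', '2', '1', '77', '99'],
    'driver_action2': ['4', '99', '99', '99', '1', '2', '2', '99', '99', '99'],
    'driver_action3': ['4', '99', '99', '99', '1', '99', '99', '99', '31', '31'],
})

cols = ['driver_action1', 'driver_action2', 'driver_action3']  # illustrative name

# Map every column through the dictionary; values without a key become NaN.
mapped = df[cols].apply(lambda col: col.map(codes))

# Back-fill along the columns so the first column holds the leftmost match,
# then use 0 for rows where nothing matched.
df['new'] = mapped.bfill(axis=1).iloc[:, 0].fillna(0)

print(df)
```

On this data it should give the same `new` column as Out[175], with driver_action1 taking priority over driver_action2 and driver_action3.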

Alternative solution:

df['new'] = df.stack(dropna=False).map(codes).unstack().T \
              .apply(lambda x: x[x.first_valid_index()]
                               if x.first_valid_index() is not None else 0)
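
To see what the lambda does on a single row, here is a small sketch. `first_valid_index()` returns the label of the first non-NaN entry, or None when every entry is NaN. Comparing against None explicitly is the safer test, since a perfectly valid label such as 0 is falsy. The helper name `first_match` is hypothetical and only used for illustration.

```python
import numpy as np
import pandas as pd

labels = ['driver_action1', 'driver_action2', 'driver_action3']

# One mapped row, indexed by column name (which is what apply sees after .T).
row = pd.Series([np.nan, 9.0, 1.0], index=labels)
empty = pd.Series([np.nan, np.nan, np.nan], index=labels)

def first_match(x, default=0):
    """Hypothetical helper: first non-NaN value of x, or default if none."""
    idx = x.first_valid_index()
    return x.loc[idx] if idx is not None else default

print(first_match(row))    # first non-NaN value, from driver_action2
print(first_match(empty))  # falls back to the default

# With an integer index the first valid label can be 0, which is falsy,
# so a plain truthiness test on the label would wrongly pick the default.
int_indexed = pd.Series([5.0, np.nan])
print(first_match(int_indexed))
```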

Explanation:

stack, map and unstack map the values:

In [188]: df.stack(dropna=False).map(codes).unstack()
Out[188]:
   driver_action1  driver_action2  driver_action3
0             NaN             NaN             NaN
1             NaN             9.0             9.0
2             NaN             9.0             9.0
3             NaN             9.0             9.0
4             NaN             NaN             NaN
5             NaN             1.0             9.0
6             1.0             1.0             9.0
7             NaN             9.0             9.0
8             NaN             9.0             1.0
9             9.0             9.0             1.0
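
The NaN entries come from how Series.map treats a dictionary: any value that has no key in the dictionary is mapped to NaN, and the result becomes float because of that. Stacking first just turns the frame into a single Series so that Series.map can be applied. A tiny sketch with made-up input values:

```python
import pandas as pd

codes = {'2': 1, '31': 1, '88': 9, '99': 9}

# Made-up sample: '1' and '77' have no key in codes.
s = pd.Series(['1', '99', '31', '77', '2'])

mapped = s.map(codes)
print(mapped)
```

Here the entries for '1' and '77' come out as NaN, which is what later allows fillna(0) to supply the default.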

Reverse the column order and forward fill along the columns axis:

In [190]: df.stack(dropna=False).map(codes).unstack().iloc[:, ::-1].ffill(axis=1)
Out[190]:
   driver_action3  driver_action2  driver_action1
0             NaN             NaN             NaN
1             9.0             9.0             9.0
2             9.0             9.0             9.0
3             9.0             9.0             9.0
4             NaN             NaN             NaN
5             9.0             1.0             1.0
6             9.0             1.0             1.0
7             9.0             9.0             9.0
8             1.0             9.0             9.0
9             1.0             9.0             9.0
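
The point of reversing and forward filling is that the last column ends up holding the leftmost non-NaN value of the original column order. As far as I can tell, back filling without reversing and then taking the first column gives the same values. A small sketch on a hand-written mapped frame (the four rows are illustrative, not the full data):

```python
import numpy as np
import pandas as pd

# Hand-written stand-in for the mapped frame.
mapped = pd.DataFrame({
    'driver_action1': [np.nan, np.nan, 1.0, 9.0],
    'driver_action2': [np.nan, 1.0, 1.0, 9.0],
    'driver_action3': [np.nan, 9.0, 9.0, 1.0],
})

# Approach from the answer: reverse columns, forward fill, take the last column.
via_ffill = mapped.iloc[:, ::-1].ffill(axis=1).iloc[:, -1]

# Assumed equivalent: back fill in the original order, take the first column.
via_bfill = mapped.bfill(axis=1).iloc[:, 0]

print(via_ffill)
print(via_bfill)
```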

Select the last column and fill NaN with 0:

In [191]: df.stack(dropna=False).map(codes).unstack().iloc[:, ::-1].ffill(axis=1).iloc[:, -1].fillna(0)
Out[191]:
0    0.0
1    9.0
2    9.0
3    9.0
4    0.0
5    1.0
6    1.0
7    9.0
8    9.0
9    9.0
Name: driver_action1, dtype: float64
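
The result is float64 because the intermediate values contained NaN. If integer codes are preferred, one option is to cast after the NaNs have been filled. This is a sketch on a short stand-in Series rather than the full data:

```python
import numpy as np
import pandas as pd

# Stand-in for the selected column before fillna.
new = pd.Series([np.nan, 9.0, 1.0], name='driver_action1')

# Fill first, then cast: astype(int) would fail while NaN is still present.
as_int = new.fillna(0).astype(int)
print(as_int)
```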