How to replace column values with ranges in a pandas DataFrame

Time: 2017-05-30 18:46:29

Tags: python pandas dataframe range conditional-statements

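My DataFrame is called 'df'. I want to replace the values in a column that fall within given ranges with a corresponding value in another column.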

  1. 6 <= age < 11 then 1

    11 <= age < 16 then 2

    16 <= age < 21 then 3

    21 <= age then 4

            age
    86508   12.0
    86509   6.0
    86510   7.0
    86511   8.0
    86512   10.0
    86513   15.0
    86514   15.0
    86515   16.0
    86516   20.0
    86517   23.0
    86518   23.0
    86519   7.0
    86520   18.0
    
  2. Result

                age    stage
        86508   12.0    2
        86509   6.0     1    
        86510   7.0     1
        86511   8.0     1
        86512   10.0    1
        86513   15.0    2
        86514   15.0    2
        86515   16.0    2
        86516   20.0    3
        86517   23.0    4
        86518   23.0    4
        86519   7.0     1
        86520   18.0    3
    

    Thanks.

2 Answers:

Answer 0: (score: 5)

Use pd.cut()

In [37]: df['stage'] = pd.cut(df.age, bins=[0,11,16,21,300], labels=[1,2,3,4])

In [38]: df
Out[38]:
        age stage
86508  12.0     2
86509   6.0     1
86510   7.0     1
86511   8.0     1
86512  10.0     1
86513  15.0     2
86514  15.0     2
86515  16.0     2
86516  20.0     3
86517  23.0     4
86518  23.0     4
86519   7.0     1
86520  18.0     3

A more generic solution provided by @ayhan (requires `import numpy as np`):

In [39]: df['stage'] = pd.cut(df.age, bins=[0, 11, 16, 21, np.inf], labels=False, right=True) + 1

In [40]: df
Out[40]:
        age  stage
86508  12.0      2
86509   6.0      1
86510   7.0      1
86511   8.0      1
86512  10.0      1
86513  15.0      2
86514  15.0      2
86515  16.0      2
86516  20.0      3
86517  23.0      4
86518  23.0      4
86519   7.0      1
86520  18.0      3
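
Note that `pd.cut` uses right-closed intervals by default, so with the calls above an age of exactly 16 falls in (11, 16] and gets stage 2, which matches the expected output in the question. If the ranges are meant literally as written (6 <= age < 11, and so on), a left-closed variant with `right=False` may be closer. The following is a minimal sketch, assuming the sample ages from the question and a lowest edge of 6; the names `right_closed` and `left_closed` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
ages = [12.0, 6.0, 7.0, 8.0, 10.0, 15.0, 15.0, 16.0, 20.0, 23.0, 23.0, 7.0, 18.0]
df = pd.DataFrame({"age": ages}, index=range(86508, 86521))

# Default right=True: intervals are (0, 11], (11, 16], (16, 21], (21, inf].
right_closed = pd.cut(df["age"], bins=[0, 11, 16, 21, np.inf], labels=False) + 1

# right=False: intervals are [6, 11), [11, 16), [16, 21), [21, inf).
left_closed = pd.cut(
    df["age"], bins=[6, 11, 16, 21, np.inf], labels=False, right=False
) + 1

# The two variants differ only for ages that sit exactly on an edge.
print(pd.DataFrame({"age": df["age"], "right_closed": right_closed, "left_closed": left_closed}))
```

With `right=False`, ages below the first edge (here 6) fall outside every bin and come back as NaN, so the first edge should be chosen to cover the data.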

Answer 1: (score: 4)

Use np.searchsorted

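import numpy as np
a = np.array([-np.inf, 6, 11, 16, 21, np.inf])  # sorted bin edges
# side='right' makes each edge the inclusive lower bound of its stage
df.assign(stage=a.searchsorted(df.age, side='right') - 1)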

        age  stage
86508  12.0      2
86509   6.0      1
86510   7.0      1
86511   8.0      1
86512  10.0      1
86513  15.0      2
86514  15.0      2
86515  16.0      3
86516  20.0      3
86517  23.0      4
86518  23.0      4
86519   7.0      1
86520  18.0      3
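
To see why `side='right'` followed by `- 1` yields left-closed ranges, it may help to look at values that sit exactly on the edges. This is a small sketch using a made-up array of boundary ages (the `ages` values and the name `edges` are assumptions for illustration, not part of the question's data):

```python
import numpy as np

# Same edges as in the answer above.
edges = np.array([-np.inf, 6, 11, 16, 21, np.inf])

# Hypothetical ages chosen to hit the boundaries.
ages = np.array([5.0, 6.0, 10.9, 11.0, 16.0, 21.0, 23.0])

# side='right' returns the index i with edges[i-1] <= age < edges[i],
# so subtracting 1 gives the index of the edge that starts the bin.
stage = edges.searchsorted(ages, side="right") - 1

print(stage)
```

An age below 6 lands in the bin that starts at `-np.inf` and gets stage 0, so values outside the intended ranges are labelled 0 instead of being marked as missing.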

Timing
Small data

%%timeit
a = np.array([-np.inf, 6, 11, 16, 21, np.inf])
df.assign(stage=a.searchsorted(df.age, side='right') - 1)
1000 loops, best of 3: 288 µs per loop

%%timeit
df.assign(stage=pd.cut(df.age, bins=[0,11,16,21,300], labels=[1,2,3,4]))
1000 loops, best of 3: 668 µs per loop
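
The `%%timeit` cells above only run inside IPython or Jupyter. Outside a notebook, a comparable measurement can be sketched with the standard-library `timeit` module. The helper names `with_searchsorted` and `with_cut` and the repeat count are assumptions for illustration, and absolute timings will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Sample data copied from the question.
ages = [12.0, 6.0, 7.0, 8.0, 10.0, 15.0, 15.0, 16.0, 20.0, 23.0, 23.0, 7.0, 18.0]
df = pd.DataFrame({"age": ages}, index=range(86508, 86521))
edges = np.array([-np.inf, 6, 11, 16, 21, np.inf])


def with_searchsorted():
    # NumPy approach from the second answer.
    return df.assign(stage=edges.searchsorted(df["age"], side="right") - 1)


def with_cut():
    # pd.cut approach from the first answer.
    return df.assign(
        stage=pd.cut(df["age"], bins=[0, 11, 16, 21, 300], labels=[1, 2, 3, 4])
    )


n = 200  # number of repetitions per measurement
t_search = timeit.timeit(with_searchsorted, number=n) / n
t_cut = timeit.timeit(with_cut, number=n) / n
print(f"searchsorted: {t_search * 1e6:.0f} µs per call")
print(f"pd.cut:       {t_cut * 1e6:.0f} µs per call")
```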