Pandas pd.cut其他领域等于价值的领域

时间:2016-09-15 20:47:16

标签: python pandas dataframe

我正在尝试使用pd.cut创建一个新字段。但是,此新字段的创建/填充依赖于另一个字段中的值。

hdl_bins = [0,40,59,300]
hdl_labels = ['hdl_high risk','hdl_borderline','hdl_protective']

df['hdl'] = pd.cut(df['value'],bins=hdl_bins,labels=hdl_labels)

我只想在满足以下条件时填充新字段“hdl”:

df[(df['name']=='HDL')

我如何最好地将“where”标准添加到pd.cut操作?谢谢!

编辑:

以下是输入的示例:

id,date,name,value
1,1/1/11,Weight,76.3
1,1/2/11,Height,152.7
1,1/3/11,Body mass index (BMI) [Ratio],32.7
1,1/4/11,Temperature,98.6
1,1/5/11,Systolic,118.9
1,1/6/11,Diastolic,69.8
1,1/7/11,LDL,98
1,1/8/11,HDL,63.2
1,1/9/11,Total Cholesterol,263.1
1,1/10/11,Trigl SerPl-mCnc,509.7
1,1/11/11,LDL,98
1,1/12/11,HDL,63.2
1,1/13/11,Total Cholesterol,263.1
1,1/14/11,Trigl SerPl-mCnc,509.7

期望的输出:

id,date,name,value,hdl
1,1/1/11,Weight,76.3,0
1,1/2/11,Height,152.7,0
1,1/3/11,Body mass index (BMI) [Ratio],32.7,0
1,1/4/11,Temperature,98.6,0
1,1/5/11,Systolic,118.9,0
1,1/6/11,Diastolic,69.8,0
1,1/7/11,LDL,98,0
1,1/8/11,HDL,63.2,hdl_protective
1,1/9/11,Total Cholesterol,263.1,0
1,1/10/11,Trigl SerPl-mCnc,509.7,0
1,1/11/11,LDL,98,0
1,1/12/11,HDL,63.2,hdl_protective
1,1/13/11,Total Cholesterol,263.1,0
1,1/14/11,Trigl SerPl-mCnc,509.7,0

1 个答案:

答案 0 :(得分:1)

<强>更新

注意最后一行(value == 'XXX'):

In [55]: df
Out[55]:
    id      date                           name  value
0    1    1/1/11                         Weight   76.3
1    1    1/2/11                         Height  152.7
2    1    1/3/11  Body mass index (BMI) [Ratio]   32.7
3    1    1/4/11                    Temperature   98.6
4    1    1/5/11                       Systolic  118.9
5    1    1/6/11                      Diastolic   69.8
6    1    1/7/11                            LDL     98
7    1    1/8/11                            HDL   63.2
8    1    1/9/11              Total Cholesterol  263.1
9    1   1/10/11               Trigl SerPl-mCnc  509.7
10   1   1/11/11                            LDL     98
11   1   1/12/11                            HDL   63.2
12   1   1/13/11              Total Cholesterol  263.1
13   1   1/14/11               Trigl SerPl-mCnc  509.7
14   1  12/12/12                            HDL    XXX

解决方案:

In [56]: df['hdl'] = '0'

In [57]: df.ix[df['name']=='HDL', 'hdl'] = \
   ....:   pd.cut(pd.to_numeric(df.ix[df['name']=='HDL','value'], errors='corce'),bins=hdl_bins,labels=hdl_labels)

In [58]: df
Out[58]:
    id      date                           name  value             hdl
0    1    1/1/11                         Weight   76.3               0
1    1    1/2/11                         Height  152.7               0
2    1    1/3/11  Body mass index (BMI) [Ratio]   32.7               0
3    1    1/4/11                    Temperature   98.6               0
4    1    1/5/11                       Systolic  118.9               0
5    1    1/6/11                      Diastolic   69.8               0
6    1    1/7/11                            LDL     98               0
7    1    1/8/11                            HDL   63.2  hdl_protective
8    1    1/9/11              Total Cholesterol  263.1               0
9    1   1/10/11               Trigl SerPl-mCnc  509.7               0
10   1   1/11/11                            LDL     98               0
11   1   1/12/11                            HDL   63.2  hdl_protective
12   1   1/13/11              Total Cholesterol  263.1               0
13   1   1/14/11               Trigl SerPl-mCnc  509.7               0
14   1  12/12/12                            HDL    XXX             NaN

旧回答:

In [13]: df
Out[13]:
   value name
0    123  XXX
1     18  LDL
2    195  LDL
3     25  XXX
4     70  LDL
5     11  LDL
6    199  XXX
7    163  LDL
8     32  LDL
9     85  XXX

In [14]: hdl_bins = [0,40,59,300]

In [15]: hdl_labels = ['hdl_high risk','hdl_borderline','hdl_protective']

In [16]: df['hdl'] = ''

In [22]: df.ix[df['name']=='LDL', 'hdl'] = \
   ....:     pd.cut(df.ix[df['name']=='LDL','value'],bins=hdl_bins,labels=hdl_labels)

In [23]: df
Out[23]:
   value name             hdl
0    123  XXX
1     18  LDL   hdl_high risk
2    195  LDL  hdl_protective
3     25  XXX
4     70  LDL  hdl_protective
5     11  LDL   hdl_high risk
6    199  XXX
7    163  LDL  hdl_protective
8     32  LDL   hdl_high risk
9     85  XXX