I am trying to create a new field using pd.cut. However, whether this new field gets populated depends on the value in another field.
hdl_bins = [0,40,59,300]
hdl_labels = ['hdl_high risk','hdl_borderline','hdl_protective']
df['hdl'] = pd.cut(df['value'],bins=hdl_bins,labels=hdl_labels)
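I only want to populate the new field "hdl" for rows that satisfy the following condition:
df[df['name']=='HDL']
How can I best add this "where" criterion to the pd.cut operation? Thanks!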
Edit:
Here is a sample of the input:
id,date,name,value
1,1/1/11,Weight,76.3
1,1/2/11,Height,152.7
1,1/3/11,Body mass index (BMI) [Ratio],32.7
1,1/4/11,Temperature,98.6
1,1/5/11,Systolic,118.9
1,1/6/11,Diastolic,69.8
1,1/7/11,LDL,98
1,1/8/11,HDL,63.2
1,1/9/11,Total Cholesterol,263.1
1,1/10/11,Trigl SerPl-mCnc,509.7
1,1/11/11,LDL,98
1,1/12/11,HDL,63.2
1,1/13/11,Total Cholesterol,263.1
1,1/14/11,Trigl SerPl-mCnc,509.7
Desired output:
id,date,name,value,hdl
1,1/1/11,Weight,76.3,0
1,1/2/11,Height,152.7,0
1,1/3/11,Body mass index (BMI) [Ratio],32.7,0
1,1/4/11,Temperature,98.6,0
1,1/5/11,Systolic,118.9,0
1,1/6/11,Diastolic,69.8,0
1,1/7/11,LDL,98,0
1,1/8/11,HDL,63.2,hdl_protective
1,1/9/11,Total Cholesterol,263.1,0
1,1/10/11,Trigl SerPl-mCnc,509.7,0
1,1/11/11,LDL,98,0
1,1/12/11,HDL,63.2,hdl_protective
1,1/13/11,Total Cholesterol,263.1,0
1,1/14/11,Trigl SerPl-mCnc,509.7,0
Answer 0 (score: 1)
UPDATE:
Note the last row (value == 'XXX'):
In [55]: df
Out[55]:
id date name value
0 1 1/1/11 Weight 76.3
1 1 1/2/11 Height 152.7
2 1 1/3/11 Body mass index (BMI) [Ratio] 32.7
3 1 1/4/11 Temperature 98.6
4 1 1/5/11 Systolic 118.9
5 1 1/6/11 Diastolic 69.8
6 1 1/7/11 LDL 98
7 1 1/8/11 HDL 63.2
8 1 1/9/11 Total Cholesterol 263.1
9 1 1/10/11 Trigl SerPl-mCnc 509.7
10 1 1/11/11 LDL 98
11 1 1/12/11 HDL 63.2
12 1 1/13/11 Total Cholesterol 263.1
13 1 1/14/11 Trigl SerPl-mCnc 509.7
14 1 12/12/12 HDL XXX
In [56]: df['hdl'] = '0'
In [57]: df.loc[df['name']=='HDL', 'hdl'] = \
....: pd.cut(pd.to_numeric(df.loc[df['name']=='HDL','value'], errors='coerce'),bins=hdl_bins,labels=hdl_labels)
In [58]: df
Out[58]:
id date name value hdl
0 1 1/1/11 Weight 76.3 0
1 1 1/2/11 Height 152.7 0
2 1 1/3/11 Body mass index (BMI) [Ratio] 32.7 0
3 1 1/4/11 Temperature 98.6 0
4 1 1/5/11 Systolic 118.9 0
5 1 1/6/11 Diastolic 69.8 0
6 1 1/7/11 LDL 98 0
7 1 1/8/11 HDL 63.2 hdl_protective
8 1 1/9/11 Total Cholesterol 263.1 0
9 1 1/10/11 Trigl SerPl-mCnc 509.7 0
10 1 1/11/11 LDL 98 0
11 1 1/12/11 HDL 63.2 hdl_protective
12 1 1/13/11 Total Cholesterol 263.1 0
13 1 1/14/11 Trigl SerPl-mCnc 509.7 0
14 1 12/12/12 HDL XXX NaN
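For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same idea. The small DataFrame is made up for illustration (it is not the asker's full data), and the names mask, numeric and binned are my own. It assumes the value column may contain non-numeric strings, which pd.to_numeric(..., errors='coerce') turns into NaN. With the default right=True, pd.cut uses the intervals (0, 40], (40, 59] and (59, 300].

```python
import pandas as pd

hdl_bins = [0, 40, 59, 300]
hdl_labels = ['hdl_high risk', 'hdl_borderline', 'hdl_protective']

# Hypothetical sample: 'value' is stored as strings because of the 'XXX' row.
df = pd.DataFrame({
    'name': ['Weight', 'LDL', 'HDL', 'HDL', 'HDL'],
    'value': ['76.3', '98', '63.2', '35', 'XXX'],
})

# Boolean mask selecting only the rows to be binned.
mask = df['name'] == 'HDL'

# Convert just the HDL values to numbers; unparseable entries become NaN.
numeric = pd.to_numeric(df.loc[mask, 'value'], errors='coerce')

# Bin the HDL values; NaN input stays NaN in the result.
binned = pd.cut(numeric, bins=hdl_bins, labels=hdl_labels)

# Default for every row, then overwrite the HDL rows (aligned on the index).
df['hdl'] = '0'
df.loc[mask, 'hdl'] = binned.astype(object)

print(df)
```

The astype(object) call is a precaution so that plain strings, rather than a categorical, are written into the existing string column; the row with 'XXX' ends up as NaN, as in the output above.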
Old answer:
In [13]: df
Out[13]:
value name
0 123 XXX
1 18 LDL
2 195 LDL
3 25 XXX
4 70 LDL
5 11 LDL
6 199 XXX
7 163 LDL
8 32 LDL
9 85 XXX
In [14]: hdl_bins = [0,40,59,300]
In [15]: hdl_labels = ['hdl_high risk','hdl_borderline','hdl_protective']
In [16]: df['hdl'] = ''
In [22]: df.loc[df['name']=='LDL', 'hdl'] = \
....: pd.cut(df.loc[df['name']=='LDL','value'],bins=hdl_bins,labels=hdl_labels)
In [23]: df
Out[23]:
value name hdl
0 123 XXX
1 18 LDL hdl_high risk
2 195 LDL hdl_protective
3 25 XXX
4 70 LDL hdl_protective
5 11 LDL hdl_high risk
6 199 XXX
7 163 LDL hdl_protective
8 32 LDL hdl_high risk
9 85 XXX
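A possible alternative, sketched here under the assumption that the value column is already numeric: bin the whole column first, then use Series.where to keep the label only on the rows that match the condition and fill everything else with a default. The sample data and the '0' default are illustrative choices, not part of the original answer.

```python
import pandas as pd

hdl_bins = [0, 40, 59, 300]
hdl_labels = ['hdl_high risk', 'hdl_borderline', 'hdl_protective']

# Hypothetical sample with an already-numeric 'value' column.
df = pd.DataFrame({
    'name': ['XXX', 'LDL', 'HDL', 'HDL'],
    'value': [123, 18, 45, 70],
})

# Bin every row, regardless of name.
binned = pd.cut(df['value'], bins=hdl_bins, labels=hdl_labels)

# Keep the label where name == 'HDL'; use '0' everywhere else.
# astype(object) lets '0' be used even though it is not one of the categories.
df['hdl'] = binned.astype(object).where(df['name'] == 'HDL', '0')

print(df)
```

This bins rows that are then discarded, so it does slightly more work than the masked assignment, but it avoids repeating the condition on both sides of the assignment.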