I have a column in the store_data DataFrame with many missing values, marked as '?':
>>>store_data['trestbps']
0 140
1 130
2 132
3 142
4 110
5 120
6 150
7 180
8 120
9 160
10 126
11 140
12 110
13 ?
I replaced all the missing values with -999:
store_data.replace('?', -999, inplace = True)
>>>store_data['trestbps']
0 140
1 130
2 132
3 142
4 110
5 120
6 150
7 180
8 120
9 160
10 126
11 140
12 110
13 -999
Now I want to bin the values. I used the code below, but every row of the output is NaN:
trestbps = store_data['trestbps']
trestbps_bins = [-999,120,140,200]
store_data['trestbps'] = pd.cut(trestbps,trestbps_bins)
>>>store_data['trestbps']
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 NaN
When there are no missing values, the binning works fine. I want rows 0-12 to be binned as usual, with only row 13 replaced by -999. How can I achieve this?
Answer 0 (score: 1)
IIUC, you can do this:
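bins = [0, 120, 140, 200]  # bin edges
df.trestbps = pd.cut(df.trestbps, bins)  # bin the values; anything outside the edges becomes NaN
df.trestbps = df.trestbps.cat.add_categories(999)  # register 999 as an extra category
df.trestbps = df.trestbps.fillna(999)  # fillna returns a new Series, so assign it back
df.trestbps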
0 (120, 140]
1 (120, 140]
2 (120, 140]
3 (140, 200]
4 (0, 120]
5 (0, 120]
6 (140, 200]
7 (140, 200]
8 (0, 120]
9 (140, 200]
10 (120, 140]
11 (120, 140]
12 (0, 120]
13 999
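
For completeness, here is a self-contained sketch of the same idea using the question's -999 sentinel instead of 999. It assumes the '?' marker caused the column to be read as strings, which the question does not confirm, so it converts the column with pd.to_numeric first. The sample frame below is a hypothetical stand-in for store_data built from the values shown in the question.

```python
import pandas as pd

# Hypothetical stand-in for store_data; assumes the '?' marker made
# pandas read the whole column as strings (object dtype).
store_data = pd.DataFrame({
    "trestbps": ["140", "130", "132", "142", "110", "120", "150",
                 "180", "120", "160", "126", "140", "110", "?"]
})

# Convert to numbers; anything that cannot be parsed ('?') becomes NaN.
values = pd.to_numeric(store_data["trestbps"], errors="coerce")

# Bin only the real measurements; NaN stays NaN after pd.cut.
bins = [0, 120, 140, 200]
binned = pd.cut(values, bins)

# A categorical column only accepts known categories, so register the
# sentinel first and then fill the missing rows with it.
binned = binned.cat.add_categories([-999]).fillna(-999)

store_data["trestbps"] = binned
print(store_data["trestbps"])
```

Keeping the sentinel out of the bin edges means the missing rows never fall into a real interval; they are labelled only after binning is done.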