pandas cut results in NaN values

Time: 2019-05-11 13:46:24

Tags: pandas replace nan cut bin

The following column of my store_data DataFrame contains missing values, marked with '?':

>>>store_data['trestbps']
0      140
1      130
2      132
3      142
4      110
5      120
6      150
7      180
8      120
9      160
10     126
11     140
12     110
13       ?

I replaced all of the missing values with -999:

store_data.replace('?', -999, inplace = True)

>>>store_data['trestbps']
0       140
1       130
2       132
3       142
4       110
5       120
6       150
7       180
8       120
9       160
10      126
11      140
12      110
13     -999

Now I want to bin the values. I used the code below, but every row of the output is NaN:

trestbps = store_data['trestbps']
trestbps_bins = [-999,120,140,200]
store_data['trestbps'] = pd.cut(trestbps,trestbps_bins)
>>>store_data['trestbps']
0      NaN
1      NaN
2      NaN
3      NaN
4      NaN
5      NaN
6      NaN
7      NaN
8      NaN
9      NaN
10     NaN
11     NaN
12     NaN
13     NaN

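When there are no missing values, the binning works as expected. I want rows 0-12 to be assigned to their bins and only row 13 to be replaced with -999. How can I achieve this?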

1 answer:

Answer 0: (score: 1)

IIUC, you can do this:

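bins = [0, 120, 140, 200]  # set the bin edges
df.trestbps = pd.cut(df.trestbps, bins)  # do the cut; missing values stay NaN
df.trestbps = df.trestbps.values.add_categories(999)  # add 999 as a category
df.trestbps.fillna(999)  # fill NaN with 999 (returns a new Series; assign it back to keep it)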

0     (120, 140]
1     (120, 140]
2     (120, 140]
3     (140, 200]
4       (0, 120]
5       (0, 120]
6     (140, 200]
7     (140, 200]
8       (0, 120]
9     (140, 200]
10    (120, 140]
11    (120, 140]
12      (0, 120]
13           999
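
For reference, here is a minimal end-to-end sketch of the same idea, using the -999 sentinel from the question. It assumes the column was read as strings because of the '?' entries (the question does not show the dtype), so it converts the column with pd.to_numeric first. The sample frame and the trestbps_bin column name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample mirroring the question: strings, with '?' marking a missing value.
store_data = pd.DataFrame({
    "trestbps": ["140", "130", "132", "142", "110", "120", "150",
                 "180", "120", "160", "126", "140", "110", "?"]
})

# Convert to numbers; anything that cannot be parsed (here '?') becomes NaN.
numeric = pd.to_numeric(store_data["trestbps"], errors="coerce")

# Bin only the real measurements; NaN inputs stay NaN after pd.cut.
bins = [0, 120, 140, 200]
binned = pd.cut(numeric, bins)

# Register the sentinel as an extra category, then fill the gaps with it.
binned = binned.cat.add_categories([-999]).fillna(-999)

# Illustrative column name; the original column is left untouched.
store_data["trestbps_bin"] = binned
print(store_data)
```

Note that putting -999 into the bin edges, as in the question, would not capture the sentinel on its own: pd.cut builds right-closed intervals by default, so a value equal to the lowest edge falls outside the first bin unless include_lowest=True is passed.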