Python: filling NaN values of a DataFrame from a dict keyed by Intervals | TypeError: unorderable types: Interval() < int()

Time: 2017-06-27 10:22:58

Tags: python pandas

The last statement raises: TypeError: unorderable types: Interval() < int()

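import numpy as np
import pandas as pd

j = pd.DataFrame({'a': [12, 16, 23, 27, 22, 36, 31, 38],
                  'b': [np.nan, 23, 58, np.nan, np.nan, np.nan, 76, np.nan]})

bins = [0, 10, 20, 30, 40]   # renamed from `bin` to avoid shadowing the builtin

k = pd.cut(j.a, bins)   # bin column a into Interval categories

j['new'] = k

groupby = j.groupby('new').b.median()   #computation doesn't matter

d = groupby.to_dict()   # renamed from `dict`; the keys are Interval objects

j['b'] = j['b'].fillna(j['new'].map(d))   # this is the statement that fails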

I tried this with plain floats instead of intervals as the dict keys, and it worked fine.

1 answer:

Answer 0: (score: 3)

It works fine for me; you may need the latest version of pandas, 0.20.2.

import numpy as np
import pandas as pd
j = pd.DataFrame({'a':[12,16,23,27,22,36,31,38], 
                  'b':[np.nan, 23, 58, np.nan, np.nan, np.nan, 76, np.nan]})

bins = [0, 10, 20, 30, 40]
j['new'] = pd.cut(j.a, bins)
print(j)
    a     b       new
0  12   NaN  (10, 20]
1  16  23.0  (10, 20]
2  23  58.0  (20, 30]
3  27   NaN  (20, 30]
4  22   NaN  (20, 30]
5  36   NaN  (30, 40]
6  31  76.0  (30, 40]
7  38   NaN  (30, 40]

d = j.groupby('new').b.median().to_dict()
print(d)
{Interval(30, 40, closed='right'): 76.0, 
 Interval(0, 10, closed='right'): nan, 
 Interval(10, 20, closed='right'): 23.0, 
 Interval(20, 30, closed='right'): 58.0}

j['b'] = j['b'].fillna(j['new'].map(d))
print(j)
    a     b       new
0  12  23.0  (10, 20]
1  16  23.0  (10, 20]
2  23  58.0  (20, 30]
3  27  58.0  (20, 30]
4  22  58.0  (20, 30]
5  36  76.0  (30, 40]
6  31  76.0  (30, 40]
7  38  76.0  (30, 40]
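
If upgrading pandas is not an option, one possible workaround is to avoid Interval keys altogether. The sketch below assumes it is acceptable to identify each bin by its integer position, which pd.cut returns when called with labels=False. The names bin_id and medians are made up for this illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

j = pd.DataFrame({'a': [12, 16, 23, 27, 22, 36, 31, 38],
                  'b': [np.nan, 23, 58, np.nan, np.nan, np.nan, 76, np.nan]})
bins = [0, 10, 20, 30, 40]

# labels=False makes pd.cut return the integer position of each bin
# instead of an Interval, so the dict keys below are plain integers.
j['bin_id'] = pd.cut(j['a'], bins, labels=False)

# Median of b per bin, keyed by the integer bin position (hypothetical name).
medians = j.groupby('bin_id')['b'].median().to_dict()

# Fill each missing b with the median of the bin its row falls into.
j['b'] = j['b'].fillna(j['bin_id'].map(medians))
print(j)
```

This mirrors the asker's observation that plain numeric keys work: the mapping step never has to compare an Interval with a number.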

A simpler solution:

j['b'] = j.groupby(pd.cut(j.a, bins))['b'].apply(lambda x: x.fillna(x.median()))
print(j)
    a     b
0  12  23.0
1  16  23.0
2  23  58.0
3  27  58.0
4  22  58.0
5  36  76.0
6  31  76.0
7  38  76.0
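
The same per-bin fill can also be written without a lambda. This is a minimal sketch rather than part of the original answer: it relies on GroupBy.transform, which returns a result aligned with the original index, so it can be passed straight to fillna. The variable name group_median is made up for this illustration, and observed=True assumes a pandas version recent enough to accept that groupby argument.

```python
import numpy as np
import pandas as pd

j = pd.DataFrame({'a': [12, 16, 23, 27, 22, 36, 31, 38],
                  'b': [np.nan, 23, 58, np.nan, np.nan, np.nan, 76, np.nan]})
bins = [0, 10, 20, 30, 40]

# For every row, compute the median of b within the bin that row's a falls into.
# observed=True restricts the grouping to bins that actually contain rows.
group_median = (
    j.groupby(pd.cut(j['a'], bins), observed=True)['b']
     .transform('median')
)

# group_median shares j's index, so fillna lines the values up row by row.
j['b'] = j['b'].fillna(group_median)
print(j)
```

Because transform keeps the original index, the result can be assigned back to the column directly, whatever index the grouped apply would have produced.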