The last statement raises: TypeError: unorderable types: Interval() < int()
import numpy as np
import pandas as pd

j = pd.DataFrame({'a': [12, 16, 23, 27, 22, 36, 31, 38],
                  'b': [np.nan, 23, 58, np.nan, np.nan, np.nan, 76, np.nan]})
bins = [0, 10, 20, 30, 40]
k = pd.cut(j.a, bins)  # bin column 'a' into intervals
j['new'] = k
medians = j.groupby('new').b.median()  # computation doesn't matter
mapping = medians.to_dict()  # keys are Interval objects
j['b'] = j['b'].fillna(j['new'].map(mapping))
I tried this with plain floats instead of intervals and it works fine.
Answer 0 (score: 3)
It works fine for me; you may need the latest version of pandas (0.20.2):
import numpy as np
import pandas as pd

j = pd.DataFrame({'a': [12, 16, 23, 27, 22, 36, 31, 38],
                  'b': [np.nan, 23, 58, np.nan, np.nan, np.nan, 76, np.nan]})
bins = [0, 10, 20, 30, 40]
j['new'] = pd.cut(j.a, bins)
print(j)
    a     b       new
0  12   NaN  (10, 20]
1  16  23.0  (10, 20]
2  23  58.0  (20, 30]
3  27   NaN  (20, 30]
4  22   NaN  (20, 30]
5  36   NaN  (30, 40]
6  31  76.0  (30, 40]
7  38   NaN  (30, 40]
d = j.groupby('new').b.median().to_dict()
print (d)
{Interval(30, 40, closed='right'): 76.0,
Interval(0, 10, closed='right'): nan,
Interval(10, 20, closed='right'): 23.0,
Interval(20, 30, closed='right'): 58.0}
j['b'] = j['b'].fillna(j['new'].map(d))
print (j)
    a     b       new
0  12  23.0  (10, 20]
1  16  23.0  (10, 20]
2  23  58.0  (20, 30]
3  27  58.0  (20, 30]
4  22  58.0  (20, 30]
5  36  76.0  (30, 40]
6  31  76.0  (30, 40]
7  38  76.0  (30, 40]
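If upgrading pandas is not an option, one possible workaround is to keep `Interval` objects out of the dictionary keys altogether. The question notes that plain floats work, so this sketch asks `pd.cut` for integer bin positions via `labels=False` and maps on those instead. It has not been checked against the asker's pandas version, and the names `bin_code` and `medians` are illustrative only.

```python
import numpy as np
import pandas as pd

j = pd.DataFrame({'a': [12, 16, 23, 27, 22, 36, 31, 38],
                  'b': [np.nan, 23, 58, np.nan, np.nan, np.nan, 76, np.nan]})
bins = [0, 10, 20, 30, 40]

# labels=False makes pd.cut return the integer position of each bin
# instead of Interval objects, so the dict keys below are plain integers.
bin_code = pd.cut(j['a'], bins, labels=False)

# Median of 'b' per bin position (illustrative name: medians).
medians = j.groupby(bin_code)['b'].median().to_dict()

# Fill missing 'b' values with the median of the row's bin.
j['b'] = j['b'].fillna(bin_code.map(medians))
print(j)
```

The trade-off is that the bin boundaries are no longer visible in the keys; position 1 stands for the second bin, (10, 20], and so on.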
A simpler solution:
j['b'] = j.groupby(pd.cut(j.a, bins))['b'].apply(lambda x: x.fillna(x.median()))
print (j)
    a     b
0  12  23.0
1  16  23.0
2  23  58.0
3  27  58.0
4  22  58.0
5  36  76.0
6  31  76.0
7  38  76.0
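On newer pandas releases, `groupby(...).apply` may prepend the group keys to the result index, in which case the result no longer aligns with `j` for assignment. A variant built on `transform` is sketched here as an alternative. It assumes a pandas version that accepts the `observed` argument to `groupby`, and `group_median` is an illustrative name.

```python
import numpy as np
import pandas as pd

j = pd.DataFrame({'a': [12, 16, 23, 27, 22, 36, 31, 38],
                  'b': [np.nan, 23, 58, np.nan, np.nan, np.nan, 76, np.nan]})
bins = [0, 10, 20, 30, 40]

# transform('median') returns one value per original row, indexed like j,
# so it can be passed straight to fillna without any dict or map step.
# observed=True is given explicitly; it only skips bins that have no rows.
group_median = (
    j.groupby(pd.cut(j['a'], bins), observed=True)['b']
     .transform('median')
)

j['b'] = j['b'].fillna(group_median)
print(j)
```

Because `transform` keeps the original index, no `Interval` keys are ever compared or looked up by hand.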