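My question relates to the solution from the other question.
I would like to know how to change the bin size from 3 to 5, 10, or some other value. Changing step
alone is not enough: I would also have to change (str(int(cat[1:3])) + "-" + str(int(cat[5:7])-1)
, which I have not managed to do. I get the error ValueError: invalid literal for int() with base 10: '18, '
.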
step=3
kwargs = dict(include_lowest=True, right=False)
bins = pd.cut(df.AVG_PERCENT_EVAL_1, bins=np.arange(18,40+step,step), **kwargs)
labels = [(str(int(cat[1:3])) + "-" + str(int(cat[5:7])-1)) for cat in bins.cat.categories]
bins.cat.categories = labels
df = df.assign(AVG_PERCENT_RANGE=bins).drop("AVG_PERCENT_EVAL_1", axis=1)
df.groupby(['GROUP', 'AVG_PERCENT_RANGE'], as_index=False).agg('mean')
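The fixed slices cat[1:3] and cat[5:7] only work while every bin edge sits at exactly those character positions in the category text, which is probably why the int() call ends up receiving a fragment like '18, '. A minimal sketch of one way around that, assuming a pandas version in which the categories produced by pd.cut are Interval objects with .left and .right attributes (the values Series is made-up sample data, not the original DataFrame):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for df.AVG_PERCENT_EVAL_1.
values = pd.Series([18.0, 22.9, 23.0, 31.5, 39.0])

step = 5
binned = pd.cut(values, bins=np.arange(18, 40 + step, step), right=False)

# Read the numeric edges from each Interval instead of slicing its string form.
labels = ['{}-{}'.format(int(iv.left), int(iv.right) - 1)
          for iv in binned.cat.categories]

# rename_categories returns a new Series carrying the readable labels.
binned = binned.cat.rename_categories(labels)
print(binned.tolist())
```

Because the labels come from the interval edges themselves, changing step does not require touching the label expression.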
Answer 0 (score: 1)
Is this what you want?
In [166]: %paste
step=5
kwargs = dict(include_lowest=True, right=False)
bins=np.arange(18,40+step,step)
labels = ['{}-{}'.format(i, i+step-1) for i in bins][:-1]
df['AVG_PERCENT_RANGE'] = pd.cut(df.pop('AVG_PERCENT_EVAL_1'),
                                 bins=bins, labels=labels, **kwargs)
df.groupby(['GROUP', 'AVG_PERCENT_RANGE'], as_index=False).agg('mean')
## -- End pasted text --
Out[166]:
   GROUP  AVG_PERCENT_RANGE  AVG_PERCENT_NEGATIVE  AVG_TOTAL_WAIT_TIME  AVG_TOTAL_SERVICE_TIME
0  AAAAA              18-22              6.500000            85.682099              247.880659
1  AAAAA              23-27              0.833333           103.445112              314.336474
2  AAAAA              28-32                   NaN                  NaN                     NaN
3  AAAAA              33-37                   NaN                  NaN                     NaN
4  AAAAA              38-42                   NaN                  NaN                     NaN
5  BBBBB              18-22              0.777778            63.500619              242.510146
6  BBBBB              23-27              2.000000           103.796290              313.685358
7  BBBBB              28-32                   NaN                  NaN                     NaN
8  BBBBB              33-37                   NaN                  NaN                     NaN
9  BBBBB              38-42                   NaN                  NaN                     NaN
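To check how the labels follow the step, the same idea can be wrapped in a small helper and tried with several bin sizes. This is only a sketch: cut_into_ranges and the sample Series are hypothetical names and data introduced here, and the labels assume integer bin edges.

```python
import numpy as np
import pandas as pd

def cut_into_ranges(values, start, stop, step):
    """Bin values into left-closed ranges labelled like '18-22' (hypothetical helper)."""
    edges = np.arange(start, stop + step, step)
    # There is one label per interval, so the last edge gets no label of its own.
    labels = ['{}-{}'.format(lo, lo + step - 1) for lo in edges[:-1]]
    return pd.cut(values, bins=edges, labels=labels, right=False)

# Made-up sample data, not the DataFrame from the question.
sample = pd.Series([18.0, 22.9, 23.0, 31.5, 39.0])
for step in (3, 5, 10):
    print(step, cut_into_ranges(sample, 18, 40, step).tolist())
```

Only step changes between runs; the edges and the labels are both derived from it, so they cannot drift apart.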