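I am using pandas qcut to prepare my data for a machine learning algorithm. I have products with prices, and I use this code to discretize the prices into equal-sized buckets: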
df['PriceBucket'] = pd.qcut(df['sell_prix'].sort_values(), 10, labels=False)
This code gives me the interval behind each bucket, so I can see more detail about the labels:
df['PriceBucketTitle'] = pd.qcut(df['sell_prix'].sort_values(), 10)
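As shown below, I now have PriceBucket and PriceBucketTitle, and that works perfectly! Next, I want to count how many elements fall into each bucket. This code returns only NaN values (as shown below):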
df['products_by_number'] = pd.qcut(df['sell_prix'], 10, labels=False).value_counts()
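I know that a groupby on PriceBucket would probably work, but I want to keep the current shape of my data, with the count repeated on every row. Here is the result: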
sell_prix PriceBucket PriceBucketTitle products_by_number
4668 8.0 2 (6.5, 8.5] NaN
4669 8.0 2 (6.5, 8.5] NaN
4670 8.0 2 (6.5, 8.5] NaN
4671 8.0 2 (6.5, 8.5] NaN
4672 8.0 2 (6.5, 8.5] NaN
4673 8.0 2 (6.5, 8.5] NaN
4674 8.0 2 (6.5, 8.5] NaN
4675 8.0 2 (6.5, 8.5] NaN
4676 8.0 2 (6.5, 8.5] NaN
4677 8.0 2 (6.5, 8.5] NaN
11902 15.0 5 (12.9, 15] NaN
11903 15.0 5 (12.9, 15] NaN
11904 15.0 5 (12.9, 15] NaN
11905 15.0 5 (12.9, 15] NaN
11906 15.0 5 (12.9, 15] NaN
11907 15.0 5 (12.9, 15] NaN
11908 15.0 5 (12.9, 15] NaN
11909 15.0 5 (12.9, 15] NaN
11910 15.0 5 (12.9, 15] NaN
11911 15.0 5 (12.9, 15] NaN
12065 11.0 4 (10, 12.9] NaN
12066 11.0 4 (10, 12.9] NaN
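A likely reason for the NaN column, sketched on hypothetical toy data (only the column names come from the post; the values, the index labels, and the choice of 4 buckets are assumptions): value_counts() returns a Series indexed by bucket code, and assigning a Series to a DataFrame column aligns on the index, so row labels that do not match a bucket code receive NaN.

```python
import pandas as pd

# Hypothetical toy data; the index labels deliberately do not overlap 0..3.
df = pd.DataFrame(
    {"sell_prix": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]},
    index=[100, 101, 102, 103, 104, 105, 106, 107],
)

# Same pattern as in the post, with 4 buckets instead of 10.
counts = pd.qcut(df["sell_prix"], 4, labels=False).value_counts()

# counts is indexed by bucket code (0, 1, 2, 3), not by the row labels of df.
print(sorted(counts.index.tolist()))

# Column assignment aligns counts to df.index; no label matches, so all NaN.
df["products_by_number"] = counts
print(df)
```

If the row labels happened to include 0 to 9, those rows would receive a count, but it would be the count for the bucket whose code equals the row label, not for the bucket the row belongs to.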
For example, this is what I want:
sell_prix PriceBucket PriceBucketTitle products_by_number
4668 8.0 2 (6.5, 8.5] 984546.0
4669 8.0 2 (6.5, 8.5] 984546.0
4670 8.0 2 (6.5, 8.5] 984546.0
4671 8.0 2 (6.5, 8.5] 984546.0
4672 8.0 2 (6.5, 8.5] 984546.0
4673 8.0 2 (6.5, 8.5] 984546.0
4674 8.0 2 (6.5, 8.5] 984546.0
4675 8.0 2 (6.5, 8.5] 984546.0
4676 8.0 2 (6.5, 8.5] 984546.0
4677 8.0 2 (6.5, 8.5] 984546.0
11902 15.0 5 (12.9, 15] 1028141.0
11903 15.0 5 (12.9, 15] 1028141.0
11904 15.0 5 (12.9, 15] 1028141.0
11905 15.0 5 (12.9, 15] 1028141.0
11906 15.0 5 (12.9, 15] 1028141.0
11907 15.0 5 (12.9, 15] 1028141.0
11908 15.0 5 (12.9, 15] 1028141.0
11909 15.0 5 (12.9, 15] 1028141.0
11910 15.0 5 (12.9, 15] 1028141.0
11911 15.0 5 (12.9, 15] 1028141.0
12065 11.0 4 (10, 12.9] 48998.0
12066 11.0 4 (10, 12.9] 48998.0
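One possible way to get this layout, as a minimal sketch on hypothetical toy data (the values, index labels, and the choice of 4 buckets are assumptions, not the real dataset): compute the per-bucket counts once, then broadcast them back onto every row, either by mapping the bucket code through the counts or with a groupby followed by transform. Both keep one row per product.

```python
import pandas as pd

# Hypothetical toy data standing in for the real prices.
df = pd.DataFrame(
    {"sell_prix": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]},
    index=[100, 101, 102, 103, 104, 105, 106, 107],
)

# Bucket codes, as in the post (4 quantile buckets here instead of 10).
df["PriceBucket"] = pd.qcut(df["sell_prix"], 4, labels=False)

# Option 1: look up each row's bucket code in the per-bucket counts.
bucket_counts = df["PriceBucket"].value_counts()
df["products_by_number"] = df["PriceBucket"].map(bucket_counts)

# Option 2: groupby + transform returns a result aligned to the original rows.
# "count" counts non-null prices per bucket.
df["products_by_number_alt"] = (
    df.groupby("PriceBucket")["sell_prix"].transform("count")
)

print(df)
```

Both columns here hold integers; the float counts shown in the desired output above could be obtained with astype(float) if that format matters.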
Any help? Thanks!