Question

I have a DataFrame and a Series with the same number of rows.

The output of pd.cut also has the same length.

Where am I going wrong?

My DataFrame, which has 37459 rows:

df.shape

(37459, 124)

The column I want to cut, also 37459 rows:

df['score']

2        74.390244
4        29.268293
5        45.121951
6        46.341463
7        31.707317
           ...    
43502    21.951220
43503     1.219512
43505     3.658537
43506     8.536585
43507    12.195122
Name: score, Length: 37459, dtype: float64

And the output of pd.cut:

pd.cut(df['score'], [0, 33, 66, 100], labels=[1,2,3], retbins=True, right=False)

(2        3
 4        1
 5        2
 6        2
 7        1
         ..
 43502    1
 43503    1
 43505    1
 43506    1
 43507    1
 Name: score, Length: 37459, dtype: category
 Categories (3, int64): [1 < 2 < 3], array([  0,  33,  66, 100]))

I tried to assign the result of pd.cut to a new column of df. I want to split the scores into three groups labeled [1,2,3]:

df['score_cut'] = pd.cut(df['score'], [0, 33, 66, 100], labels=[1,2,3], retbins=True, right=False)


ValueError: Length of values does not match length of index

Where am I going wrong?

Answer 1

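Have you tried qcut? Note that it bins by quantile rather than by fixed score values, and it does not take a right argument:

pd.qcut(df['score'], [0, .33, .66, 1], labels=[1,2,3], retbins=True)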
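To illustrate the difference, here is a minimal sketch on made-up scores (the `scores` Series is hypothetical, not the asker's data). pd.cut splits at fixed values, while pd.qcut splits at quantiles, so the two can label the same data quite differently. Like pd.cut, pd.qcut returns a tuple when retbins=True, so it would not avoid the error in the question on its own.

```python
import pandas as pd

# Hypothetical, deliberately skewed scores (not the asker's data).
scores = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 90], name="score")

# Fixed-value bins: [0, 33), [33, 66), [66, 100)
by_value = pd.cut(scores, [0, 33, 66, 100], labels=[1, 2, 3], right=False)

# Quantile bins: split at the 33rd and 66th percentiles of the data
by_quantile = pd.qcut(scores, [0, .33, .66, 1], labels=[1, 2, 3])

# Compare how many observations land in each label
print(by_value.value_counts().sort_index())
print(by_quantile.value_counts().sort_index())
```

On this sample, pd.cut puts almost everything in group 1, whereas pd.qcut gives three groups of equal size.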

Answer 2

Because the shape of df['score_cut'] does not match the shape of the right-hand side.
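A small sketch that reproduces the mismatch, assuming a toy five-row frame in place of the asker's data. With retbins=True the right-hand side is a 2-tuple, so pandas compares a length of 2 against the length of the index.

```python
import pandas as pd

# Toy stand-in for the asker's DataFrame (values are illustrative).
df = pd.DataFrame({"score": [74.39, 29.27, 45.12, 46.34, 31.71]})

result = pd.cut(df["score"], [0, 33, 66, 100], labels=[1, 2, 3],
                retbins=True, right=False)

# With retbins=True the result is a 2-tuple: (binned Series, bin edges)
print(type(result).__name__, len(result))
print(len(result[0]), len(df))  # the first element does match the frame

try:
    df["score_cut"] = result  # tries to assign the whole 2-tuple
    error = None
except ValueError as exc:
    error = exc
    print("ValueError:", exc)
```

The exact wording of the message may differ between pandas versions.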

Answer 3

retbins=True makes pd.cut() return a tuple of the binned values and the bin edges. (See the docs.) Unpack it:

df['score_cut'], bins = pd.cut(df['score'], [0, 33, 66, 100], labels=[1,2,3], retbins=True, right=False)

should work.
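For completeness, a runnable sketch of two ways around the error, again on a toy frame with assumed values: unpack the tuple, or leave out retbins when the bin edges are not needed. The column name `score_cut_simple` is only for illustration.

```python
import pandas as pd

# Toy stand-in for the asker's DataFrame (values are illustrative).
df = pd.DataFrame({"score": [74.390244, 29.268293, 45.121951, 1.219512, 12.195122]})
edges = [0, 33, 66, 100]

# Option 1: unpack the (binned Series, bin edges) tuple
df["score_cut"], bins = pd.cut(df["score"], edges, labels=[1, 2, 3],
                               retbins=True, right=False)

# Option 2: omit retbins, so pd.cut returns only the binned Series
df["score_cut_simple"] = pd.cut(df["score"], edges, labels=[1, 2, 3],
                                right=False)

print(df)
print(bins)
```

One thing worth checking with right=False: the last bin is [66, 100), so a score of exactly 100 would come out as NaN.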

pandas pd.cut ValueError: Length of values does not match length of index
