Question

Here is a small sample of my data:

sample
Out[2]: 
0    0.047515
1    0.026392
2    0.024652
3    0.022854
4    0.020397
5    0.000087
6    0.000087
7    0.000078
8    0.000078
9    0.000078

The minimum is 0.000078 and the maximum is 0.047515. When I apply qcut to it, the resulting categories include a negative lower bound.

pd.qcut(sample, 4)
Out[31]: 
0         (0.0242, 0.0475]
1         (0.0242, 0.0475]
2         (0.0242, 0.0475]
3         (0.0102, 0.0242]
4         (0.0102, 0.0242]
5       (8.02e-05, 0.0102]
6       (8.02e-05, 0.0102]
7    (-0.000922, 8.02e-05]
8    (-0.000922, 8.02e-05]
9    (-0.000922, 8.02e-05]
Name: data, dtype: category
Categories (4, interval[float64]): [(-0.000922, 8.02e-05] < (8.02e-05, 0.0102] < (0.0102, 0.0242] < (0.0242, 0.0475]]

Is this expected behavior? I thought my minimum and maximum would be the lower and upper bounds of the categories.

(I am using pandas 0.22.0 and Python 2.7.)

Answer 1

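This happens because the binning step subtracts 0.001 from the minimum of your range. If a bin edge is exactly equal to a value in your series, it is ambiguous which bin that value belongs to. The intervals in your output are closed on the right, for example (8.02e-05, 0.0102], so a lower edge sitting exactly on the minimum would leave the minimum outside the first bin. Nudging the minimum and maximum slightly before creating the quantile bins therefore makes sense.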
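A minimal sketch of why the lowest edge has to move, in plain Python rather than pandas internals. The helper `in_right_closed` is a hypothetical name introduced only for illustration; it tests membership in a right-closed interval (lo, hi], like the ones shown in the question's output.

```python
def in_right_closed(x, lo, hi):
    """Return True if x lies in the interval (lo, hi]: open left, closed right."""
    return lo < x <= hi

minimum = 0.000078      # smallest value in the question's sample
first_edge = 8.02e-05   # upper edge of the lowest bin, as displayed in the output

# With the lower edge exactly at the minimum, the minimum itself is excluded.
unshifted = in_right_closed(minimum, minimum, first_edge)

# Lowering the edge by 0.001, as described above, brings the minimum back in.
shifted = in_right_closed(minimum, minimum - 0.001, first_edge)

print(unshifted, shifted)  # prints: False True
```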

For pd.cut, see lines 210-213 of the source: https://github.com/pandas-dev/pandas/blob/v0.23.4/pandas/core/reshape/tile.py#L210-L213

0.000078 - 0.001
Out[21]: -0.0009220000000000001
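To check this against the question's data, the sketch below rebuilds the sample and compares the lower edge of the first interval label with the sample minimum. It assumes a reasonably recent pandas. `retbins=True` is a documented `pd.qcut` parameter that also returns the computed bin edges. The variable names are my own.

```python
import pandas as pd

# The ten values from the question.
sample = pd.Series([
    0.047515, 0.026392, 0.024652, 0.022854, 0.020397,
    0.000087, 0.000087, 0.000078, 0.000078, 0.000078,
])

# retbins=True returns the bin edges alongside the categorical result.
cats, bins = pd.qcut(sample, 4, retbins=True)

# Left edge of the lowest interval label: this is the shifted value.
label_left = cats.cat.categories[0].left

# First computed edge: expected to match the sample minimum.
print(label_left, bins[0], sample.min())

# Count of values that were not assigned to any bin.
print(cats.isna().sum())
```

As far as I can tell, the shift shows up only in the interval labels. If you need the true quantile edges downstream, the `bins` array is probably the more convenient thing to keep.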
