Categorizing a continuous variable in R with cut, but elements fall into the wrong category

Date: 2015-05-18 07:31:18

Tags: r cut discretization

I am new to R and am trying to split a continuous variable into two categories. Suppose I have the following:

y = c(6.3, 6.2, 6.2, 5.5, 6.9, 6.8, 5.3, 5.3, 5.4, 5.2, 7.2, 7.1, 8.1, 8.2, 8.2, 7.4, 6.7, 7.2, 7.9, 8.0, 6.5, 6.6, 6.5, 7.2, 7.2, 6.8, 6.7)
cuts = cut(y, breaks=2)
cuts
[1] (5.197,6.7] (5.197,6.7] (5.197,6.7] (5.197,6.7] (6.7,8.203] (6.7,8.203] (5.197,6.7] (5.197,6.7] (5.197,6.7] (5.197,6.7] (6.7,8.203]
[12] (6.7,8.203] (6.7,8.203] (6.7,8.203] (6.7,8.203] (6.7,8.203] (6.7,8.203] (6.7,8.203] (6.7,8.203] (6.7,8.203] (5.197,6.7] (5.197,6.7]
[23] (5.197,6.7] (6.7,8.203] (6.7,8.203] (6.7,8.203] (6.7,8.203]
Levels: (5.197,6.7] (6.7,8.203]

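I am particularly interested in the value 6.7 at the end of the vector. Why does 6.7 belong to the interval (6.7,8.203] rather than to (5.197,6.7]? As far as I understand, 6.7 should not be part of (6.7,8.203], since that interval is open on the left. Am I missing something? Thanks for your help!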
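To look into this, it may help to rebuild the breaks by hand and print them at full precision. The sketch below assumes the behaviour described in ?cut for a single-number breaks argument (equal-width pieces over the range, with the outer limits widened by 0.1% of the range); the actual internals may differ between R versions, and the variable names rng, dx and brks are only illustrative.

```r
# Data from the question
y <- c(6.3, 6.2, 6.2, 5.5, 6.9, 6.8, 5.3, 5.3, 5.4, 5.2, 7.2, 7.1, 8.1, 8.2,
       8.2, 7.4, 6.7, 7.2, 7.9, 8.0, 6.5, 6.6, 6.5, 7.2, 7.2, 6.8, 6.7)

# Assumed reconstruction of cut(y, breaks = 2), based on ?cut:
# split the range into equal pieces, then widen the outer limits by 0.1%
rng  <- range(y)
dx   <- diff(rng)
brks <- seq.int(rng[1], rng[2], length.out = 3)
brks[c(1, 3)] <- c(rng[1] - dx / 1000, rng[2] + dx / 1000)

# Show the breaks at full precision instead of the rounded labels
print(sprintf("%.17g", brks))

# Compare the interior break with the stored value of 6.7;
# a negative difference means 6.7 lies above the break
print(brks[2] - 6.7)

# The level labels are rounded for display; dig.lab sets the number of digits
print(levels(cut(y, breaks = 2)))
print(levels(cut(y, breaks = 2, dig.lab = 17)))
```

If the interior break comes out marginally below the stored 6.7, that would explain the result in the question: the label "6.7" is only a rounded rendering of the break, and the comparison is done on the unrounded numbers. Whether the difference is negative on a given machine may depend on the platform and R version.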

Edit

As pointed out in the comments, 6.7 is actually stored as 6.7000000000000001776:

options(digits=20);
y
 [1] 6.2999999999999998224 6.2000000000000001776 6.2000000000000001776 5.5000000000000000000 6.9000000000000003553 6.7999999999999998224
 [7] 5.2999999999999998224 5.2999999999999998224 5.4000000000000003553 5.2000000000000001776 7.2000000000000001776 7.0999999999999996447
[13] 8.0999999999999996447 8.1999999999999992895 8.1999999999999992895 7.4000000000000003553 6.7000000000000001776 7.2000000000000001776
[19] 7.9000000000000003553 8.0000000000000000000 6.5000000000000000000 6.5999999999999996447 6.5000000000000000000 7.2000000000000001776
[25] 7.2000000000000001776 6.7999999999999998224 6.7000000000000001776
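
As a side note, the stored value of a single number can be inspected without changing the global digits option. This is a minimal sketch using base R formatting; the variable x is only illustrative, and the exact trailing digits printed may vary by platform.

```r
x <- 6.7

# Format one value with extra digits; options(digits) stays untouched
print(sprintf("%.20f", x))
print(x, digits = 17)

# Default printing rounds to 7 significant digits, hiding the difference
print(x)
```

That 6.7 is stored inexactly does not settle the matter on its own; what decides the interval is how the stored value compares with the break that cut actually computed.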

A follow-up question:

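I want to save the interval boundaries for later reference, because I need to check which interval new elements fall into. So suppose I have the intervals produced by cut, (5.197,6.7] and (6.7,8.203], and I then get a new element x = 6.7 and want to check which interval it belongs to. When I test 5.197 < x <= 6.7, it falls into the first interval, whereas the original 6.7 from my vector fell into the second interval.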

Would using cuts = cut(y, breaks=2, dig.lab=17) here really make both elements fall into the same interval?
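
One possible way to keep old and new values consistent is to store the numeric break vector itself rather than the printed labels, and to pass that same vector to cut() for every later value. According to ?cut, dig.lab only controls how the labels are formatted, so it should not change which interval a value lands in. In the sketch below, saved_breaks and classify are hypothetical names, and the break values are simply the rounded labels from the question, used for illustration.

```r
# Data from the question
y <- c(6.3, 6.2, 6.2, 5.5, 6.9, 6.8, 5.3, 5.3, 5.4, 5.2, 7.2, 7.1, 8.1, 8.2,
       8.2, 7.4, 6.7, 7.2, 7.9, 8.0, 6.5, 6.6, 6.5, 7.2, 7.2, 6.8, 6.7)

# Hypothetical explicit breaks, taken from the rounded labels in the question
saved_breaks <- c(5.197, 6.7, 8.203)

# Hypothetical helper: always classify against the same stored numeric breaks
classify <- function(x, breaks = saved_breaks) {
  cut(x, breaks = breaks, dig.lab = 4)
}

old_classes <- classify(y)    # original data
new_class   <- classify(6.7)  # a value that arrives later

# Both 6.7 values are compared with the identical stored break,
# so they land in the same right-closed interval
print(old_classes[17])
print(new_class)
print(identical(as.character(old_classes[17]), as.character(new_class)))
```

With explicit breaks the boundary is exactly the number that was typed, so the manual check 5.197 < x <= 6.7 and the result of cut() agree. The trade-off is that the breaks are then chosen by hand instead of being derived from the data range.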

0 answers:

No answers yet