Question

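I want to classify data in R in a particular way. Suppose I have the following data (strictly speaking an integer vector rather than a data frame):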

> data = sample(1:500, 5000, replace = TRUE)

To classify this data, I create these classes:

> data.cl = cut(data, breaks = c(seq(0,100,by=10), 200, 350, 480, 500))
> table(data.cl)
data.cl
   (0,10]   (10,20]   (20,30]   (30,40]   (40,50] 
      102        80        87       113       117 
  (50,60]   (60,70]   (70,80]   (80,90]  (90,100] 
      101        89        95       106       104 
(100,200] (200,350] (350,480] (480,500] 
     1002      1492      1318       194

If I want 0 to be included, I only need to add include.lowest = TRUE:

> data.cl = cut(data, breaks = c(seq(0,100,by=10), 200, 350, 480, 500),
+ include.lowest = TRUE)
> table(data.cl)
data.cl
   [0,10]   (10,20]   (20,30]   (30,40]   (40,50] 
      102        80        87       113       117 
  (50,60]   (60,70]   (70,80]   (80,90]  (90,100] 
      101        89        95       106       104 
(100,200] (200,350] (350,480] (480,500] 
     1002      1492      1318       194

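In this example that makes no difference, because 0 does not occur in the data at all. But if it did occur, say 4 times, class [0,10] would contain 106 elements instead of 102: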

> data.cl = cut(data, breaks = c(seq(0,100,by=10), 200, 350, 480, 500),
+ include.lowest = TRUE)
> table(data.cl)
data.cl
   [0,10]   (10,20]   (20,30]   (30,40]   (40,50] 
      106        80        87       113       117 
  (50,60]   (60,70]   (70,80]   (80,90]  (90,100] 
      101        89        95       106       104 
(100,200] (200,350] (350,480] (480,500] 
     1002      1492      1318       194
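The effect of include.lowest is easier to see on a small input. The following is only a minimal sketch: the vector x is a hypothetical toy example (not the data above), chosen so that it actually contains a 0.

```r
# Hypothetical toy vector that contains a 0 (not the data from this post).
x <- c(0, 5, 10, 11, 500)
brks <- c(seq(0, 100, by = 10), 200, 350, 480, 500)

# Default: intervals are (a, b], so 0 lies outside the first class and becomes NA.
without_lowest <- cut(x, breaks = brks)

# include.lowest = TRUE closes the first interval on the left, giving [0, 10].
with_lowest <- cut(x, breaks = brks, include.lowest = TRUE)

print(as.character(without_lowest))
print(as.character(with_lowest))
```

Only the first class changes; all other classes keep the (a, b] form.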

There is another option for changing the class limits. The default for cut() is right = TRUE. If you change it to right = FALSE, you get:

> data.cl = cut(data, breaks = c(seq(0,100,by=10), 200, 350, 480, 500),
+ include.lowest = TRUE, right = FALSE)
> table(data.cl)
data.cl
   [0,10)   [10,20)   [20,30)   [30,40)   [40,50) 
       92        81        87       111       118 
  [50,60)   [60,70)   [70,80)   [80,90)  [90,100) 
      103        89        94       103       103 
[100,200) [200,350) [350,480) [480,500] 
     1003      1497      1320       199

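include.lowest now effectively becomes "include.highest", but at the cost of changing the class limits: some classes now contain a different number of members, because every boundary value has moved into the next class up.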
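A minimal sketch of this behaviour, again on a hypothetical toy vector x that sits exactly on some of the break points:

```r
# Hypothetical toy vector placed exactly on break points.
x <- c(0, 10, 480, 500)
brks <- c(seq(0, 100, by = 10), 200, 350, 480, 500)

# right = FALSE alone: intervals are [a, b), so 500 falls outside and becomes NA.
left_closed <- cut(x, breaks = brks, right = FALSE)

# Adding include.lowest = TRUE closes the LAST interval on the right: [480, 500].
left_closed_incl <- cut(x, breaks = brks, right = FALSE, include.lowest = TRUE)

print(as.character(left_closed))
print(as.character(left_closed_incl))
```

Note that 10 now belongs to [10,20) rather than (0,10], which is why the counts shift.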
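But what if I want the classes to look like this (this is the desired output, with the last interval open on the right),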

> data.cl = cut(data, breaks = c(seq(0,100,by=10), 200, 350, 480, 500))
> table(data.cl)
data.cl
   (0,10]   (10,20]   (20,30]   (30,40]   (40,50] 
      102        80        87       113       117 
  (50,60]   (60,70]   (70,80]   (80,90]  (90,100] 
      101        89        95       106       104 
(100,200] (200,350] (350,480] (480,500) 
     1002      1492      1318       194

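with 500 excluded? What should I do? Of course, one could say: "Just write data.cl = cut(data, breaks = c(seq(0,100,by=10), 200, 350, 480, 499)) instead of data.cl = cut(data, breaks = c(seq(0,100,by=10), 200, 350, 480, 500)), because you are dealing with integers."
Well, that is true, but what if that is not the case and I am working with floats? How do I exclude 500 then?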
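To illustrate why the integer trick does not carry over, here is a minimal sketch with a hypothetical vector of floats. It only demonstrates the problem; it is not a solution.

```r
# Hypothetical float values near the upper limit (not the data from this post).
x <- c(480.5, 499, 499.5, 500)

# The "integer" workaround: use 499 as the last break instead of 500.
brks_499 <- c(seq(0, 100, by = 10), 200, 350, 480, 499)
res <- cut(x, breaks = brks_499)

# 500 is excluded as intended, but 499.5 is lost as well (both become NA).
print(data.frame(x = x, class = as.character(res)))
```

Any value strictly between 499 and 500 is dropped together with 500, so the last break cannot simply be lowered when the data are not integers.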

Upper interval limit in function 'cut'

0 Answers: