How to use the cut function to create breaks without overlapping numbers

Time: 2016-12-23 16:39:21

Tags: r statistics cut

I have a dataset and need to split its age variable into 3 age categories, for example: age group 1 (10-20 years), age group 2 (21-30 years), and age group 3 (31-40 years).

If I pass breaks=c(10, 20, 30, 40) to the cut function, the result is: age group 1 is 10-20, age group 2 is 20-30, and age group 3 is 30-40.

That's not what I want! I need age group 2 to be 21-30 (but right now 20 appears to be part of that group). I would appreciate some help, thank you.

1 answer:

Answer 0: (score: 3)

I think you have misread the result. The intervals are half-open: they include the upper bound but not the lower bound. For example:

 age = sample(10:40, 50, replace=TRUE)
 cut(age, breaks=c(10, 20, 30, 40))
 [1] (30,40] (30,40] (30,40] (20,30] (30,40] (30,40] (30,40]
 [8] (30,40] (10,20] (30,40] (20,30] (30,40] (30,40] (10,20]
[15] (10,20] (30,40] (30,40] (20,30] (30,40] (30,40] (20,30]
[22] (30,40] (30,40] (30,40] (10,20] (20,30] (10,20] (10,20]
[29] (10,20] (10,20] (20,30] (10,20] (20,30] (30,40] (20,30]
[36] (20,30] (20,30] (20,30] (10,20] (30,40] (20,30] (20,30]
[43] (10,20] (20,30] (20,30] (30,40] (30,40] (20,30] (10,20]
[50] (20,30]
Levels: (10,20] (20,30] (30,40]

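This means that the number 20 falls only in the first group (10,20] and not in the second group (20,30]. Also note that by default the lowest bound is not included, so an age of exactly 10 would not land in any group. cut(age, breaks=c(10, 20, 30, 40), include.lowest = TRUE) is therefore better than what I wrote earlier: it makes the lowest level the fully closed interval [10,20].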
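
To make the boundary behaviour easier to check, here is a minimal sketch that uses a fixed vector of ages sitting exactly on the break points instead of a random sample. The variable names and the label strings ("10-20", "21-30", "31-40") are my own choices for illustration, picked to match the groups in the question; they are not something cut() produces on its own.

```r
# Fixed ages on and next to the break points, so the result is reproducible.
age <- c(10, 20, 21, 30, 31, 40)

# Default behaviour: intervals are right-closed, so 10 lies outside (10,20]
# and is returned as NA.
default_groups <- cut(age, breaks = c(10, 20, 30, 40))

# include.lowest = TRUE closes the first interval, turning it into [10,20].
closed_groups <- cut(age, breaks = c(10, 20, 30, 40), include.lowest = TRUE)

# Same grouping with custom labels (label text is illustrative only).
labelled_groups <- cut(age, breaks = c(10, 20, 30, 40), include.lowest = TRUE,
                       labels = c("10-20", "21-30", "31-40"))

print(default_groups)
print(closed_groups)
print(labelled_groups)
```

With integer ages, 20 ends up in the first group and 21 in the second, which is the grouping asked for in the question. If the ages were fractional, a value such as 20.5 would go into the second group, so the "21-30" label would then be slightly misleading.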