I have a dataset and need to split its age variable into 3 age categories, for example: age group 1 (10-20 years), age group 2 (21-30 years), and age group 3 (31-40 years).
If I pass
breaks=c(10, 20, 30, 40)
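when calling the cut function, the result is:
Age group 1 is 10-20
Age group 2 is 20-30
Age group 3 is 30-40
That is not what I want! I need age group 2 to be 21-30 (but right now 20 appears to be part of that group)... I would appreciate some help, thank you.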
Answer 0 (score: 3)
I think you are misreading the result. The intervals are half-open: they include the upper bound but not the lower bound. So
age = sample(10:40, 50, replace=TRUE)
cut(age, breaks=c(10, 20, 30, 40))
[1] (30,40] (30,40] (30,40] (20,30] (30,40] (30,40] (30,40]
[8] (30,40] (10,20] (30,40] (20,30] (30,40] (30,40] (10,20]
[15] (10,20] (30,40] (30,40] (20,30] (30,40] (30,40] (20,30]
[22] (30,40] (30,40] (30,40] (10,20] (20,30] (10,20] (10,20]
[29] (10,20] (10,20] (20,30] (10,20] (20,30] (30,40] (20,30]
[36] (20,30] (20,30] (20,30] (10,20] (30,40] (20,30] (20,30]
[43] (10,20] (20,30] (20,30] (30,40] (30,40] (20,30] (10,20]
[50] (20,30]
Levels: (10,20] (20,30] (30,40]
means that the number 20 falls only in the first group, (10,20], and not in the second group, (20,30].
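To check the boundary behaviour directly, a small sketch like the following may help. The vector `boundary_ages` is a made-up example, not part of the original data; it only contains the values that sit on or next to the break points.

```r
# Hypothetical test vector: values on and just above each break point
boundary_ages <- c(10, 20, 21, 30, 31, 40)

# Default cut(): intervals are open on the left and closed on the right
groups <- cut(boundary_ages, breaks = c(10, 20, 30, 40))

# Show each age next to the interval it was assigned to
print(data.frame(age = boundary_ages, group = groups))
```

With the default settings, 20 should land in (10,20] and 21 in (20,30], while 10 itself comes back as NA because the lowest break is excluded.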
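Also note that, by default, the lowest break value is not included in any interval, so a better call than the one I wrote above is cut(age, breaks=c(10, 20, 30, 40), include.lowest = TRUE)
which makes the lowest level the fully closed interval [10,20].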
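If the interval notation itself is the source of confusion, one option is to pass custom labels to cut(). The sketch below assumes integer ages and uses made-up label strings ("10-20", "21-30", "31-40") chosen to match the groups described in the question; `example_ages` is also a hypothetical vector.

```r
# Hypothetical integer ages covering every boundary
example_ages <- c(10, 20, 21, 30, 31, 40)

# include.lowest = TRUE keeps age 10; labels replace the interval notation
age_group <- cut(example_ages,
                 breaks = c(10, 20, 30, 40),
                 labels = c("10-20", "21-30", "31-40"),
                 include.lowest = TRUE)

# Count how many ages fall into each labelled group
print(table(age_group))
```

The grouping is identical to the unlabelled call; only the level names change, so 20 is still counted in the first group and 21 in the second.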