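New R user here. I'm trying to split a dataset by deciles, using cut as described in this question. I want to add the decile values as a new column in a data frame, but when I do, the lowest value is listed as NA for some reason. This happens whether include.lowest is TRUE or FALSE. Does anyone know why?
It also happens with this sample set, so it is not unique to my data.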
> data <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
> decile <- cut(data, quantile(data, (0:10)/10, labels=TRUE, include.lowest=FALSE))
> df <- cbind(data, decile)
> df
data decile
[1,] 1 NA
[2,] 2 1
[3,] 3 2
[4,] 4 2
[5,] 5 3
[6,] 6 3
[7,] 7 4
[8,] 8 4
[9,] 9 5
[10,] 10 5
[11,] 11 6
[12,] 12 6
[13,] 13 7
[14,] 14 7
[15,] 15 8
[16,] 16 8
[17,] 17 9
[18,] 18 9
[19,] 19 10
[20,] 20 10
Answer 0 (score: 4)
There are two problems. First, your cut call has some issues. I think you meant
cut(data, quantile(data, (0:10)/10), include.lowest=FALSE)
## ^missing parenthesis
Also, labels should be FALSE, NULL, or a vector of length(breaks) - 1 containing the desired labels (one per interval).
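As a rough illustration of the labels options, the sketch below uses the sample data from the question. The name decile_names and the "D1" to "D10" labels are made up for the example; they are not from the original post.

```r
data <- 1:20                               # same values as the sample set
breaks <- quantile(data, (0:10)/10)        # 11 break points define 10 intervals

# labels = FALSE returns plain integer codes instead of a factor
codes <- cut(data, breaks, labels = FALSE, include.lowest = TRUE)

# A custom label vector needs one entry per interval: length(breaks) - 1
decile_names <- paste0("D", 1:10)          # hypothetical label names
named <- cut(data, breaks, labels = decile_names, include.lowest = TRUE)

table(named)
```

Passing a label vector of any other length makes cut stop with an error about the number of intervals and the length of labels differing.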
Second, the main problem is that you set include.lowest=FALSE, and data[1] is 1, which coincides with the lowest break in
> quantile(data, (0:10)/10)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
1.0 2.9 4.8 6.7 8.6 10.5 12.4 14.3 16.2 18.1 20.0
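The value 1 does not belong to any class; it sits exactly on the lower bound of the classes defined by the breaks, and that bound is excluded.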
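A quick way to check this diagnosis on the sample data is sketched below. The variable name default_cut is only for illustration.

```r
data <- 1:20
breaks <- quantile(data, (0:10)/10)

# By default cut() builds intervals of the form (lower, upper],
# so a value equal to the smallest break falls in no interval
default_cut <- cut(data, breaks)

which(is.na(default_cut))    # positions that were left unclassified
data[1] == min(breaks)       # the minimum sits exactly on the lowest break
```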
I'm not sure what you want, but you could try one of these two alternatives, depending on which class you want 1 to fall in:
> cut(data, quantile(data, (0:10)/10), include.lowest=TRUE)
[1] [1,2.9] [1,2.9] (2.9,4.8] (2.9,4.8] (4.8,6.7] (4.8,6.7]
[7] (6.7,8.6] (6.7,8.6] (8.6,10.5] (8.6,10.5] (10.5,12.4] (10.5,12.4]
[13] (12.4,14.3] (12.4,14.3] (14.3,16.2] (14.3,16.2] (16.2,18.1] (16.2,18.1]
[19] (18.1,20] (18.1,20]
10 Levels: [1,2.9] (2.9,4.8] (4.8,6.7] (6.7,8.6] (8.6,10.5] ... (18.1,20]
> cut(data, c(0, quantile(data, (0:10)/10)), include.lowest=FALSE)
[1] (0,1] (1,2.9] (2.9,4.8] (2.9,4.8] (4.8,6.7] (4.8,6.7]
[7] (6.7,8.6] (6.7,8.6] (8.6,10.5] (8.6,10.5] (10.5,12.4] (10.5,12.4]
[13] (12.4,14.3] (12.4,14.3] (14.3,16.2] (14.3,16.2] (16.2,18.1] (16.2,18.1]
[19] (18.1,20] (18.1,20]
11 Levels: (0,1] (1,2.9] (2.9,4.8] (4.8,6.7] (6.7,8.6] ... (18.1,20]
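Since the goal in the question was a new column in a data frame, here is one possible sketch using the first alternative. It assumes data.frame is acceptable in place of cbind, which turns a numeric vector and a factor into a numeric matrix of integer codes, as in the question's output. The column names value, decile and decile_num are made up for the example.

```r
data <- 1:20
breaks <- quantile(data, (0:10)/10)

# data.frame() keeps the factor as a factor instead of coercing to a matrix
df <- data.frame(value = data)
df$decile <- cut(df$value, breaks, include.lowest = TRUE)

# Optional: the decile number as a plain integer column
df$decile_num <- as.integer(df$decile)

head(df)
```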