Question

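New R user here. I'm trying to split a dataset into deciles, using cut as described in this question. I want to add the decile values as a new column in a data frame, but when I do, the lowest value is listed as NA for some reason. This happens whether include.lowest is TRUE or FALSE. Does anyone know why?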

It also happens with this sample set, so it isn't unique to my data.

> data <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)

> decile <- cut(data, quantile(data, (0:10)/10, labels=TRUE, include.lowest=FALSE))

> df <- cbind(data, decile)

> df

      data decile
 [1,]    1     NA
 [2,]    2      1
 [3,]    3      2
 [4,]    4      2
 [5,]    5      3
 [6,]    6      3
 [7,]    7      4
 [8,]    8      4
 [9,]    9      5
[10,]   10      5
[11,]   11      6
[12,]   12      6
[13,]   13      7
[14,]   14      7
[15,]   15      8
[16,]   16      8
[17,]   17      9
[18,]   18      9
[19,]   19     10
[20,]   20     10

Answer 1

There are two issues. First, your cut call has some problems. I think you meant

cut(data, quantile(data, (0:10)/10), include.lowest=FALSE)
##                                ^missing parenthesis

Also, labels should be FALSE, NULL, or a vector of the desired labels with one entry per interval, i.e. of length length(breaks) - 1.
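
As a rough illustration of the labels rule, here is a minimal sketch using the sample data from the question. The label names D1 to D10 are made up for this example; cut() wants one label per interval, which is one fewer than the number of breaks.

```r
data <- 1:20
breaks <- quantile(data, (0:10)/10)   # 11 breaks define 10 intervals

# Hypothetical label names, one per interval (length(breaks) - 1 of them)
labs <- paste0("D", 1:10)
decile <- cut(data, breaks, labels = labs, include.lowest = TRUE)

# Supplying one label per break (11 here) makes cut() stop with an error
too_many <- try(cut(data, breaks, labels = paste0("D", 1:11)), silent = TRUE)
```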

Second, the main problem is that you set include.lowest=FALSE, and data[1] is 1, which is exactly the lowest break (the 0% quantile):

> quantile(data, (0:10)/10)
  0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100% 
 1.0  2.9  4.8  6.7  8.6 10.5 12.4 14.3 16.2 18.1 20.0

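The value 1 doesn't fall into any category. It sits on the lower bound of the first interval, (1,2.9], which is open on the left, so it lies outside the categories defined by the breaks.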
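
This is also why switching include.lowest seemed to have no effect in the question: there the argument sits inside the quantile() call, which quietly absorbs it through its ... argument, so cut() always ran with its default include.lowest=FALSE. A small sketch of the difference, assuming the sample data above (labels=FALSE is used here only to get plain integer codes):

```r
data <- 1:20
probs <- (0:10)/10

# include.lowest is inside quantile(): it is ignored there, cut() uses its default
misplaced <- cut(data, quantile(data, probs, include.lowest = TRUE), labels = FALSE)

# include.lowest is an argument of cut(): the lowest value now gets a bin
fixed <- cut(data, quantile(data, probs), labels = FALSE, include.lowest = TRUE)

head(misplaced, 3)   # first element is NA
head(fixed, 3)       # first element is 1
```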

I'm not sure exactly what you want, but you can try one of these two alternatives, depending on which class you want 1 to fall in:

> cut(data, quantile(data, (0:10)/10), include.lowest=TRUE)
 [1] [1,2.9]     [1,2.9]     (2.9,4.8]   (2.9,4.8]   (4.8,6.7]   (4.8,6.7]  
 [7] (6.7,8.6]   (6.7,8.6]   (8.6,10.5]  (8.6,10.5]  (10.5,12.4] (10.5,12.4]
[13] (12.4,14.3] (12.4,14.3] (14.3,16.2] (14.3,16.2] (16.2,18.1] (16.2,18.1]
[19] (18.1,20]   (18.1,20]  
10 Levels: [1,2.9] (2.9,4.8] (4.8,6.7] (6.7,8.6] (8.6,10.5] ... (18.1,20]
> cut(data, c(0, quantile(data, (0:10)/10)), include.lowest=FALSE)
 [1] (0,1]       (1,2.9]     (2.9,4.8]   (2.9,4.8]   (4.8,6.7]   (4.8,6.7]  
 [7] (6.7,8.6]   (6.7,8.6]   (8.6,10.5]  (8.6,10.5]  (10.5,12.4] (10.5,12.4]
[13] (12.4,14.3] (12.4,14.3] (14.3,16.2] (14.3,16.2] (16.2,18.1] (16.2,18.1]
[19] (18.1,20]   (18.1,20]  
11 Levels: (0,1] (1,2.9] (2.9,4.8] (4.8,6.7] (6.7,8.6] ... (18.1,20]
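
To get back to the original goal, a decile number as a new data frame column, one possible sketch combines the first option with labels=FALSE so that cut() returns plain integer codes. Using data.frame() rather than cbind() is a choice made for this example, since cbind() builds a matrix rather than a data frame.

```r
data <- 1:20

# Decile index 1..10 for each value; the minimum is kept in the first bin
decile <- cut(data, quantile(data, (0:10)/10), labels = FALSE, include.lowest = TRUE)

# Attach the codes as a new column
df <- data.frame(data, decile)
head(df, 4)
```

With this sample set each decile ends up holding two values, and no row is NA.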

Receiving NA when adding a decile column using cut()
