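I am trying to cut a vector to create a categorical variable. Despite setting right to TRUE, the output shows NA when the value of the variable equals the upper limit of the interval.
I have reproduced the problem below with a small subset of my data: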
> var1
[1] 74.11667 75.46667 77.06111 78.68333 80.33333 81.88889 83.16667 84.16667 85.21111
[10] 86.78333 88.28333 89.70000 91.11667 92.53333 93.89444 95.11667 96.20000 97.20000
[19] 98.20000 99.20000 99.93333 100.00000 100.00000 99.98125 99.92083
> dput(var1)
c(74.1166666666667, 75.4666666666667, 77.0611111111111, 78.6833333333334,
80.3333333333333, 81.8888888888889, 83.1666666666667, 84.1666666666667,
85.2111111111111, 86.7833333333333, 88.2833333333333, 89.7, 91.1166666666667,
92.5333333333333, 93.8944444444445, 95.1166666666667, 96.2, 97.2,
98.2, 99.2, 99.9333333333333, 100, 100, 99.98125, 99.9208333333333
)
> cut(x = var1, breaks = c(0, seq(from = 70, to = 100, by = 5)), right = T)
[1] (70,75] (75,80] (75,80] (75,80] (80,85] (80,85] (80,85] (80,85] (85,90] (85,90]
[11] (85,90] (85,90] (90,95] (90,95] (90,95] (95,100] (95,100] (95,100] (95,100] (95,100]
[21] (95,100] <NA> <NA> (95,100] (95,100]
Levels: (0,70] (70,75] (75,80] (80,85] (85,90] (90,95] (95,100]
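The levels (named by default) clearly show that 100 is included in the last interval, yet I get an NA in the output wherever var1 equals 100.
Am I missing something here?
Edit
I am using cut from base R, not from any package. Here is the session info: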
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C LC_TIME=English_Canada.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2 plotly_4.7.1 data.table_1.10.5 ggplot2_2.2.1 lubridate_1.7.2
[6] gmad_0.0.0.9000
loaded via a namespace (and not attached):
[1] Rcpp_0.12.15 pillar_1.2.1 compiler_3.4.4 plyr_1.8.4 bindr_0.1
[6] bitops_1.0-6 tools_3.4.4 digest_0.6.15 jsonlite_1.5 tibble_1.4.2
[11] gtable_0.2.0 viridisLite_0.3.0 pkgconfig_2.0.1 rlang_0.2.0 shiny_1.0.5
[16] crosstalk_1.0.0 curl_3.1 yaml_2.1.17 dplyr_0.7.4 httr_1.3.1
[21] stringr_1.3.0 htmlwidgets_1.0 caTools_1.17.1 grid_3.4.4 glue_1.2.0
[26] R6_2.2.2 purrr_0.2.4 tidyr_0.8.0 magrittr_1.5 svMisc_1.0-2
[31] scales_0.5.0 htmltools_0.3.6 assertthat_0.2.0 xtable_1.8-2 mime_0.5
[36] colorspace_1.3-2 httpuv_1.3.6.2 labeling_0.3 stringi_1.1.6 lazyeval_0.2.1
[41] munsell_0.4.3
Significant digits:
> getOption("digits")
[1] 7
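Because the console prints only 7 significant digits, a value displayed as 100 is not necessarily equal to 100. The following is a minimal sketch of how one might check this. It uses a hypothetical vector x (not my actual data) whose third element is deliberately constructed to sit just above 100:

```r
# Hypothetical values: the third is slightly above 100 by construction
x <- c(99.2, 100, 100 + 1e-13)

print(x)               # printed with the default 7 significant digits
sprintf("%.17g", x)    # 17 significant digits distinguish any two doubles
x - 100                # size of any excess over the top break

# TRUE marks values strictly above the top break of 100
above <- x > 100
print(above)
```

If any element of the real vector comes back TRUE in the last comparison, it lies outside (95,100] even though it prints as 100.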
Edit 2
Specifying integers in the breaks argument:
> cut(var1, breaks = c(0L, seq(from = 70L, to = 100L, by = 5L)), right = T)
[1] (70,75] (75,80] (75,80] (75,80] (80,85] (80,85] (80,85] (80,85] (85,90] (85,90]
[11] (85,90] (85,90] (90,95] (90,95] (90,95] (95,100] (95,100] (95,100] (95,100] (95,100]
[21] (95,100] <NA> <NA> (95,100] (95,100]
Levels: (0,70] (70,75] (75,80] (80,85] (85,90] (90,95] (95,100]
Running a second ifelse (var1 was extracted from a data.table that had already been through an ifelse condition):
> cut(ifelse(var1 > 100, 100, var1), breaks = c(0L, seq(from = 70L, to = 100L, by = 5L)), right = T)
[1] (70,75] (75,80] (75,80] (75,80] (80,85] (80,85] (80,85] (80,85] (85,90] (85,90]
[11] (85,90] (85,90] (90,95] (90,95] (90,95] (95,100] (95,100] (95,100] (95,100] (95,100]
[21] (95,100] (95,100] (95,100] (95,100] (95,100]
Levels: (0,70] (70,75] (75,80] (80,85] (85,90] (90,95] (95,100]
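The ifelse call above caps values at 100 before binning, and the NA entries disappear. This suggests that the affected elements are fractionally greater than 100. The same capping can be written with pmin(). The sketch below uses a hypothetical vector x, made up for illustration, whose last element is constructed to exceed 100 by a tiny amount:

```r
# Same breaks as in the question
breaks <- c(0, seq(from = 70, to = 100, by = 5))

# Hypothetical data: the last value is just above 100 by construction
x <- c(74.1, 99.2, 100, 100 + 1e-13)

# Without capping, the value above the top break falls outside every interval
raw <- cut(x, breaks = breaks, right = TRUE)

# pmin() takes the element-wise minimum, so nothing exceeds 100 afterwards
capped <- cut(pmin(x, 100), breaks = breaks, right = TRUE)

print(is.na(raw))
print(is.na(capped))
```

Capping is only appropriate if values above 100 are known to be rounding noise rather than genuine out-of-range data.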