cut ignores the upper bound of the interval in R

Asked: 2018-04-16 18:13:31

Tags: r cut categorical-data

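I am trying to cut a vector to create a categorical variable. Even though right is set to TRUE, the output shows NA wherever a value equals the upper bound of the last interval.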

I have reproduced the problem below with a small subset of my data:

> var1
 [1]  74.11667  75.46667  77.06111  78.68333  80.33333  81.88889  83.16667  84.16667  85.21111
[10]  86.78333  88.28333  89.70000  91.11667  92.53333  93.89444  95.11667  96.20000  97.20000
[19]  98.20000  99.20000  99.93333 100.00000 100.00000  99.98125  99.92083


> dput(var1)
c(74.1166666666667, 75.4666666666667, 77.0611111111111, 78.6833333333334, 
80.3333333333333, 81.8888888888889, 83.1666666666667, 84.1666666666667, 
85.2111111111111, 86.7833333333333, 88.2833333333333, 89.7, 91.1166666666667, 
92.5333333333333, 93.8944444444445, 95.1166666666667, 96.2, 97.2, 
98.2, 99.2, 99.9333333333333, 100, 100, 99.98125, 99.9208333333333
)

> cut(x = var1, breaks = c(0, seq(from = 70, to = 100, by = 5)), right = T)
 [1] (70,75]  (75,80]  (75,80]  (75,80]  (80,85]  (80,85]  (80,85]  (80,85]  (85,90]  (85,90] 
[11] (85,90]  (85,90]  (90,95]  (90,95]  (90,95]  (95,100] (95,100] (95,100] (95,100] (95,100]
[21] (95,100] <NA>     <NA>     (95,100] (95,100]
Levels: (0,70] (70,75] (75,80] (80,85] (85,90] (90,95] (95,100]

The levels (default labels) clearly show that 100 is included in the last interval, yet I get NA in the output wherever var1 equals 100.

Am I missing something here?
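For reference, a minimal sketch of how cut treats the top break with right = TRUE. The names x_exact and x_above are hypothetical, and the offset 1e-12 is an assumption chosen only so that the value still prints as 100 at 7 significant digits; nothing in the post confirms that var1 contains such a value.

```r
breaks <- c(0, seq(from = 70, to = 100, by = 5))

x_exact <- 100           # exactly 100 as a double
x_above <- 100 + 1e-12   # hypothetical: slightly above the top break

# An exact 100 falls in the right-closed last interval
as.character(cut(x_exact, breaks = breaks, right = TRUE))   # "(95,100]"

# A value even slightly above the largest break lies outside every interval
is.na(cut(x_above, breaks = breaks, right = TRUE))          # TRUE

# At 7 significant digits the two values are indistinguishable when printed
format(x_above, digits = 7)                                 # "100"
```

So a value that displays as 100.00000 can still fall outside (95,100] if it is not exactly 100.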

Edit

I am using cut from base R, not from any package. Here is the session info:

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252    LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bindrcpp_0.2      plotly_4.7.1      data.table_1.10.5 ggplot2_2.2.1     lubridate_1.7.2  
[6] gmad_0.0.0.9000  

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15      pillar_1.2.1      compiler_3.4.4    plyr_1.8.4        bindr_0.1        
 [6] bitops_1.0-6      tools_3.4.4       digest_0.6.15     jsonlite_1.5      tibble_1.4.2     
[11] gtable_0.2.0      viridisLite_0.3.0 pkgconfig_2.0.1   rlang_0.2.0       shiny_1.0.5      
[16] crosstalk_1.0.0   curl_3.1          yaml_2.1.17       dplyr_0.7.4       httr_1.3.1       
[21] stringr_1.3.0     htmlwidgets_1.0   caTools_1.17.1    grid_3.4.4        glue_1.2.0       
[26] R6_2.2.2          purrr_0.2.4       tidyr_0.8.0       magrittr_1.5      svMisc_1.0-2     
[31] scales_0.5.0      htmltools_0.3.6   assertthat_0.2.0  xtable_1.8-2      mime_0.5         
[36] colorspace_1.3-2  httpuv_1.3.6.2    labeling_0.3      stringi_1.1.6     lazyeval_0.2.1   
[41] munsell_0.4.3    

Significant digits:

> getOption("digits")
[1] 7

Edit 2

Specifying integers in the breaks argument:

> cut(var1, breaks = c(0L, seq(from = 70L, to = 100L, by = 5L)), right = T)
 [1] (70,75]  (75,80]  (75,80]  (75,80]  (80,85]  (80,85]  (80,85]  (80,85]  (85,90]  (85,90] 
[11] (85,90]  (85,90]  (90,95]  (90,95]  (90,95]  (95,100] (95,100] (95,100] (95,100] (95,100]
[21] (95,100] <NA>     <NA>     (95,100] (95,100]
Levels: (0,70] (70,75] (75,80] (80,85] (85,90] (90,95] (95,100]

Running a second ifelse (var1 was extracted from a data.table column that had already been built with an ifelse condition):

> cut(ifelse(var1 > 100, 100, var1), breaks = c(0L, seq(from = 70L, to = 100L, by = 5L)), right = T)
 [1] (70,75]  (75,80]  (75,80]  (75,80]  (80,85]  (80,85]  (80,85]  (80,85]  (85,90]  (85,90] 
[11] (85,90]  (85,90]  (90,95]  (90,95]  (90,95]  (95,100] (95,100] (95,100] (95,100] (95,100]
[21] (95,100] (95,100] (95,100] (95,100] (95,100]
Levels: (0,70] (70,75] (75,80] (80,85] (85,90] (90,95] (95,100]
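Since capping at 100 removes the NAs, one plausible reading (not confirmed in the post) is that the two problem entries are marginally greater than 100 and only display as 100 at the default precision; dput also rounds doubles to 15 significant digits by default, so it may not reveal the difference either. A minimal sketch of how this could be checked, using a hypothetical stand-in vector v rather than the real var1:

```r
# Hypothetical stand-in for the tail of var1; the third value is an assumption
v <- c(99.2, 100, 100 + 1e-13)

v == 100              # FALSE  TRUE FALSE
v > 100               # FALSE FALSE  TRUE
sprintf("%.17g", v)   # prints enough digits to expose the excess over 100

breaks <- c(0, seq(from = 70, to = 100, by = 5))

# pmin caps values at 100; for non-missing values this matches
# ifelse(v > 100, 100, v)
clamped <- pmin(v, 100)
cut(clamped, breaks = breaks, right = TRUE)
```

Running the same comparisons on the real var1 (for example var1[22] > 100) would show whether this explanation applies.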

0 answers:

No answers yet.