I have a numeric vector that I want to convert into five numeric levels. I can get the five levels with cut:
dx <- data.frame(x=1:100)
dx$cut <- cut(dx$x,5)
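But now I am having trouble extracting the lower and upper bound of each level.
So, for example, for the interval (0.901,20.8], dx$min should be 0.901 and dx$max should be 20.8.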
I tried:
dx$min <- pmin(dx$cut)
dx$max <- pmax(dx$cut)
dx
But this does not work.
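`pmin()` and `pmax()` return parallel (element-wise) minima and maxima across numeric vectors, so they have nothing to work with here: the bounds created by `cut()` only exist as text in the factor labels. A minimal sketch, assuming the default label format `(a,b]`, that pulls both numbers out of each label with `sub()`:

```r
# Rebuild the example data from the question
dx <- data.frame(x = 1:100)
dx$cut <- cut(dx$x, 5)

# The interval bounds only exist inside the factor labels, e.g. "(0.901,20.8]"
lab <- as.character(dx$cut)

# Lower bound: drop the opening bracket, keep everything before the comma
dx$min <- as.numeric(sub("^.(.*),.*$", "\\1", lab))
# Upper bound: keep everything after the comma, drop the closing bracket
dx$max <- as.numeric(sub("^.*,(.*).$", "\\1", lab))

head(dx)
```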
Answer 0 (score: 5)
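You can try splitting the labels on the comma (after first converting them to character and removing every punctuation mark except , and .), and then create the two columns: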
# the regex replaces every punctuation mark except . or , with an empty string
# (note: "-" also counts as punctuation, so minus signs are dropped; fine for positive data such as 1:100)
min_max <- unlist(strsplit(gsub("(?![,.])[[:punct:]]", "", as.character(dx$cut), perl=TRUE), ","))
dx$min <- as.numeric(min_max[seq(1, length(min_max), by=2)])
dx$max <- as.numeric(min_max[seq(2, length(min_max), by=2)])
head(dx)
# x cut min max
#1 1 (0.901,20.8] 0.901 20.8
#2 2 (0.901,20.8] 0.901 20.8
#3 3 (0.901,20.8] 0.901 20.8
#4 4 (0.901,20.8] 0.901 20.8
#5 5 (0.901,20.8] 0.901 20.8
#6 6 (0.901,20.8] 0.901 20.8
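Every row in the same interval shares the same label, so a variant of the same idea can parse the five levels once and map them back to the rows through the factor's integer codes. A sketch under that assumption (`lv`, `inner`, `bounds` and `idx` are just illustrative names); because it only strips the first and last character of each label, a minus sign such as the one in "(-600,-200]" is kept:

```r
dx <- data.frame(x = 1:100)
dx$cut <- cut(dx$x, 5)

# Parse each distinct level once instead of every row's label
lv <- levels(dx$cut)                              # e.g. "(0.901,20.8]"
inner <- substr(lv, 2, nchar(lv) - 1)             # strip the brackets: "0.901,20.8"
bounds <- do.call(rbind, strsplit(inner, ",", fixed = TRUE))
bounds <- matrix(as.numeric(bounds), ncol = 2)    # numeric matrix, one row per level

# as.integer() on a factor gives each row's level index
idx <- as.integer(dx$cut)
dx$min <- bounds[idx, 1]
dx$max <- bounds[idx, 2]

head(dx)
```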
Answer 1 (score: 0)
Here is a tidyverse-style solution.
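library(tidyverse)
tibble(x = seq(-1000, 1000, length.out = 10),
       x_cut = cut(x, 5)) %>%
  mutate(x_tmp = str_sub(x_cut, 2, -2)) %>%         # drop the enclosing brackets, e.g. "-600,-200"
  separate(x_tmp, c("min", "max"), sep = ",") %>%   # split into two character columns
  mutate_at(c("min", "max"), as.double)             # convert both columns to numbers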
#> # A tibble: 10 x 4
#> x x_cut min max
#> <dbl> <fct> <dbl> <dbl>
#> 1 -1000 (-1e+03,-600] -1000 -600
#> 2 -778. (-1e+03,-600] -1000 -600
#> 3 -556. (-600,-200] -600 -200
#> 4 -333. (-600,-200] -600 -200
#> 5 -111. (-200,200] -200 200
#> 6 111. (-200,200] -200 200
#> 7 333. (200,600] 200 600
#> 8 556. (200,600] 200 600
#> 9 778. (600,1e+03] 600 1000
#> 10 1000 (600,1e+03] 600 1000
Created on 2019-01-10 by the reprex package (v0.2.1)
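One thing to keep in mind with both answers: when `breaks` is a single number, `cut()` splits the range into equal pieces and then moves the outer limits out by 0.1% of the range, and it prints the labels with only `dig.lab = 3` significant digits by default. Bounds parsed from the labels are therefore rounded. In the output above the first lower bound comes out as -1000, while the actual limit is -1002 (-1000 minus 0.1% of 2000). A sketch of an alternative, assuming equal-width intervals running exactly from `min(x)` to `max(x)` are acceptable: compute the breaks yourself and look the bounds up by interval index (`brks`, `idx` and `result` are illustrative names):

```r
x <- seq(-1000, 1000, length.out = 10)

# Five equal-width intervals need six break points
brks <- seq(min(x), max(x), length.out = 6)

# include.lowest = TRUE so that min(x) falls into the first interval
x_cut <- cut(x, breaks = brks, include.lowest = TRUE)

# Take the exact bounds from brks instead of parsing the rounded labels
idx <- as.integer(x_cut)
result <- data.frame(x = x, x_cut = x_cut,
                     min = brks[idx], max = brks[idx + 1])
result
```

Note that these intervals differ slightly from those of `cut(x, 5)`: the first one is closed on the left and the outer limits are not widened. If you would rather keep `cut(x, 5)`, its `dig.lab` argument controls how many significant digits the labels show, but anything parsed from the labels stays only as precise as that formatting.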