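I need to apply cut to a continuous variable so that I can display it with a Brewer color scale in ggplot2, as in "Setting breakpoints for data with scale_fill_brewer() function in ggplot2". The continuous variable is a relative difference, and I would like the labels formatted as "18.2%" rather than "0.182". Is there a simple way to achieve this?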
x <- runif(100)
levels(cut(x, breaks=10))
[1] "(0.0223,0.12]" "(0.12,0.218]" "(0.218,0.315]" "(0.315,0.413]"
[5] "(0.413,0.511]" "(0.511,0.608]" "(0.608,0.706]" "(0.706,0.804]"
[9] "(0.804,0.901]" "(0.901,0.999]"
For example, I would like the first level to be displayed as (2.23 %, 12 %]. Is there a better alternative to cut?
Answer 0 (score: 17)
I have implemented cut_format() in version 0.2-3 of my kimisc package; version 0.3 is now on CRAN.
# install.packages("kimisc")  # or: devtools::install_github("krlmlr/kimisc")
library(kimisc)
x <- seq(0.1, 0.9, by = 0.2)
breaks <- seq(0, 1, by = 0.25)
cut(x, breaks)
## [1] (0,0.25] (0.25,0.5] (0.25,0.5] (0.5,0.75] (0.75,1]
## Levels: (0,0.25] (0.25,0.5] (0.5,0.75] (0.75,1]
cut_format(x, breaks, format_fun = scales::percent)
## [1] (0%, 25%] (25%, 50%] (25%, 50%] (50%, 75%] (75%, 100%]
## Levels: (0%, 25%] (25%, 50%] (50%, 75%] (75%, 100%]
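It is still not perfect: passing the number of breaks (as in the original example) does not work yet.
Answer 1 (score: 10)
Multiply the original data by 100, then use gsub with a regular expression: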
gsub("([0-9.]+)","\\1%",levels(cut(x*100,breaks=10)))
[1] "(0.449%,10.4%]" "(10.4%,20.3%]" "(20.3%,30.2%]" "(30.2%,40.2%]" "(40.2%,50.1%]" "(50.1%,60%]" "(60%,69.9%]" "(69.9%,79.9%]" "(79.9%,89.8%]" "(89.8%,99.7%]"
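If you would rather not post-process the level strings, a rough alternative is to compute the breaks yourself and hand ready-made labels to cut(). The sketch below is only an illustration: cut_percent is a hypothetical helper name, not a function from any package, and it assumes x is not constant. Unlike cut(x, breaks = 10), it does not widen the range by 0.1%, so the outer labels are the exact minimum and maximum.

```r
# Hypothetical helper (not from any package): equal-width bins with percent labels.
cut_percent <- function(x, n = 10, digits = 3) {
  rng <- range(x, na.rm = TRUE)
  # n equal-width bins need n + 1 break points spanning the data range
  breaks <- seq(rng[1], rng[2], length.out = n + 1)
  # Format each break as a percentage with up to `digits` significant digits
  pct <- paste0(formatC(100 * breaks, digits = digits, format = "g", width = 1), "%")
  labels <- paste0("(", head(pct, -1), ", ", tail(pct, -1), "]")
  # The first bin is closed on the left because include.lowest = TRUE
  labels[1] <- sub("(", "[", labels[1], fixed = TRUE)
  cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)
}

levels(cut_percent(c(0, 0.25, 0.5, 1), n = 2))
# [1] "[0%, 50%]"   "(50%, 100%]"
```

The result is an ordinary factor, so it can be used directly inside aes(colour = ...) with scale_colour_brewer().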
Answer 2 (score: 6)
Why not copy the code of cut.default and create your own version with modified levels? See this gist.
Two lines were changed:
Line 22: ch.br <- formatC(breaks, digits = dig, width = 1) was changed to ch.br <- formatC(breaks*100, digits = dig, width = 1).
Line 29: else "[", ch.br[-nb], ",", ch.br[-1L], if (right) was changed to else "[", ch.br[-nb], "%, ", ch.br[-1L], "%", if (right)
The rest is unchanged. Here it is in action:
library(devtools)
source_gist(4593967)
set.seed(1)
x <- runif(100)
levels(cut2(x, breaks=10))
# [1] "(1.24%, 11%]" "(11%, 20.9%]" "(20.9%, 30.7%]" "(30.7%, 40.5%]" "(40.5%, 50.3%]"
# [6] "(50.3%, 60.1%]" "(60.1%, 69.9%]" "(69.9%, 79.7%]" "(79.7%, 89.5%]" "(89.5%, 99.3%]"
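Answer 3 (score: 3)
A new answer to an old question.
You can pass a function to the labels argument of the scale to format the labels. I will use gsubfn and scales::percent:
library(ggplot2)
library(gsubfn)
library(scales)
# Replace every decimal number in a level label with its percent form
pcut <- function(x) gsubfn('\\d\\.\\d+', function(num) percent(as.numeric(num)), x)
d <- data.frame(x = runif(100))
ggplot(d, aes(x = x, y = seq_along(x))) +
  geom_point(aes(colour = cut(x, breaks = 10))) +
  scale_colour_brewer(name = 'x', palette = 'Spectral', labels = pcut)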
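The same idea can be sketched in base R, without gsubfn, by rewriting the numbers inside each level string with regmatches(). This is only an illustration: pct_labels is a hypothetical helper name, and it assumes the labels look like the default output of cut(), for example "(0.12,0.218]".

```r
# Hypothetical helper: turn cut()-style labels such as "(0.12,0.218]" into percent labels.
pct_labels <- function(lv, digits = 3) {
  # Locate every number in each label (optional sign, optional exponent)
  m <- gregexpr("-?[0-9]+\\.?[0-9]*(e[-+]?[0-9]+)?", lv)
  regmatches(lv, m) <- lapply(regmatches(lv, m), function(num) {
    if (length(num) == 0) return(num)  # label without numbers: leave untouched
    paste0(formatC(100 * as.numeric(num), digits = digits, format = "g", width = 1), "%")
  })
  lv
}

pct_labels(c("(0.0223,0.12]", "(0.12,0.218]"))
# [1] "(2.23%,12%]" "(12%,21.8%]"
```

Because the function takes a character vector and returns one of the same length, it should be usable as a scale's labels function, for example scale_colour_brewer(palette = 'Spectral', labels = pct_labels).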
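Answer 4 (score: 2)
My package cutr has functions very similar to @krlmlr's (which I did not know about until now).
cutf is just cut with a format_fun argument; the ... arguments are passed to format_fun rather than to cut, as in cut_format.
smart_cut has more features and different defaults: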
devtools::install_github("moodymudskipper/cutr")
library(cutr)
x <- seq(0.1, 0.9, by = 0.2)
breaks <- seq(0, 1, by = 0.25)
cutf(x, breaks, format_fun = scales::percent)
# [1] (0%,25%] (25%,50%] (25%,50%] (50%,75%] (75%,100%]
# Levels: (0%,25%] (25%,50%] (50%,75%] (75%,100%]
smart_cut(x, breaks, format_fun = scales::percent, simplify = FALSE, closed = "right")
# [1] [0%,25%] (25%,50%] (25%,50%] (50%,75%] (75%,100%]
# Levels: [0%,25%] < (25%,50%] < (50%,75%] < (75%,100%]
Hmisc::cut2 now also has a formatfun argument.
Answer 5 (score: 1)
The new {santoku} package now provides a way to do this in its development version:
library(santoku)
set.seed(20200607)
x <- runif(20)
chop_evenly(x, 10, labels = lbl_intervals(fmt = percent))
#> [1] [33.13%, 42.11%) [60.08%, 69.06%) [69.06%, 78.04%) [69.06%, 78.04%)
#> [5] [87.02%, 96%] [6.193%, 15.17%) [15.17%, 24.15%) [6.193%, 15.17%)
#> [9] [33.13%, 42.11%) [6.193%, 15.17%) [87.02%, 96%] [51.1%, 60.08%)
#> [13] [42.11%, 51.1%) [6.193%, 15.17%) [42.11%, 51.1%) [6.193%, 15.17%)
#> [17] [6.193%, 15.17%) [69.06%, 78.04%) [78.04%, 87.02%) [87.02%, 96%]
#> 9 Levels: [6.193%, 15.17%) [15.17%, 24.15%) ... [87.02%, 96%]
tab_evenly(x, 10, labels = lbl_intervals(fmt = scales::label_percent(accuracy = 0.1)))
#> x
#> [6.2%, 15.2%) [15.2%, 24.2%) [33.1%, 42.1%) [42.1%, 51.1%) [51.1%, 60.1%)
#> 6 1 2 2 1
#> [60.1%, 69.1%) [69.1%, 78.0%) [78.0%, 87.0%) [87.0%, 96.0%]
#> 1 3 1 3
Created on 2020-06-09 by the reprex package (v0.3.0)