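Using ggplot2's stat_ecdf() function, I made a cumulative distribution function (CDF) plot. I need to shade the area under the CDF curve between two x-axis values, and then convert the plot to plotly output. I reproduced the scenario with the iris dataset using the following code: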
```r
library(ggplot2)
library(plotly)  # provides ggplotly()

iris <- datasets::iris
iris <- iris[order(iris$Sepal.Length), ]

# ECDF of Sepal.Length on a reversed x axis
(plot_1 <- ggplot(iris, aes(Sepal.Length)) +
   stat_ecdf() +
   scale_x_reverse())
plot_1_plotly <- ggplotly(plot_1)
plot_1_plotly

# Same ECDF with the whole area under the curve shaded
# (after_stat(y) replaces the deprecated ..y.. notation)
(plot_2 <- ggplot(iris, aes(Sepal.Length)) +
   stat_ecdf(aes(ymin = 0, ymax = after_stat(y)), geom = "ribbon",
             alpha = 0.2, fill = "blue") +
   stat_ecdf(geom = "step") +
   scale_x_reverse())
plot_2_ggplotly <- ggplotly(plot_2)
plot_2_ggplotly
```
Question 1: In the plot_2 output, how can I restrict the shaded area to the region between two x-axis values (e.g. x = 6 and x = 7)?
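Question 2: When I convert plot_2 to plotly output (i.e. plot_2_ggplotly), why is the shaded area rendered incorrectly, as the output shows? How can I get back the original appearance?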
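For Question 2, one possible workaround (not from the original thread, and not checked against the exact rendering problem in the question) is to skip ggplotly() and build the figure directly with plotly's R API. The sketch below precomputes the ECDF with base R's ecdf(), draws it as a step line (plotly line shape "hv"), draws the shaded band as a filled polygon (fill = "toself"), and reverses the x axis with autorange = "reversed". The bounds 6 and 7 come from Question 1; the names x_lo, x_hi, cdf, knots, heights, band, and fig are hypothetical.

```r
library(plotly)  # plotly also re-exports the %>% pipe

x_lo <- 6  # lower bound of the shaded band (from Question 1)
x_hi <- 7  # upper bound of the shaded band

F_hat <- ecdf(iris$Sepal.Length)            # right-continuous empirical CDF
xs    <- sort(unique(iris$Sepal.Length))
cdf   <- data.frame(x = xs, y = F_hat(xs))  # one point per jump of the ECDF

# Knots: both bounds plus every observed value strictly between them.
# No observation lies strictly between two consecutive knots, so the ECDF
# is constant on each [knots[i], knots[i + 1]).
knots   <- c(x_lo, xs[xs > x_lo & xs < x_hi], x_hi)
heights <- F_hat(head(knots, -1))

# Polygon outline: up from (x_lo, 0), along the steps, down to (x_hi, 0)
band <- data.frame(
  x = c(x_lo, as.vector(rbind(head(knots, -1), tail(knots, -1))), x_hi),
  y = c(0, rep(heights, each = 2), 0)
)

fig <- plot_ly() %>%
  add_trace(data = band, x = ~x, y = ~y, type = "scatter", mode = "lines",
            fill = "toself", fillcolor = "rgba(0, 0, 255, 0.2)",
            line = list(width = 0), name = "6 <= x <= 7") %>%
  add_trace(data = cdf, x = ~x, y = ~y, type = "scatter", mode = "lines",
            line = list(shape = "hv", color = "black"), name = "ECDF") %>%
  layout(xaxis = list(title = "Sepal.Length", autorange = "reversed"),
         yaxis = list(title = "ECDF"))
# Print `fig` in an interactive session to view it.
```

Building the traces natively means no ggplot2 geom has to be translated, which is the step where the ribbon in plot_2 appears to go wrong; whether that fully restores the look of plot_2 is an assumption worth checking on your own setup.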
Answer:
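I ran into a similar problem when trying to shade an area under the CDF curve of an exponential survival function. Using geom_polygon, I was able to find a solution for a CDF line plot.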
```r
library(dplyr)
library(ggplot2)

set.seed(1)  # added so the simulated data are reproducible
# Simulate Poisson-distributed counts (mean 15), then compute the
# cumulative count and cumulative proportion for each value
cumulative_frequencies <- data.frame(person_id = 1:10000,
                                     num_active_days = rpois(10000, lambda = 15)) %>%
  group_by(num_active_days) %>%
  summarise(num_people = n()) %>%
  arrange(num_active_days) %>%
  mutate(cum_frequency = cumsum(num_people),
         rel_cumfreq = cum_frequency / sum(num_people))

# Create the CDF curve
p <- ggplot(cumulative_frequencies, aes(x = num_active_days, y = rel_cumfreq)) +
  geom_line() +
  xlab("Time") +
  ylab("Cumulative Density") +
  theme_classic()
p
```
Then use geom_polygon to shade the desired area under the curve:
```r
# x-range of the area under the curve to shade
x_start <- 15
x_end <- 20

# Keep the curve points inside [x_start, x_end] and close the polygon
# at y = 0 on both ends
inside <- subset(cumulative_frequencies,
                 num_active_days >= x_start & num_active_days <= x_end)
shade <- data.frame(num_active_days = c(x_start, inside$num_active_days, x_end),
                    rel_cumfreq     = c(0, inside$rel_cumfreq, 0))

# Add the shading to the CDF curve (semi-transparent so the line stays visible)
p + geom_polygon(data = shade, aes(num_active_days, rel_cumfreq), alpha = 0.4)
```
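Applied to the iris example from Question 1, the same geom_polygon idea should work with one adjustment: stat_ecdf() draws the ECDF as a step function, so the polygon's upper edge should follow the steps rather than straight lines between points. The sketch below is an illustration under that assumption, not the answerer's code; the names x_lo, x_hi, knots, heights, band, and plot_3 are hypothetical. Because geom_polygon() connects vertices in the order they are given, reversing the axis with scale_x_reverse() only mirrors the shape.

```r
library(ggplot2)

x_lo <- 6  # shade from x = 6 ...
x_hi <- 7  # ... to x = 7

F_hat <- ecdf(iris$Sepal.Length)  # right-continuous empirical CDF
obs   <- sort(unique(iris$Sepal.Length))

# Knots: both bounds plus every observed value strictly between them.
# The ECDF is constant on each [knots[i], knots[i + 1]).
knots   <- c(x_lo, obs[obs > x_lo & obs < x_hi], x_hi)
heights <- F_hat(head(knots, -1))

# Polygon outline: up from (x_lo, 0), along the steps, down to (x_hi, 0)
band <- data.frame(
  Sepal.Length = c(x_lo, as.vector(rbind(head(knots, -1), tail(knots, -1))), x_hi),
  y            = c(0, rep(heights, each = 2), 0)
)

plot_3 <- ggplot(iris, aes(Sepal.Length)) +
  geom_polygon(data = band, aes(x = Sepal.Length, y = y),
               fill = "blue", alpha = 0.2) +
  stat_ecdf(geom = "step") +
  scale_x_reverse()
plot_3
```

Since the shaded region is now plain polygon data rather than a ribbon computed by stat_ecdf(), plotly::ggplotly(plot_3) may also convert more cleanly than plot_2 did, though I have not verified that.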