Smoothing a plot of grouped proportions

Time: 2018-03-27 17:05:48

Tags: r ggplot2 smoothing

I have the following dataset:

set.seed(10)
start_date <- as.Date('2000-01-01')  
end_date <- as.Date('2000-01-10')   


Data <- data.frame(
  id = rep((1:1000),10), 
  group = rep(c("A","B"), 25),
  x = sample(1:100),
  y = sample(c("1", "0"), 10, replace = TRUE),
  date = as.Date(
       sample(as.numeric(start_date):
              as.numeric(end_date), 1000,
              replace = TRUE),
       # as.numeric() counts days since 1970-01-01, so convert back with that origin
       origin = '1970-01-01'))

With this, I created the following plot:

library(dplyr)
library(ggplot2)

Data %>% mutate(treated = factor(group)) %>%
  mutate(date = as.POSIXct(date)) %>% # convert Date to POSIXct
  group_by(treated, date) %>% # one group per treatment and day
  summarise(prop = sum(y=="1")/n()) %>% # proportion of rows with y == "1"
  ggplot()+ theme_classic() + 
  geom_line(aes(x = date, y = prop, color = treated)) +
  geom_point(aes(x = date, y = prop, color = treated)) +
  geom_vline(xintercept = as.POSIXct("2000-01-05 12:00 GMT"), color = 'black', lwd = 1)

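Unfortunately, the plot is very jagged and I would like to smooth it. I tried geom_smooth(), but couldn't get it to work. Other questions about smoothing didn't help me, because they lack the grouping aspect and therefore have a different structure. Also, the example dataset is actually part of a larger dataset, so I need to stick with this code.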

[Edit: the geom_smooth() code I tried was geom_smooth(method = 'auto', formula = y ~ x)]
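A likely reason the bare geom_smooth() call fails: in the code above, x, y and colour are mapped only inside geom_line() and geom_point(), and ggplot() itself has no mapping, so a geom_smooth() layer added without its own aes() has no x or y to smooth (ggplot2 stops with an error about missing aesthetics). The minimal sketch below assumes the dplyr and ggplot2 packages and the same simulated data (with the date origin set to 1970-01-01); it moves the mapping into ggplot() so every layer inherits it. The name props is only illustrative.

```r
library(dplyr)
library(ggplot2)

# Same simulated data as in the question, with the date origin fixed
set.seed(10)
start_date <- as.Date("2000-01-01")
end_date   <- as.Date("2000-01-10")
Data <- data.frame(
  id    = rep(1:1000, 10),
  group = rep(c("A", "B"), 25),
  x     = sample(1:100),
  y     = sample(c("1", "0"), 10, replace = TRUE),
  date  = as.Date(sample(as.numeric(start_date):as.numeric(end_date), 1000,
                         replace = TRUE), origin = "1970-01-01")
)

# Daily proportion of y == "1" per treatment group
props <- Data %>%
  mutate(treated = factor(group), date = as.POSIXct(date)) %>%
  group_by(treated, date) %>%
  summarise(prop = mean(y == "1"), .groups = "drop")

# Mapping in ggplot(): geom_point() and geom_smooth() both inherit x, y and colour,
# and geom_smooth() fits one loess curve per colour group
p <- ggplot(props, aes(x = date, y = prop, color = treated)) +
  theme_classic() +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  geom_vline(xintercept = as.POSIXct("2000-01-05 12:00", tz = "UTC"))
p
```

Passing the same aes() directly to geom_smooth() would work equally well; putting it in ggplot() just avoids repeating it in every layer.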

Can anyone point me in the right direction? Many thanks, and all the best.

1 Answer:

Answer 0 (score: 1)

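Is this the smoothed line you want? You call geom_smooth with the same aesthetics (x, y and colour), in place of geom_line rather than in combination with it; since your plot maps the aesthetics per layer instead of in ggplot(), a geom_smooth() without its own aes() has no x or y to work with. You can choose a different smoothing method, but the default, loess (used when there are fewer than 1,000 observations), is usually what people want. As an aside, I don't think this looks better than the geom_line version; it is actually slightly less readable. geom_smooth is most useful when there are many y observations per x value, where the raw points make the pattern hard to see; geom_line suits data with one y per x.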
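To make the per-group behaviour concrete, here is a rough base-R sketch of what geom_smooth(method = "loess") computes for each colour group: one independent stats::loess() fit per group on the daily proportions, evaluated on an evenly spaced grid. It assumes ggplot2's defaults of span = 0.75 and 80 grid points; the helper smooth_group() and the columns prop and day are hypothetical names, and ggplot2's internals differ in detail. This sketch smooths over day numbers rather than POSIXct seconds, which only rescales x and should leave the loess curve itself unchanged.

```r
# Same simulated data as in the question, with the date origin fixed
set.seed(10)
start_date <- as.Date("2000-01-01")
end_date   <- as.Date("2000-01-10")
Data <- data.frame(
  id    = rep(1:1000, 10),
  group = rep(c("A", "B"), 25),
  x     = sample(1:100),
  y     = sample(c("1", "0"), 10, replace = TRUE),
  date  = as.Date(sample(as.numeric(start_date):as.numeric(end_date), 1000,
                         replace = TRUE), origin = "1970-01-01")
)

Data$prop <- as.numeric(Data$y == "1")  # 1 if y == "1", else 0
Data$day  <- as.numeric(Data$date)      # days since 1970-01-01

# Daily proportion per group (one row per group and day)
props <- aggregate(prop ~ group + day, data = Data, FUN = mean)

# Hypothetical helper: fit loess to one group, predict on an evenly spaced grid
smooth_group <- function(d, n_grid = 80, span = 0.75) {
  fit  <- loess(prop ~ day, data = d, span = span)
  grid <- data.frame(day = seq(min(d$day), max(d$day), length.out = n_grid))
  grid$fit <- predict(fit, newdata = grid)
  grid
}

# One independent fit per group, analogous to one curve per colour
smoothed <- lapply(split(props, props$group), smooth_group)
str(smoothed)
```

Raising span (for example to 1) makes each local fit use more of the days and gives a flatter curve; lowering it follows the daily points more closely.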

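Edit: After looking more closely at what you are doing, I added a second plot that does not precompute the proportion per treatment and date, but applies geom_smooth directly to the raw 0/1 data. This gives more sensible confidence intervals, without having to switch them off (se = FALSE) as in the first plot.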

set.seed(10)
start_date <- as.Date('2000-01-01')  
end_date <- as.Date('2000-01-10')   


Data <- data.frame(
  id = rep((1:1000),10), 
  group = rep(c("A","B"), 25),
  x = sample(1:100),
  y = sample(c("1", "0"), 10, replace = TRUE),
  date = as.Date(
    sample(as.numeric(start_date):
             as.numeric(end_date), 1000,
           replace = TRUE),
    # as.numeric() counts days since 1970-01-01, so convert back with that origin
    origin = '1970-01-01'))

library(tidyverse)
Data %>%
  mutate(treated = factor(group)) %>%
  mutate(date = as.POSIXct(date)) %>% # convert Date to POSIXct
  group_by(treated, date) %>% # one group per treatment and day
  summarise(prop = sum(y=="1")/n()) %>% # proportion of rows with y == "1"
  ggplot() +
  theme_classic() + 
  geom_smooth(aes(x = date, y = prop, color = treated), se = FALSE) +
  geom_point(aes(x = date, y = prop, color = treated)) +
  geom_vline(xintercept = as.POSIXct("2000-01-05 12:00 GMT"), color = 'black', lwd = 1)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Data %>%
  mutate(treated = factor(group)) %>%
  mutate(y = ifelse(y == "0", 0, 1)) %>% 
  mutate(date = as.POSIXct(date)) %>% # convert Date to POSIXct
  ggplot() +
  theme_classic() +
  geom_smooth(aes(x = date, y = y, color = treated), method = "loess") +
  geom_vline(xintercept = as.POSIXct("2000-01-05 12:00 GMT"), color = 'black', lwd = 1)

Created on 2018-03-27 by the reprex package (v0.2.0).
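Because y is a 0/1 outcome in the second plot, a logistic-regression smoother is one alternative to loess that keeps the fitted curve between 0 and 1. This is a swapped-in technique, not the answer's method: the sketch assumes dplyr and ggplot2 and uses geom_smooth(method = "glm") with method.args = list(family = binomial), which fits one straight line on the log-odds scale per colour group. The names raw and p_logit are only illustrative.

```r
library(dplyr)
library(ggplot2)

# Same simulated data as in the question, with the date origin fixed
set.seed(10)
start_date <- as.Date("2000-01-01")
end_date   <- as.Date("2000-01-10")
Data <- data.frame(
  id    = rep(1:1000, 10),
  group = rep(c("A", "B"), 25),
  x     = sample(1:100),
  y     = sample(c("1", "0"), 10, replace = TRUE),
  date  = as.Date(sample(as.numeric(start_date):as.numeric(end_date), 1000,
                         replace = TRUE), origin = "1970-01-01")
)

raw <- Data %>%
  mutate(treated = factor(group),
         y = as.numeric(y == "1"),  # binary outcome coded 0/1
         date = as.POSIXct(date))

# Logistic-regression smoother: one binomial GLM per colour group,
# with the fitted curve reported on the probability scale
p_logit <- ggplot(raw, aes(x = date, y = y, color = treated)) +
  theme_classic() +
  geom_smooth(method = "glm", formula = y ~ x,
              method.args = list(family = binomial)) +
  geom_vline(xintercept = as.POSIXct("2000-01-05 12:00", tz = "UTC"))
p_logit
```

Unlike loess, this gives a single monotone S-shaped trend per group, so it will not follow local wiggles; whether that is preferable depends on whether you want a summary trend or a flexible smoother.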