Adding error bars to multiple data series with ggplot

Posted: 2017-06-05 03:05:21

Tags: r ggplot2

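I have recorded some data over several months for 5 treatments (including a control). I'm plotting the data as a time series with ggplot, and I have produced data frames of the raw data and of the mean and standard error of the mean for each date.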

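I'm trying to plot all five treatments on the same graph with error bars. I can a) plot one treatment group with error bars, and b) plot all five treatments without error bars, but I can't combine the two.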

Here is my data (I've only included two treatments to keep it tidy):

       dates c_mean_am  c_se_am T1_mean_am T1_se_am
1 2017-01-31   284.135 27.43111    228.935 23.39037
2 2017-02-09   226.944 13.08237    173.241 13.42946
3 2017-02-23   281.135 15.89709    252.665 20.73417
4 2017-03-14   265.655 15.29930    238.225 17.47501
5 2017-04-06   312.785 13.08237    237.485 13.42946
  • c_mean_am = control mean
  • c_se_am = standard error of the control
  • T1_mean_am = treatment 1 mean
  • T1_se_am = standard error of treatment 1

Here is my code for option a) above:

ggplot(summary, aes(x = dates, y = c_mean_am)) +
    geom_point(shape = 19, size = 2, color = "blue") +
    geom_line(color = "blue") +
    geom_errorbar(aes(ymin = c_mean_am - c_se_am, ymax = c_mean_am + c_se_am), color = "blue", width = 0.25) +
    xlab("Date")  # xlab passed inside ggplot() is silently ignored; add it as a layer instead

[Plot: the control series as blue points and line with error bars]

Here is the code for option b) above:

# Only x is set globally; each layer supplies its own y and a constant colour label
sp <- ggplot(summary, aes(x = dates)) +
    geom_line(aes(y = c_mean_am, color = "Control")) +
    geom_line(aes(y = T1_mean_am, color = "T1")) +
    geom_point(aes(y = c_mean_am, color = "Control")) +
    geom_point(aes(y = T1_mean_am, color = "T1"))

sp2 <- sp +
    scale_color_manual(breaks = c("Control", "T1"), values = c(Control = "blue", T1 = "yellow"))

sp2

[Plot: both series as coloured points and lines, without error bars]

How can I get error bars on the second plot, in the same colours as the points and lines?

Thanks,

AB
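For comparison, option b) can be extended directly in wide format: map the same constant colour label ("Control", "T1") in each series' geom_errorbar() layer as in its line and point layers, and the bars pick up the same manual colour. This is a minimal sketch assuming the two-treatment data above; the data frame name summary_df is hypothetical (it avoids shadowing base R's summary()). With five treatments it means three layers per series, which is why the answers below reshape to long format instead.

```r
library(ggplot2)

# Two-treatment summary from the question; dates converted to Date for a time axis
summary_df <- data.frame(
  dates = as.Date(c("2017-01-31", "2017-02-09", "2017-02-23", "2017-03-14", "2017-04-06")),
  c_mean_am = c(284.135, 226.944, 281.135, 265.655, 312.785),
  c_se_am = c(27.43111, 13.08237, 15.89709, 15.29930, 13.08237),
  T1_mean_am = c(228.935, 173.241, 252.665, 238.225, 237.485),
  T1_se_am = c(23.39037, 13.42946, 20.73417, 17.47501, 13.42946)
)

# Each series maps the same constant label to colour in all three layers,
# so its line, points and error bars share one colour and one legend entry.
# width is in x units, i.e. days on a Date axis (2 is an arbitrary choice).
p <- ggplot(summary_df, aes(x = dates)) +
  geom_line(aes(y = c_mean_am, colour = "Control")) +
  geom_point(aes(y = c_mean_am, colour = "Control")) +
  geom_errorbar(aes(ymin = c_mean_am - c_se_am, ymax = c_mean_am + c_se_am,
                    colour = "Control"), width = 2) +
  geom_line(aes(y = T1_mean_am, colour = "T1")) +
  geom_point(aes(y = T1_mean_am, colour = "T1")) +
  geom_errorbar(aes(ymin = T1_mean_am - T1_se_am, ymax = T1_mean_am + T1_se_am,
                    colour = "T1"), width = 2) +
  # Named values tie each label to a colour regardless of level order
  scale_colour_manual(values = c(Control = "blue", T1 = "yellow")) +
  labs(x = "Date", y = "Mean", colour = "Treatment")
```

Print p to draw it; the third and sixth layers are the error bars for Control and T1 respectively.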

2 Answers:

Answer 0 (score: 1)

First, convert the data to long format:

df <- df %>% 
 gather(mean_type, mean_val, c_mean_am, T1_mean_am) %>% 
 gather(se_type, se_val, c_se_am, T1_se_am)


ggplot(df, aes(dates, mean_val, colour=mean_type)) + 
    geom_line() + 
    geom_point() + 
    geom_errorbar(aes(ymin=mean_val-se_val, ymax=mean_val+se_val))


Edit: an explanation of the tidyr manipulation

library(dplyr)
library(tidyr)

new.dat <- mtcars %>%  # taking mtcars as the starting data.frame
        select(gear, cyl, mpg, qsec) %>%
          # equivalent to mtcars[, c("gear", "cyl", "mpg", "qsec")]; to simplify the example
        gather(key = type, value = val, gear, cyl)
          # converts the data into long form with 64 rows, with a new character column "type" and a numeric column "val"; "gear" and "cyl" are removed while "mpg" and "qsec" remain

new.dat[c(1:3, 33:35),]

#     mpg  qsec type val
# 1  21.0 16.46 gear   4
# 2  21.0 17.02 gear   4
# 3  22.8 18.61 gear   4
# 33 21.0 16.46  cyl   6
# 34 21.0 17.02  cyl   6
# 35 22.8 18.61  cyl   4
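Since tidyr 1.0.0, gather() is superseded by pivot_longer(). A sketch of the same reshaping with pivot_longer(), assuming tidyr >= 1.0.0 (new_dat is just an illustrative name). Note two differences from the gather() output above: rows are ordered car by car (gear then cyl for each car) rather than all gear rows first, and the result is a tibble without the mtcars row names.

```r
library(dplyr)
library(tidyr)

# Same reshaping as above with the newer pivot_longer() (tidyr >= 1.0.0)
new_dat <- mtcars %>%
  select(gear, cyl, mpg, qsec) %>%
  pivot_longer(cols = c(gear, cyl), names_to = "type", values_to = "val")

# 32 cars x 2 reshaped columns = 64 rows; mpg and qsec are carried along unchanged
nrow(new_dat)
head(new_dat, 4)
```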

With the data in long format, you can use the new identifier column ("type") in the plot, e.g.

ggplot(new.dat, aes(val, mpg, fill=type)) + 
   geom_col(position="dodge")


Long format is also useful for plotting separate facets, e.g.

ggplot(new.dat, aes(val, mpg, colour=type)) + 
    geom_point() + 
    facet_wrap(~type) 


Answer 1 (score: 1)

The accepted answer appears to reshape the data incorrectly with gather (aka pivot_longer when packageVersion("tidyr") >= 1.0.0): it duplicates every point and error bar. The duplication is obvious for the error bars, but if you replace geom_point() with geom_jitter(), you will also see two points, corresponding to the two error bars. This has caused some confusion for others, so I want to provide a corrected solution for posterity.
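A minimal check of the duplication described above, assuming the question's two-treatment data and tidyr's gather(): two separate gather() calls pair every mean row with every SE column, so each date/group combination appears twice, once with its own SE and once with the other group's SE.

```r
library(dplyr)
library(tidyr)

# Two-treatment summary from the question (dates kept as character here)
df <- data.frame(
  dates = c("2017-01-31", "2017-02-09", "2017-02-23", "2017-03-14", "2017-04-06"),
  c_mean_am = c(284.135, 226.944, 281.135, 265.655, 312.785),
  c_se_am = c(27.43111, 13.08237, 15.89709, 15.29930, 13.08237),
  T1_mean_am = c(228.935, 173.241, 252.665, 238.225, 237.485),
  T1_se_am = c(23.39037, 13.42946, 20.73417, 17.47501, 13.42946)
)

# Two independent gather() calls, as in the accepted answer
dup <- df %>%
  gather(mean_type, mean_val, c_mean_am, T1_mean_am) %>%
  gather(se_type, se_val, c_se_am, T1_se_am)

nrow(dup)  # 5 dates x 2 means x 2 SEs = 20 rows instead of 10

# Half of those rows pair a group's mean with the other group's SE
# (the group prefix is the text before the first "_")
mismatched <- dup %>%
  filter(sub("_.*", "", mean_type) != sub("_.*", "", se_type))
nrow(mismatched)  # 10
```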

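Here is another approach that avoids this duplication: pivot the mean and standard-error columns in a single pivot_longer() call, so that each treatment group has exactly one row per timepoint. Plotting that data then gives the expected graph, with only one error bar and one point per group per timepoint.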
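The pivot_longer() call in the code that follows relies on names_pattern = "(.*)_(.*_am)": the first capture group becomes treatment_group, and the second is matched to the special ".value" name, so mean_am and se_am stay as separate value columns instead of being stacked into one. As an illustration only, the sketch below applies the same pattern with base R's regexec() (not tidyr) to show how the four column names split; the variable names are hypothetical.

```r
# Column names from the question's summary data frame
cols <- c("c_mean_am", "c_se_am", "T1_mean_am", "T1_se_am")

# Apply the same two-group pattern used in names_pattern
m <- regmatches(cols, regexec("(.*)_(.*_am)", cols))

# Element 1 of each match is the full string; elements 2 and 3 are the capture groups
treatment_group <- vapply(m, `[`, character(1), 2)
value_name      <- vapply(m, `[`, character(1), 3)

data.frame(cols, treatment_group, value_name)
```

Because the second group must end in "_am", the only possible split is at the first underscore, giving treatment groups c, c, T1, T1 and value names mean_am, se_am, mean_am, se_am.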

# load necessary packages
library(tidyverse)

# create data from question
df <-
  structure(
    list(
      dates = c(
        "2017-01-31",
        "2017-02-09",
        "2017-02-23",
        "2017-03-14",
        "2017-04-06"
      ),
      c_mean_am = c(284.135, 226.944,
                    281.135, 265.655, 312.785),
      c_se_am = c(27.43111, 13.08237, 15.89709,
                  15.2993, 13.08237),
      T1_mean_am = c(228.935, 173.241, 252.665,
                     238.225, 237.485),
      T1_se_am = c(23.39037, 13.42946, 20.73417,
                   17.47501, 13.42946)
    ),
    class = "data.frame",
    row.names = c("1",
                  "2", "3", "4", "5")
  )

# pivot df long and confirm that there's only one value per group per timepoint
df_long <- df %>%
  pivot_longer(
    cols = -dates,
    names_to = c("treatment_group", ".value"),
    names_pattern = "(.*)_(.*_am)"
  ) 

df_long

# # A tibble: 10 x 4
#    dates      treatment_group mean_am se_am
#    <chr>      <chr>             <dbl> <dbl>
#  1 2017-01-31 c                  284.  27.4
#  2 2017-01-31 T1                 229.  23.4
#  3 2017-02-09 c                  227.  13.1
#  4 2017-02-09 T1                 173.  13.4
#  5 2017-02-23 c                  281.  15.9
#  6 2017-02-23 T1                 253.  20.7
#  7 2017-03-14 c                  266.  15.3
#  8 2017-03-14 T1                 238.  17.5
#  9 2017-04-06 c                  313.  13.1
# 10 2017-04-06 T1                 237.  13.4
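A sketch of the plotting step, assuming df_long as created above. Two choices here are additions rather than part of the answer's code: dates are converted with as.Date() so the x axis is a continuous time scale and geom_line() connects each group's points, and width = 2 makes the error-bar caps two days wide.

```r
library(tidyverse)

# Question data, built compactly (same values as the structure() call above)
df <- tibble(
  dates = c("2017-01-31", "2017-02-09", "2017-02-23", "2017-03-14", "2017-04-06"),
  c_mean_am = c(284.135, 226.944, 281.135, 265.655, 312.785),
  c_se_am = c(27.43111, 13.08237, 15.89709, 15.2993, 13.08237),
  T1_mean_am = c(228.935, 173.241, 252.665, 238.225, 237.485),
  T1_se_am = c(23.39037, 13.42946, 20.73417, 17.47501, 13.42946)
)

df_long <- df %>%
  pivot_longer(
    cols = -dates,
    names_to = c("treatment_group", ".value"),
    names_pattern = "(.*)_(.*_am)"
  ) %>%
  # Convert to Date so the x axis is a continuous time scale
  mutate(dates = as.Date(dates))

# One row per group per date, so exactly one point and one error bar each
p <- ggplot(df_long, aes(dates, mean_am, colour = treatment_group)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = mean_am - se_am, ymax = mean_am + se_am), width = 2) +
  labs(x = "Date", y = "Mean", colour = "Treatment group")
```

Because colour is mapped once in the global aes(), the lines, points and error bars all inherit the same per-group colour, which is what the question asked for.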

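Plotting df_long then produces the corrected plot:

[Corrected plot: one point and one error bar per treatment group per timepoint]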