Plot all data as geom_point and include lines showing the means in ggplot2; stat_summary

Time: 2016-06-23 13:10:40

Tags: r plot ggplot2

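I'd like to plot all data points, with lines between them that represent participants. Here, I have each participant's ratings plotted by condition and stimulus type: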

WHAT I HAVE

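What I want is to add a mean line for each condition, running across the stimulus types and drawn in that condition's colour. Ideally, it would look like this: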

WHAT I NEED

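I have tried using stat_summary and stat_sum_df, as described on the ggplot2 documentation website, but I could not get it to work. It either does nothing or draws a line for each participant.

The code I used to generate the first plot is as follows: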

ggplot(df, aes(x=StimulusType+jitterVal, y=Rating, group=ParticipantCondition)) +
  geom_point(size=4.5, aes(colour=Condition), alpha=0.3)+
  geom_line(size=1, alpha=0.05)+
  scale_y_continuous(limits=c(0, 7.5), breaks=seq(0,7,by=1))+ 
  scale_colour_manual(values=c("#0072B2",  "#009E73", "#F0E442", "#D55E00"))+
  xlab('Stimulus type') +
  scale_x_continuous(limits=(c(0.5, 2.5)), breaks = c(0.9, 1.9), labels = levels(df$StimulusType))+
  ylab('Mean Rating') +
  guides(colour = guide_legend(override.aes = list(alpha = 1))) +
  theme_bw()

...and you can create an example data frame for the first 4 participants like this:

Participant <-  rep(c("01", "02", "03", "04"), 8)
StimulusType <- rep(rep(c(1, 2), each=4), 4)
Condition <- rep(c("A", "B", "C", "D"), each=8)
Rating <- c(5.20, 5.55, 3.10, 4.05, 5.05, 5.85, 3.90, 5.25, 4.70, 3.15, 3.40, 4.85, 4.90, 4.00, 3.95, 3.95, 3.00, 4.60, 3.95, 4.00, 3.15, 5.20,
5.05, 3.70, 2.75, 3.40, 4.80, 4.55, 2.35, 2.45, 5.45, 4.05)
jitterVal <-  c(-0.19459509, -0.19571169, -0.17475060, -0.19599276, -0.17536634, -0.19429345, -0.17363951, -0.17446702, -0.13601392,
-0.14484280, -0.12328058, -0.12427593, -0.12913823, -0.12042329, -0.14703381, -0.12603936, -0.09125372, -0.08213296,
-0.09140868, -0.09728309, -0.08377205, -0.08514802, -0.08715795, -0.08932001, -0.02689549, -0.04717990, -0.03918013,
-0.03068255, -0.02826789, -0.02345827, -0.03473678, -0.03369023)

df <- data.frame(Participant, StimulusType, Condition, Rating, jitterVal)
ParticipantCondition <- paste(df$Participant, df$Condition)

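I think the problem may lie in the grouping variable ParticipantCondition, which I created to get the lines between the points for each participant in each condition.

Any help is much appreciated.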

3 Answers:

Answer 0: (score: 2)

I computed the means outside ggplot using dplyr. The means are shown as squares. What do you think of this?

library(dplyr)
library(ggplot2)
Participant <-  rep(c("01", "02", "03", "04"), 8)
StimulusType <- rep(rep(c(1, 2), each=4), 4)
Condition <- rep(c("A", "B", "C", "D"), each=8)
Rating <- c(5.20, 5.55, 3.10, 4.05, 5.05, 5.85, 3.90, 5.25, 4.70, 3.15, 3.40, 4.85, 4.90, 4.00, 3.95, 3.95, 3.00, 4.60, 3.95, 4.00, 3.15, 5.20,
            5.05, 3.70, 2.75, 3.40, 4.80, 4.55, 2.35, 2.45, 5.45, 4.05)
jitterVal <-  c(-0.19459509, -0.19571169, -0.17475060, -0.19599276, -0.17536634, -0.19429345, -0.17363951, -0.17446702, -0.13601392,
                -0.14484280, -0.12328058, -0.12427593, -0.12913823, -0.12042329, -0.14703381, -0.12603936, -0.09125372, -0.08213296,
                -0.09140868, -0.09728309, -0.08377205, -0.08514802, -0.08715795, -0.08932001, -0.02689549, -0.04717990, -0.03918013,
                -0.03068255, -0.02826789, -0.02345827, -0.03473678, -0.03369023)

df <- data.frame(Participant, StimulusType, Condition, Rating, jitterVal)
ParticipantCondition <- paste(df$Participant, df$Condition)
rm(Rating, StimulusType, Condition, jitterVal)

levels(df$Condition)

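# summarise_each() and funs() are deprecated in current dplyr; plain summarise() gives the same means
mean_values <- df %>% group_by(StimulusType, Condition) %>% summarise(Rating = mean(Rating), jitterVal = mean(jitterVal))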
mean_values <- ungroup(mean_values)
levels(mean_values$Condition) <- levels(df$Condition)

ggplot(df, aes(y=Rating, x = StimulusType + jitterVal)) +
  geom_point(size=4.5, aes(colour = Condition), alpha=0.4) +
  geom_line(size=1, alpha=0.05, aes(group = ParticipantCondition)) + 
  geom_rect(data = mean_values, 
            aes( xmin = ((StimulusType + jitterVal) - 0.05), 
                 xmax = ((StimulusType + jitterVal) + 0.05), 
                 ymin = Rating - 0.05, 
                 ymax = Rating + 0.05,
                 fill = Condition)) +
  scale_y_continuous(limits=c(0, 7.5), breaks=seq(0,7,by=1))+ 
  scale_colour_manual(values=c("#0072B2",  "#009E73", "#F0E442", "#D55E00"))+
  scale_fill_manual(values=c("#0072B2",  "#009E73", "#F0E442", "#D55E00"))+
  xlab('Stimulus type') +
  scale_x_continuous(limits=(c(0.5, 2.5)), breaks = c(0.9, 1.9), labels = levels(df$StimulusType))+
  ylab('Mean Rating') +
  guides(colour = guide_legend(override.aes = list(alpha = 1))) +
  theme_bw()

The size of the rectangles can of course be adjusted easily.

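As a quick cross-check that needs no extra packages, the cell means that the squares represent can also be computed in base R. This is only a sketch: it rebuilds the example vectors from the question and uses `tapply()`; the name `cell_means` is made up for illustration and does not appear in the answer above.

```r
# Rebuild the example data from the question (jitterVal is not needed here).
Participant  <- rep(c("01", "02", "03", "04"), 8)
StimulusType <- rep(rep(c(1, 2), each = 4), 4)
Condition    <- rep(c("A", "B", "C", "D"), each = 8)
Rating <- c(5.20, 5.55, 3.10, 4.05, 5.05, 5.85, 3.90, 5.25,
            4.70, 3.15, 3.40, 4.85, 4.90, 4.00, 3.95, 3.95,
            3.00, 4.60, 3.95, 4.00, 3.15, 5.20, 5.05, 3.70,
            2.75, 3.40, 4.80, 4.55, 2.35, 2.45, 5.45, 4.05)
df <- data.frame(Participant, StimulusType, Condition, Rating)

# Mean Rating per Condition (rows) and StimulusType (columns).
# `cell_means` is a hypothetical name used only in this sketch.
cell_means <- with(df, tapply(Rating, list(Condition, StimulusType), mean))
print(cell_means)
```

For condition A this gives 4.475 for stimulus type 1 and 5.0125 for stimulus type 2, which should match the `Rating` column of `mean_values`.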

Answer 1: (score: 2)

To avoid the grouping problem, you probably need to generate the summary before you start plotting. One option is:

library(dplyr)
summaryData <-
  df %>%
  group_by(StimulusType, Condition) %>%
  summarise(meanRating = mean(Rating)
            , jitterVal = mean(jitterVal)) %>%
  mutate(xmin = StimulusType+jitterVal-0.04
         , xend = StimulusType+jitterVal+0.04)

ggplot(df, aes(x=StimulusType+jitterVal, y=Rating, group=ParticipantCondition)) +
  geom_point(size=4.5, aes(colour=Condition), alpha=0.3)+
  geom_line(size=1, alpha=0.05)+
  scale_y_continuous(limits=c(0, 7.5), breaks=seq(0,7,by=1))+ 
  scale_colour_manual(values=c("#0072B2",  "#009E73", "#F0E442", "#D55E00"))+
  xlab('Stimulus type') +
  scale_x_continuous(limits=(c(0.5, 2.5)), breaks = c(0.9, 1.9), labels = levels(df$StimulusType))+
  ylab('Mean Rating') +
  guides(colour = guide_legend(override.aes = list(alpha = 1))) +
  geom_segment(data = summaryData
               , mapping =  aes(x=xmin
                                , xend=xend
                                , y=meanRating
                                , yend =meanRating
                                , group = NA
                                , colour = Condition)
               , lwd = 3
               , show.legend = FALSE
  ) +
  theme_bw()

This gives a plot very similar to the one you showed.
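A variant of the same idea, sketched under a few assumptions: the summary is built with base R's `aggregate()` instead of dplyr, the jittered position is stored in a made-up column `xPos`, and `inherit.aes = FALSE` is used in place of `group = NA` so that the segment layer ignores the per-participant grouping entirely. `linewidth` assumes ggplot2 3.4.0 or later; older versions use `size` for lines.

```r
library(ggplot2)

# Example data from the question.
df <- data.frame(
  Participant  = rep(c("01", "02", "03", "04"), 8),
  StimulusType = rep(rep(c(1, 2), each = 4), 4),
  Condition    = rep(c("A", "B", "C", "D"), each = 8),
  Rating = c(5.20, 5.55, 3.10, 4.05, 5.05, 5.85, 3.90, 5.25,
             4.70, 3.15, 3.40, 4.85, 4.90, 4.00, 3.95, 3.95,
             3.00, 4.60, 3.95, 4.00, 3.15, 5.20, 5.05, 3.70,
             2.75, 3.40, 4.80, 4.55, 2.35, 2.45, 5.45, 4.05),
  jitterVal = c(-0.19459509, -0.19571169, -0.17475060, -0.19599276,
                -0.17536634, -0.19429345, -0.17363951, -0.17446702,
                -0.13601392, -0.14484280, -0.12328058, -0.12427593,
                -0.12913823, -0.12042329, -0.14703381, -0.12603936,
                -0.09125372, -0.08213296, -0.09140868, -0.09728309,
                -0.08377205, -0.08514802, -0.08715795, -0.08932001,
                -0.02689549, -0.04717990, -0.03918013, -0.03068255,
                -0.02826789, -0.02345827, -0.03473678, -0.03369023)
)
df$ParticipantCondition <- paste(df$Participant, df$Condition)
df$xPos <- df$StimulusType + df$jitterVal  # hypothetical helper column

# One row per StimulusType x Condition cell: mean rating and mean x position.
summaryData <- aggregate(cbind(Rating, xPos) ~ StimulusType + Condition,
                         data = df, FUN = mean)

p <- ggplot(df, aes(x = xPos, y = Rating, group = ParticipantCondition)) +
  geom_point(aes(colour = Condition), size = 4.5, alpha = 0.3) +
  geom_line(linewidth = 1, alpha = 0.05) +
  # inherit.aes = FALSE keeps the per-participant group out of this layer.
  geom_segment(data = summaryData, inherit.aes = FALSE,
               aes(x = xPos - 0.04, xend = xPos + 0.04,
                   y = Rating, yend = Rating, colour = Condition),
               linewidth = 3, show.legend = FALSE) +
  scale_colour_manual(values = c("#0072B2", "#009E73", "#F0E442", "#D55E00")) +
  xlab("Stimulus type") +
  theme_bw()
print(p)
```

Because the segment layer receives its own data and its own mappings, nothing from the main `aes()` call can leak into it, which is the grouping problem this answer sets out to avoid.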

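Answer 2: (score: 0)

Here is a solution that does not require you to summarise or aggregate the data first. Instead, you can use the original data set and easily add the individual data points as needed. The means are computed with ggplot's stat_summary option.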

ggplot(df, aes(x=StimulusType, y = Rating, group=Condition, color=Condition)) + 
      # add individual lines + data points
      geom_line (aes(group=interaction(Condition,Participant)), linetype = "dashed", size=.5) +
      geom_point(size=.5) +
      # add mean lines + datapoints

      # fun.y was renamed to fun in ggplot2 3.3.0; use fun.y on older versions
      geom_line (stat="summary", fun=mean, size=1) +
      geom_point(stat="summary", fun=mean, size=2) +
      scale_colour_manual(values=c("#0072B2",  "#009E73", "#F0E442", "#D55E00"))
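One possible reason `stat_summary` appeared to do nothing in the question is that it summarises `y` at each distinct `x` within a group, and with `x = StimulusType + jitterVal` every point has its own `x`. The sketch below keeps the jittered points but gives the summary layers one shared offset per condition and stimulus type, and overrides the group to `Condition`. The column name `cellOffset` is made up for illustration, and `fun` and `linewidth` assume ggplot2 3.3.0 and 3.4.0 or later respectively.

```r
library(ggplot2)

# Example data from the question.
df <- data.frame(
  Participant  = rep(c("01", "02", "03", "04"), 8),
  StimulusType = rep(rep(c(1, 2), each = 4), 4),
  Condition    = rep(c("A", "B", "C", "D"), each = 8),
  Rating = c(5.20, 5.55, 3.10, 4.05, 5.05, 5.85, 3.90, 5.25,
             4.70, 3.15, 3.40, 4.85, 4.90, 4.00, 3.95, 3.95,
             3.00, 4.60, 3.95, 4.00, 3.15, 5.20, 5.05, 3.70,
             2.75, 3.40, 4.80, 4.55, 2.35, 2.45, 5.45, 4.05),
  jitterVal = c(-0.19459509, -0.19571169, -0.17475060, -0.19599276,
                -0.17536634, -0.19429345, -0.17363951, -0.17446702,
                -0.13601392, -0.14484280, -0.12328058, -0.12427593,
                -0.12913823, -0.12042329, -0.14703381, -0.12603936,
                -0.09125372, -0.08213296, -0.09140868, -0.09728309,
                -0.08377205, -0.08514802, -0.08715795, -0.08932001,
                -0.02689549, -0.04717990, -0.03918013, -0.03068255,
                -0.02826789, -0.02345827, -0.03473678, -0.03369023)
)
df$ParticipantCondition <- paste(df$Participant, df$Condition)

# Hypothetical helper: the mean jitter of each StimulusType x Condition cell,
# repeated for every row of that cell, so the summary sees one x per cell.
df$cellOffset <- ave(df$jitterVal, df$StimulusType, df$Condition)

p <- ggplot(df, aes(y = Rating, colour = Condition)) +
  # Individual participants: jittered x, one faint line per participant.
  geom_line(aes(x = StimulusType + jitterVal, group = ParticipantCondition),
            colour = "black", linewidth = 1, alpha = 0.05) +
  geom_point(aes(x = StimulusType + jitterVal), size = 4.5, alpha = 0.3) +
  # Condition means: shared x per cell, grouped by Condition only.
  stat_summary(aes(x = StimulusType + cellOffset, group = Condition),
               fun = mean, geom = "line", linewidth = 1.5) +
  stat_summary(aes(x = StimulusType + cellOffset, group = Condition),
               fun = mean, geom = "point", size = 3) +
  scale_colour_manual(values = c("#0072B2", "#009E73", "#F0E442", "#D55E00")) +
  xlab("Stimulus type") +
  theme_bw()
print(p)
```

The trade-off against the pre-summarised approaches is that the means stay tied to the raw data, at the cost of one extra helper column to line the summary up with the jittered points.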