Displaying p-values on a grouped boxplot in R

Date: 2017-03-15 15:04:50

Tags: r ggplot2 boxplot p-value

I would like to display p-values above my data (possibly with arcs). My data looks like this:

ID  Blog    Region  Dimension   Score
1   Blog1   PK  Info. vs. P. Focus  -4.75
2   Blog1   PK  Info. vs. P. Focus  -5.69
3   Blog1   PK  Info. vs. P. Focus  -0.27
4   Blog1   PK  Info. vs. P. Focus  -2.76
5   Blog1   PK  Info. vs. P. Focus  -8.24
6   Blog1   PK  Addressee Focus -12.51
7   Blog1   PK  Addressee Focus -1.28
8   Blog1   PK  Addressee Focus 0.95
9   Blog1   PK  Addressee Focus -5.96
10  Blog1   PK  Addressee Focus -8.81
11  Blog1   PK  Thematic Variation  -8.46
12  Blog1   PK  Thematic Variation  -6.15
13  Blog1   PK  Thematic Variation  -13.98
14  Blog1   PK  Thematic Variation  -16.43
15  Blog1   PK  Narrative Style -4.09
16  Blog1   PK  Narrative Style -11.06
17  Blog1   PK  Narrative Style -9.04
18  Blog1   PK  Narrative Style -8.56
19  Blog1   PK  Narrative Style -8.13
20  Blog1   PK  Narrative Style -14.46
21  Blog1   PK  Info. vs. P. Focus  -4.21
22  Blog1   PK  Info. vs. P. Focus  -4.96
23  Blog1   PK  Info. vs. P. Focus  -5.48
24  Blog1   PK  Info. vs. P. Focus  -4.53
25  Blog1   PK  Info. vs. P. Focus  6.31
26  Blog1   PK  Addressee Focus -11.16
27  Blog1   PK  Addressee Focus -1.27
28  Blog1   PK  Addressee Focus -11.49
29  Blog1   PK  Addressee Focus -0.9
30  Blog1   PK  Addressee Focus -12.27
31  Blog1   PK  Thematic Variation  6.85
32  Blog1   PK  Thematic Variation  -5.21
33  Blog1   PK  Thematic Variation  -1.06
34  Blog1   PK  Thematic Variation  -2.6
35  Blog1   PK  Narrative Style -0.95
36  Blog1   PK  Narrative Style -0.82
37  Blog1   PK  Narrative Style -7.65
38  Blog1   PK  Narrative Style 0.64
39  Blog1   PK  Narrative Style -2.25
40  Blog1   PK  Narrative Style -1.58
41  Blog1   PK  Info. vs. P. Focus  -5.73
42  Blog1   PK  Info. vs. P. Focus  0.37
43  Blog1   PK  Info. vs. P. Focus  -5.46
44  Blog1   PK  Info. vs. P. Focus  -3.48
45  Blog1   PK  Info. vs. P. Focus  0.88
46  Blog1   PK  Addressee Focus -2.11
47  Blog1   PK  Addressee Focus -10.13
48  Blog1   PK  Addressee Focus -2.08
49  Blog1   PK  Addressee Focus -4.33
50  Blog1   PK  Addressee Focus 1.09
51  Blog1   US  Thematic Variation  -4.23
52  Blog1   US  Thematic Variation  -1.46
53  Blog1   US  Thematic Variation  9.37
54  Blog1   US  Thematic Variation  5.84
55  Blog1   US  Narrative Style 8.21
56  Blog1   US  Narrative Style 7.34
57  Blog1   US  Narrative Style 1.83
58  Blog1   US  Narrative Style 14.39
59  Blog1   US  Narrative Style 22.02
60  Blog1   US  Narrative Style 4.83

The code is as follows:

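library(ggplot2)  # ggplot, geom_boxplot, geom_text, ...
library(dplyr)    # %>%, group_by_, summarise_

get_wraper <- function(width) {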
  function(x) {
    lapply(strwrap(x, width = width, simplify = FALSE), paste, collapse="\n")
  }
}
plotgraph <- function(x, y, colour, min, max, incr, p_values)
{
  plot1 <- ggplot(dims_Blog, aes_string(x = x, y = y, fill = colour)) +
    geom_boxplot()+
    labs(color=colour) +
    labs(x="Dimensions", y="Score") +
    scale_fill_grey(start = 0.3, end = 0.6) +
    theme_grey()+
    theme(legend.justification = c(1, 1), legend.position = c(1, 1)) +
    scale_x_discrete(labels = get_wraper(10))+
    scale_y_continuous(breaks=c(seq(min,max,incr)), limits = c(min, max))+
    theme(panel.grid.minor.y = element_blank(), panel.grid.major.x = element_blank())+
    geom_text(data = dims_Blog %>% group_by_(x, colour) %>% summarise_(mean=paste("mean(",y,", na.rm=TRUE)")), aes_string(x=x, y="mean", label="round(mean,3)"), position=position_dodge(width=0.8), size = 3, vjust = -0.5, colour="white")+
    geom_text(data = p_values, aes_string(x="Dimension", y="height", label="val"))
  return(plot1)
}

Plotting the graph:

plot1 <- plotgraph("Dimension", "Blog1", "Region", -30, 50, 10, p_val1)
plot1

Data frame of p-values:

Dimensions <- c("Info. vs. P. Focus", "Addressee Focus", "Thematic Variation", "Narrative Style")
val <- c("0.184", "0.079", "0.044", "\u003C.0001")
height <- c(48, 48, 48, 48)
p_val1 <-data.frame(Dimensions, val, height)

Unfortunately, I am not sure how to define geom_text so that it displays the p-values. I get this error:

Error: Aesthetics must be either length 1 or the same as the data (8): label, x, y, fill

I have tried the approaches from several similar questions, but with my limited knowledge I could not solve the problem. Any ideas?

2 answers:

Answer 0: (score: 2)

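You appear to be very close in your original post: the error message indicates that you need to supply the label, x, y and fill aesthetics for each layer. (This is because you defined these aesthetics in the main ggplot call, so every layer inherits them.) The layer you use for the p-values contains three of these aesthetics, in aes_string(x="Dimensions", y="height", label="val"). Try adding a constant fill, for example: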

+ geom_text(data = p_values, aes_string(x="Dimensions", y="height", label="val"), fill="black")

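Alternatively, if you do not need those aesthetics in multiple layers, you can move the aesthetic definitions out of the main ggplot call and into the layer that uses them: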

ggplot(dims_Blog) +
    geom_boxplot(aes_string(x = x, y = y, fill = colour)) +
    ... +
    geom_text(data = p_values, aes_string(x="Dimensions", y="height", label="val"))

Secondly, there is a typo: you refer to Dimension in the plotting call, but to Dimensions when creating the p-value data frame.

However, without the full dataset I have not tested this, so some additional issues may come up.
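
One way to keep the p-value layer independent of the aesthetics set in the main ggplot call is the inherit.aes = FALSE argument, which ggplot2 layers such as geom_text accept. The following is only a minimal sketch under stated assumptions: the data frame scores and its random values are hypothetical stand-ins rather than the asker's data, and the p-value column is named Dimension so that it matches the x variable of the boxplot.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data: 4 dimensions x 2 regions x 5 values.
set.seed(1)
dims <- c("Info. vs. P. Focus", "Addressee Focus",
          "Thematic Variation", "Narrative Style")
scores <- expand.grid(Dimension = dims, Region = c("PK", "US"), rep = 1:5,
                      stringsAsFactors = FALSE)
scores$Score <- rnorm(nrow(scores), mean = -3, sd = 5)

# One row per dimension; the column is called Dimension, matching the plot.
p_values <- data.frame(
  Dimension = dims,
  val = c("0.184", "0.079", "0.044", "<.0001"),
  height = 48,
  stringsAsFactors = FALSE
)

p <- ggplot(scores, aes(x = Dimension, y = Score, fill = Region)) +
  geom_boxplot() +
  # inherit.aes = FALSE stops this layer from picking up fill = Region,
  # a column that does not exist in p_values.
  geom_text(data = p_values,
            aes(x = Dimension, y = height, label = val),
            inherit.aes = FALSE) +
  scale_y_continuous(limits = c(-30, 50))

p
```

With this approach the p-value data frame keeps one row per dimension (4 rows) even though the boxplot draws 8 boxes.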

Answer 1: (score: 0)

I used annotate instead of geom_text; it takes three separate vectors (?) rather than a data frame. The code looks like this:

Dimension <- c("Info. vs. P. Focus", "Addressee Focus", "Thematic Variation", "Narrative Style")
height <- c(48, 48, 48, 48)
val <- c("p=0.184", "p=0.079", "p=0.044", "p\u003C.0001")
# added to the plot in place of the geom_text(data = p_values, ...) layer
annotate("text",x=Dimension,y=height,label=val)

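This is not a very nice solution, but at least it prints the values I wanted. I do not know how to extend these vectors to length 8 (the size of my earlier data frame), which was part of the problem as well.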
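
Because annotate builds its own small data set from the vectors it is given and does not inherit the plot's aesthetics, vectors of length 4 (one entry per dimension) should be enough here; they should not need extending to 8. A minimal sketch, again assuming a hypothetical scores data frame with random values in place of the real data:

```r
library(ggplot2)

Dimension <- c("Info. vs. P. Focus", "Addressee Focus",
               "Thematic Variation", "Narrative Style")
height <- rep(48, 4)
val <- c("p=0.184", "p=0.079", "p=0.044", "p<.0001")

# Hypothetical data: 10 scores per dimension, alternating between two regions.
set.seed(2)
scores <- data.frame(
  Dimension = rep(Dimension, each = 10),
  Region = rep(c("PK", "US"), times = 20),
  Score = rnorm(40, mean = -3, sd = 5)
)

p <- ggplot(scores, aes(x = Dimension, y = Score, fill = Region)) +
  geom_boxplot() +
  # One label per dimension, centred over each pair of dodged boxes.
  annotate("text", x = Dimension, y = height, label = val) +
  ylim(-30, 50)

p
```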