Plotting conditional means - r

Time: 2016-11-21 16:07:39

Tags: r

This is my dataset:

ID  A B  Y Time
1   1 0  1 1
1   1 0  4 2
...
1   1 0  7 10
2   1 1  3 1
...

Here A and B are binary (they do not vary within an ID), Y is continuous, and Time runs from 1 to 10 for each ID.

I am trying to plot four lines on the same graph, with Time on the x-axis:

Y when A = 0 and B = 0, Y when A = 0 and B = 1, Y when A = 1 and B = 0, and Y when A = 1 and B = 1.

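I computed the mean of Y for A = 0, B = 0, Time = 1, then for A = 0, B = 0, Time = 2, and so on, but doing this one cell at a time is not efficient.

What is the best way to plot the four lines?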

1 answer:

Answer 0: (score: 1)

Here is one way to do it using aggregate and ggplot2:

Generate the data

set.seed(123)
df1 <- data.frame(ID = rep(c(1:5), each = 10),
                  A = rep(c(0,0,1,1,0), each = 10),
                  B = rep(c(0,1,0,1,1), each = 10),
                  Y = rnorm(50),
                  Time = rep(1:10, 5))

Use aggregate to compute the mean of Y for each combination of Time, A and B

df1_agg <- aggregate(Y ~ Time + A + B, data = df1, mean)
#add AB column
df1_agg$AB <- paste('A =', df1_agg$A, 'B =', df1_agg$B)

head(df1_agg) #what does it look like?
  Time A B           Y          AB
1    1 0 0 -0.56047565 A = 0 B = 0
2    2 0 0 -0.23017749 A = 0 B = 0
3    3 0 0  1.55870831 A = 0 B = 0
4    4 0 0  0.07050839 A = 0 B = 0
5    5 0 0  0.12928774 A = 0 B = 0
6    6 0 0  1.71506499 A = 0 B = 0
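
As a sanity check, one cell of the aggregated table can be compared with the manual calculation described in the question. This is only a minimal sketch: it rebuilds the same example data so that it runs on its own, and the names `manual` and `from_agg` are illustrative, not part of the original answer.

```r
# Recreate the answer's example data so this block is self-contained
set.seed(123)
df1 <- data.frame(ID = rep(1:5, each = 10),
                  A = rep(c(0, 0, 1, 1, 0), each = 10),
                  B = rep(c(0, 1, 0, 1, 1), each = 10),
                  Y = rnorm(50),
                  Time = rep(1:10, 5))

# One mean of Y per (Time, A, B) combination
df1_agg <- aggregate(Y ~ Time + A + B, data = df1, mean)

# Manual mean for a single cell: A = 0, B = 1, Time = 1 (IDs 2 and 5 here)
manual <- mean(df1$Y[df1$A == 0 & df1$B == 1 & df1$Time == 1])

# The matching row of the aggregated result
from_agg <- df1_agg$Y[df1_agg$A == 0 & df1_agg$B == 1 & df1_agg$Time == 1]

# Compare the two values
all.equal(manual, from_agg)
```

In other words, a single aggregate call does what the question did one cell at a time: 4 groups times 10 time points gives 40 rows.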

Plot with ggplot2, one line per A/B combination

library(ggplot2)
ggplot(data = df1_agg, aes(x = Time, y = Y, colour = AB))+
    geom_line()
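
If ggplot2 is not available, roughly the same picture can be drawn in base R. The following is a sketch of an alternative, not the answer's method: it swaps aggregate for tapply to build a Time-by-group matrix of means and draws it with matplot. The object name `means` and the choice of colours are assumptions made for illustration.

```r
# Recreate the answer's example data so this block is self-contained
set.seed(123)
df1 <- data.frame(ID = rep(1:5, each = 10),
                  A = rep(c(0, 0, 1, 1, 0), each = 10),
                  B = rep(c(0, 1, 0, 1, 1), each = 10),
                  Y = rnorm(50),
                  Time = rep(1:10, 5))

# Label each A/B combination, as in the answer's AB column
df1$AB <- paste('A =', df1$A, 'B =', df1$B)

# Matrix of mean Y: one row per Time value, one column per A/B group
means <- tapply(df1$Y, list(df1$Time, df1$AB), mean)

# One line per column of the matrix, with Time on the x-axis
matplot(as.numeric(rownames(means)), means, type = "l", lty = 1, col = 1:4,
        xlab = "Time", ylab = "Mean of Y")
legend("topright", legend = colnames(means), lty = 1, col = 1:4)
```

The long-format table from aggregate suits ggplot2, while matplot expects the wide matrix that tapply returns, which is why the summary step differs here.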
