Here is my df:
df <- structure(structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L), .Label = c("A", "B", "C", "D", "E"), class = "factor"), y = c(NA, NA, NA, NA, 1, NA, NA, NA, 1, 2, NA, NA, 1, 2, 3, NA, 2, 2, 3, 4, NA, 3, 3, 4, 5), x = c(1L, 2L, 3L, 4L,5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L)), .Names = c("group", "y", "x"), row.names = c(NA, 25L), class = "data.frame"))
> df
   group  y x
1      A NA 1
2      A NA 2
3      A NA 3
4      A NA 4
5      A  1 5
6      B NA 1
7      B NA 2
8      B NA 3
9      B  1 4
10     B  2 5
11     C NA 1
12     C NA 2
13     C  1 3
14     C  2 4
15     C  3 5
16     D NA 1
17     D  2 2
18     D  2 3
19     D  3 4
20     D  4 5
21     E NA 1
22     E  3 2
23     E  3 3
24     E  4 4
25     E  5 5
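My goal is to use mutate to compute the mean for each x (across groups). But first I want to filter the data so that only those x values with at least 3 non-NA values remain. So in this example I only want to keep the entries where x is at least 3. I can't figure out how to write the filter(). Any suggestions?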
Answer 0 (score: 9)
You could try
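library(dplyr)

df %>%
  group_by(x) %>%                       # group by x, as per the OP's clarification (was group_by(group))
  filter(sum(!is.na(y)) >= 3) %>%       # keep only x values with at least 3 non-NA y
  mutate(Mean = mean(y, na.rm = TRUE))  # mean of y within each x, across groups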
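To check the filter against the question's data, here is a minimal sketch that groups by x (following the OP's clarification) and counts the non-NA values before filtering. It assumes y is the column to be averaged, which the question implies but does not state outright; the names counts, n_non_na and result are only illustrative, and the data frame is rebuilt compactly rather than pasted from the structure() call.

```r
library(dplyr)

# Same 25 rows as in the question, rebuilt compactly.
df <- data.frame(
  group = factor(rep(c("A", "B", "C", "D", "E"), each = 5)),
  y = c(NA, NA, NA, NA, 1,
        NA, NA, NA, 1, 2,
        NA, NA, 1, 2, 3,
        NA, 2, 2, 3, 4,
        NA, 3, 3, 4, 5),
  x = rep(1:5, times = 5)
)

# Count the non-NA y values for each x to see which x pass the threshold.
counts <- df %>%
  group_by(x) %>%
  summarise(n_non_na = sum(!is.na(y)))

# Keep x values with at least 3 non-NA y, then add the per-x mean of y
# (assumption: y is the column to average). ungroup() drops the grouping
# so later steps are not silently computed per group.
result <- df %>%
  group_by(x) %>%
  filter(sum(!is.na(y)) >= 3) %>%
  mutate(Mean = mean(y, na.rm = TRUE)) %>%
  ungroup()

print(counts)
print(result)
```

Inside a grouped filter(), the condition sum(!is.na(y)) >= 3 evaluates to a single TRUE or FALSE per group, so each x group is kept or dropped as a whole. With this data only x = 3, 4 and 5 remain, which matches what the question asks for.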