How can I identify rows whose values in two columns (here: treatment and replicate) match those of at least one other row?
set.seed(0)
x <- rep(1:10, 4)
y <- sample(c(rep(1:10, 2)+rnorm(20)/5, rep(6:15, 2) + rnorm(20)/5))
treatment <- sample(gl(8, 5, 40, labels=letters[1:8]))
replicate <- sample(gl(8, 5, 40))
d <- data.frame(x=x, y=y, treatment=treatment, replicate=replicate)
table(d$treatment, d$replicate)
#     1 2 3 4 5 6 7 8
# a 1 0 0 1 1 2 0 0
# b 1 1 0 0 1 2 0 0
# c 0 0 0 0 2 0 1 2
# d 2 0 1 1 0 0 1 0
# e 0 2 1 1 0 0 0 1
# f 0 1 1 0 1 1 1 0
# g 0 1 0 2 0 0 1 1
# h 1 0 2 0 0 0 1 1
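Judging from the table above, I expect the result to contain 16 rows, because eight treatment/replicate combinations occur exactly twice. Any idea how to achieve this?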
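The expected count can be checked before filtering. Each cell of the treatment-by-replicate table with a count greater than 1 is a combination shared by several rows, so summing those cells gives the number of rows the filter should return. A minimal base-R sketch of that check, using a small made-up data frame (hypothetical values, not the seeded data above):

```r
# Small illustrative data frame (hypothetical values, not the seeded data above)
toy <- data.frame(
  treatment = c("a", "a", "b", "c", "c", "c"),
  replicate = c("1", "1", "2", "1", "3", "3")
)

# Cross-tabulate the two key columns
tab <- table(toy$treatment, toy$replicate)

# Cells with a count above 1 are combinations shared by several rows;
# their total is the number of rows that have at least one "twin"
expected_rows <- sum(tab[tab > 1])
print(expected_rows)
```

Applied to table(d$treatment, d$replicate) as printed above, the cells greater than 1 are eight 2s, so the same sum gives 16.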
Update
library(dplyr)
d %>% group_by(treatment, replicate) %>% filter(n() > 1)
# A tibble: 16 x 4
       x         y treatment replicate
   <int>     <dbl>    <fctr>    <fctr>
 1     2  7.050445         h         3
 2     5  1.840198         b         6
 3     8  9.160838         d         1
 4     9  4.254486         h         3
 5     2  8.870106         g         4
 6     4  7.821616         a         6
 7     6  9.752492         e         2
 8     7  9.988579         c         5
 9     9 10.480931         c         8
10     1  2.770469         c         8
11     2  7.913338         e         2
12     3 13.743080         d         1
13     9  5.692010         b         6
14    10 11.100722         a         6
15     3 12.198432         g         4
16     5  5.955146         c         5
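If you would rather not depend on dplyr, the same group-size filter can be sketched in base R with ave(), which returns, for every row, FUN applied to that row's group. With FUN = length this is the group size, roughly the base-R analogue of n(). Shown on a small hypothetical toy data frame:

```r
# Small illustrative data frame (hypothetical values)
toy <- data.frame(
  x = 1:6,
  treatment = c("a", "a", "b", "c", "c", "c"),
  replicate = c("1", "1", "2", "1", "3", "3")
)

# For each row, the number of rows sharing its (treatment, replicate) pair
group_size <- ave(seq_len(nrow(toy)), toy$treatment, toy$replicate,
                  FUN = length)

# Keep rows whose group has more than one member (like filter(n() > 1))
shared <- toy[group_size > 1, ]
print(shared)
```

On the question's data this would be d[ave(seq_len(nrow(d)), d$treatment, d$replicate, FUN = length) > 1, ]. Like the dplyr output above, it keeps the rows in their original order.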
I have found one approach whose result appears to satisfy the condition. Are there other, better solutions?
Answer (score: 0)
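You can use duplicated() as the condition, checking both from the front and from the back (fromLast = TRUE) so that every member of a repeated pair is kept: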
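# Keep every row whose (treatment, replicate) pair appears elsewhere:
# duplicated() flags later repeats, fromLast = TRUE flags earlier ones
dups <- d[which(duplicated(d[, c("treatment", "replicate")]) |
                duplicated(d[, c("treatment", "replicate")], fromLast = TRUE)), ]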
> dups
    x         y treatment replicate
2   2  7.050445         h         3
5   5  1.840198         b         6
8   8  9.160838         d         1
9   9  4.254486         h         3
12  2  8.870106         g         4
14  4  7.821616         a         6
16  6  9.752492         e         2
17  7  9.988579         c         5
19  9 10.480931         c         8
21  1  2.770469         c         8
22  2  7.913338         e         2
23  3 13.743080         d         1
29  9  5.692010         b         6
30 10 11.100722         a         6
33  3 12.198432         g         4
35  5  5.955146         c         5
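The two-directional duplicated() check works for any set of key columns. Below is a sketch that wraps it in a small helper. rows_with_shared_key is a hypothetical name, and the toy data are made up for illustration. which() is dropped here because duplicated() never returns NA, so plain logical indexing selects the same rows.

```r
# Hypothetical helper: return the rows of df whose values in `cols`
# also occur in at least one other row
rows_with_shared_key <- function(df, cols) {
  key <- df[, cols, drop = FALSE]
  # duplicated() marks repeats after the first occurrence;
  # fromLast = TRUE marks repeats before the last occurrence.
  # Their union covers every row of a repeated key.
  keep <- duplicated(key) | duplicated(key, fromLast = TRUE)
  df[keep, , drop = FALSE]
}

# Small illustrative data frame (hypothetical values)
toy <- data.frame(
  x = 1:6,
  treatment = c("a", "a", "b", "c", "c", "c"),
  replicate = c("1", "1", "2", "1", "3", "3")
)

res <- rows_with_shared_key(toy, c("treatment", "replicate"))
print(res)
```

With the question's data, rows_with_shared_key(d, c("treatment", "replicate")) would select the same rows as dups.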