Identify rows that share values in two columns

Posted: 2017-06-02 15:13:45

Tags: r dplyr tidyr tidyverse

How can I identify the rows whose values in two columns (here: treatment and replicate) match those of at least one other row?

set.seed(0)
x <- rep(1:10, 4)
y <- sample(c(rep(1:10, 2)+rnorm(20)/5, rep(6:15, 2) + rnorm(20)/5))
treatment <- sample(gl(8, 5, 40, labels=letters[1:8]))
replicate <- sample(gl(8, 5, 40))
d <- data.frame(x=x, y=y, treatment=treatment, replicate=replicate)

table(d$treatment, d$replicate)

#   1 2 3 4 5 6 7 8
# a 1 0 0 1 1 2 0 0
# b 1 1 0 0 1 2 0 0
# c 0 0 0 0 2 0 1 2
# d 2 0 1 1 0 0 1 0
# e 0 2 1 1 0 0 0 1
# f 0 1 1 0 1 1 1 0
# g 0 1 0 2 0 0 1 1
# h 1 0 2 0 0 0 1 1

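From the table above, I expect the result to contain 16 rows: eight treatment/replicate combinations occur twice each (a/6, b/6, c/5, c/8, d/1, e/2, g/4 and h/3). Any idea how to achieve this?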
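The table already holds the information needed: a row qualifies when its treatment/replicate cell has a count above 1. A minimal base R sketch of that idea attaches the group size to every row with ave() and keeps rows whose group has more than one member. The helper name rows_with_shared_pair and the toy data frame are hypothetical, chosen only for illustration.

```r
# Hypothetical helper (not from the question): keep every row whose
# (treatment, replicate) pair appears in at least one other row.
rows_with_shared_pair <- function(df) {
  # ave() splits seq_len(nrow(df)) by the two grouping columns and
  # replaces each element with the length of its group, i.e. the group size.
  group_size <- ave(seq_len(nrow(df)), df$treatment, df$replicate, FUN = length)
  df[group_size > 1, , drop = FALSE]
}

# Small made-up data: the pair (a, 1) occurs three times, every other pair once.
toy <- data.frame(
  x = 1:6,
  treatment = c("a", "b", "a", "c", "b", "a"),
  replicate = c(1, 1, 1, 2, 2, 1)
)
rows_with_shared_pair(toy)
```

Applied to the question's data frame, rows_with_shared_pair(d) should select the same rows as a group_by()/filter(n() > 1) pipeline, in their original order, without needing any packages.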

Update

library(dplyr)
d %>% group_by(treatment, replicate) %>% filter(n() > 1)
# A tibble: 16 x 4
       x         y treatment replicate
   <int>     <dbl>    <fctr>    <fctr>
 1     2  7.050445         h         3
 2     5  1.840198         b         6
 3     8  9.160838         d         1
 4     9  4.254486         h         3
 5     2  8.870106         g         4
 6     4  7.821616         a         6
 7     6  9.752492         e         2
 8     7  9.988579         c         5
 9     9 10.480931         c         8
10     1  2.770469         c         8
11     2  7.913338         e         2
12     3 13.743080         d         1
13     9  5.692010         b         6
14    10 11.100722         a         6
15     3 12.198432         g         4
16     5  5.955146         c         5

I have found an approach whose result seems to satisfy the condition. Are there other, better solutions?

1 answer:

Answer (score: 0)

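You can use duplicated() as the condition, checking from both ends so that the first occurrence of each pair is kept as well: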

key <- d[, c("treatment", "replicate")]
# duplicated() flags later copies, fromLast = TRUE flags earlier ones; OR-ing them keeps every row of a repeated pair
dups <- d[duplicated(key) | duplicated(key, fromLast = TRUE), ]

> dups
        x         y treatment replicate
2   2  7.050445         h         3
5   5  1.840198         b         6
8   8  9.160838         d         1
9   9  4.254486         h         3
12  2  8.870106         g         4
14  4  7.821616         a         6
16  6  9.752492         e         2
17  7  9.988579         c         5
19  9 10.480931         c         8
21  1  2.770469         c         8
22  2  7.913338         e         2
23  3 13.743080         d         1
29  9  5.692010         b         6
30 10 11.100722         a         6
33  3 12.198432         g         4
35  5  5.955146         c         5
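Why both calls are needed: duplicated() on its own marks only the second and later occurrences of a value, so using it alone here would drop the first row of each pair and return 8 rows instead of 16. The small sketch below, on made-up vectors, shows what each direction flags and how the OR combines them; the variable names are illustrative only.

```r
# A single key column: "a" is repeated, "b" and "c" are not.
v <- c("a", "b", "a", "c")
from_first <- duplicated(v)                   # FALSE FALSE  TRUE FALSE: only the later "a"
from_last  <- duplicated(v, fromLast = TRUE)  #  TRUE FALSE FALSE FALSE: only the earlier "a"
every_copy <- from_first | from_last          #  TRUE FALSE  TRUE FALSE: both "a" positions

# On a data frame, duplicated() compares whole rows of the selected columns.
pairs <- data.frame(treatment = c("a", "b", "a", "c"),
                    replicate = c(1, 1, 1, 2))
pair_mask <- duplicated(pairs) | duplicated(pairs, fromLast = TRUE)

print(every_copy)
print(pair_mask)
```

Because duplicated() never returns NA, wrapping the mask in which() is optional; indexing with the logical vector directly selects the same rows.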