Filter to all rows with duplicated values across two columns (dplyr)

Time: 2018-09-25 22:23:59

Tags: r dplyr

I have a data frame that looks like this:

id        dob lname
1 1900-01-01     a
2 1900-01-01     b
3 1900-01-01     b
4 1901-01-01     c
5 1901-01-01     d
6 1902-01-01     e
7 1902-01-01     e
8 1902-01-01     f
9 1903-01-01     g
10 1903-01-01     h

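I want to filter it to show all rows where the combination of dob and lname is duplicated (the same dob together with the same lname), so the desired output looks like this: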

id        dob lname
2 1900-01-01     b
3 1900-01-01     b
6 1902-01-01     e
7 1902-01-01     e

I tried grouping by dob and lname, but I got stuck on the next step: returning all rows where those two columns hold duplicated values.

Here is the sample code:

id <- 1:10
# as.Date() parses the strings; base R's date() takes no arguments
dob <- as.Date(c("1900-01-01", "1900-01-01", "1900-01-01", "1901-01-01", "1901-01-01", "1902-01-01", "1902-01-01", "1902-01-01", "1903-01-01", "1903-01-01"))
lname <- c("a", "b", "b", "c", "d", "e", "e", "f", "g", "h")
df <- data.frame("id" = id, "dob" = dob, "lname" = lname)
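For reference, here is a minimal sketch of how the grouping attempt described above could be finished, assuming dplyr is installed and dob is built with as.Date(). Inside each (dob, lname) group, n() returns the number of rows in that group, so keeping only groups with more than one row keeps every duplicated row.

```r
library(dplyr)

# Sample data from the question, with dob parsed by as.Date()
df <- data.frame(
  id    = 1:10,
  dob   = as.Date(c("1900-01-01", "1900-01-01", "1900-01-01", "1901-01-01",
                    "1901-01-01", "1902-01-01", "1902-01-01", "1902-01-01",
                    "1903-01-01", "1903-01-01")),
  lname = c("a", "b", "b", "c", "d", "e", "e", "f", "g", "h")
)

# n() is the size of the current (dob, lname) group; groups of size 1
# are unique combinations, so filter(n() > 1) keeps all duplicated rows
result <- df %>%
  group_by(dob, lname) %>%
  filter(n() > 1) %>%
  ungroup()

print(result)
```

The final ungroup() only drops the grouping so that later steps are not silently applied per group.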

2 answers:

Answer 0 (score: 1)

Does this dplyr solution give you what you need?

library(dplyr)

# Keep every row of df whose (dob, lname) pair occurs more than once
df %>%
  semi_join(df %>%
              group_by(dob, lname) %>%
              filter(row_number() > 1),
            by = c("dob", "lname"))
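To see why the semi_join() works, the inner pipeline can be run on its own. This sketch rebuilds the sample data (dob via as.Date()) and prints the intermediate table before the join; the name keys is only illustrative.

```r
library(dplyr)

df <- data.frame(
  id    = 1:10,
  dob   = as.Date(c("1900-01-01", "1900-01-01", "1900-01-01", "1901-01-01",
                    "1901-01-01", "1902-01-01", "1902-01-01", "1902-01-01",
                    "1903-01-01", "1903-01-01")),
  lname = c("a", "b", "b", "c", "d", "e", "e", "f", "g", "h")
)

# row_number() > 1 drops the first row of each (dob, lname) group, so only
# combinations that occur at least twice leave a row behind (ids 3 and 7 here)
keys <- df %>%
  group_by(dob, lname) %>%
  filter(row_number() > 1) %>%
  ungroup()
print(keys)

# semi_join() keeps the rows of df whose (dob, lname) matches some row of keys
result <- df %>% semi_join(keys, by = c("dob", "lname"))
print(result)
```

Because semi_join() is a filtering join, it never adds columns or repeats rows of df, so a combination that appears three times (and therefore leaves two rows in keys) still returns each original row exactly once.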

Answer 1 (score: 0)

Here is a one-line solution using base R:

id <- c(1:10)
dob <- as.Date(c("1900-01-01", "1900-01-01", "1900-01-01", "1901-01-01", "1901-01-01", "1902-01-01", "1902-01-01", "1902-01-01", "1903-01-01", "1903-01-01"))
lname <- c("a", "b", "b", "c", "d", "e", "e", "f", "g", "h")
df <- data.frame("id" = id, "dob" = dob, "lname" = lname)

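# duplicated() flags the 2nd and later copies of each (dob, lname) row (columns 2:3);
# fromLast = TRUE flags the earlier copies, so OR-ing the two keeps every copy
result <- df[duplicated(df[, 2:3]) | duplicated(df[, 2:3], fromLast = TRUE), ]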
result
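The answer combines two duplicated() calls because a single call only flags the second and later occurrences of a value. A small vector example (x is a made-up illustration) shows how fromLast = TRUE supplies the first occurrence:

```r
# duplicated() scans front to back and flags values already seen
x <- c("a", "b", "b", "c")
first_pass  <- duplicated(x)                   # FALSE FALSE TRUE FALSE
# fromLast = TRUE scans back to front, so the earlier "b" is flagged instead
second_pass <- duplicated(x, fromLast = TRUE)  # FALSE TRUE FALSE FALSE
# OR-ing both marks every element that has a duplicate anywhere
either <- first_pass | second_pass             # FALSE TRUE TRUE FALSE
print(x[either])                               # "b" "b"
```

On a data frame, duplicated() compares whole rows, which is why passing df[, 2:3] checks dob and lname as a pair rather than each column separately.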

Another approach, for use in a pipe (the . placeholder comes from magrittr's %>%, which dplyr re-exports):

df %>% .[duplicated(.[, 2:3]) | duplicated(.[, 2:3], fromLast = TRUE), ]
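If you prefer to stay inside dplyr verbs, the same duplicated() idea can be written with filter() and pick(). This is a sketch assuming dplyr 1.1.0 or later, since pick() does not exist in older versions; selecting the columns by name also avoids relying on their positions (2:3).

```r
library(dplyr)  # pick() assumes dplyr >= 1.1.0

df <- data.frame(
  id    = 1:10,
  dob   = as.Date(c("1900-01-01", "1900-01-01", "1900-01-01", "1901-01-01",
                    "1901-01-01", "1902-01-01", "1902-01-01", "1902-01-01",
                    "1903-01-01", "1903-01-01")),
  lname = c("a", "b", "b", "c", "d", "e", "e", "f", "g", "h")
)

# pick(dob, lname) returns those two columns as a tibble, and duplicated()
# compares its rows; OR-ing both scan directions keeps every copy
result <- df %>%
  filter(duplicated(pick(dob, lname)) |
           duplicated(pick(dob, lname), fromLast = TRUE))

print(result)
```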