Finding matching individuals across two datasets in R

Time: 2018-03-20 14:16:37

Tags: r functional-programming

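There are plenty of "match X and Y" questions on this site, but I think mine is a new one. I have two datasets: a shorter one (500 rows) with one entry per person, and a larger one (about 20,000 rows) in which a person can have multiple entries. Both have columns for date of birth and gender. My goal is to find the people represented in both datasets, starting by matching on date of birth and gender. My Python-influenced brain came up with this very slow solution: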

library(tibble)  # provides tibble(); data_frame() is its deprecated alias

dob_big <- c('1975-05-04','1968-02-16','1985-02-28','1980-12-12','1976-06-06','1979-06-24','1981-01-28',
         '1985-01-16','1984-03-04','1979-06-26','1988-12-22','1975-10-02','1968-02-04','1972-02-01',
         '1981-08-06','1989-01-21','1956-06-25','1986-01-19','1980-03-24','1965-08-16')
gender_big <- c(0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0)
big_df <- tibble(date_birth = dob_big, gender = gender_big)
dob_small <- c('1985-01-16','1984-03-04','1979-06-26')
gender_small <- c(1,0,1)
small_df <- tibble(date_birth = dob_small, gender = gender_small)

matches <- c()  # row indices of big_df that have a match in small_df
for (i in seq_len(nrow(big_df))) {
    save_row <- FALSE
    for (j in seq_len(nrow(small_df))) {
        # Compare one pair of rows on both key columns
        if (big_df$date_birth[i] == small_df$date_birth[j]
            && big_df$gender[i] == small_df$gender[j]) {
            print(paste("Match found at ",i,",",j))
            save_row <- TRUE
        }
    }
    if (save_row) {
        matches <- c(matches,i)
    }
}

Is there a more functional solution in R that runs faster?

2 answers:

Answer 0: (score: 0)

If you only want to find the people represented in both datasets, you can use merge.
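A minimal sketch of what a merge on both key columns might look like, using base R only. The small data frames and the helper columns big_row and small_row are made up here for illustration; they are not from the original answer.

```r
# Illustrative data mirroring the question (base data.frame, no packages needed)
big_df <- data.frame(
  date_birth = c("1985-01-16", "1984-03-04", "1979-06-26", "1984-03-04"),
  gender     = c(1, 0, 0, 0),
  stringsAsFactors = FALSE
)
small_df <- data.frame(
  date_birth = c("1985-01-16", "1984-03-04", "1979-06-26"),
  gender     = c(1, 0, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper columns: remember each row's position so that
# matches can be traced back to the original data frames
big_df$big_row     <- seq_len(nrow(big_df))
small_df$small_row <- seq_len(nrow(small_df))

# Inner join on both key columns; only combinations present in both survive
matched <- merge(big_df, small_df, by = c("date_birth", "gender"))
matched <- matched[order(matched$big_row), ]
print(matched)
```

Because merge performs an inner join by default, a person with several entries in the larger data frame appears once per entry in the result, which is usually what you want when the next step is to inspect those rows.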


Answer 1: (score: 0)

The which function can be used as an alternative.

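# Build one composite key per row, then report positions of keys found in both.
# Note: the two index vectors are pasted together positionally, so the reported
# pairs line up only when matches occur in the same order in both data frames.
big_key <- paste(big_df$date_birth, big_df$gender)
small_key <- paste(small_df$date_birth, small_df$gender)
paste0("Match found at ",
       which(big_key %in% small_key),
       ", ",
       which(small_key %in% big_key),
       collapse = "; ")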
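If the actual (big row, small row) pairs are needed regardless of row order, one possible variation is to use match() on the same composite keys. This is a sketch under assumptions, not part of the original answer: the sample data and the names big_key, small_key and pairs are illustrative, and match() returns only the first matching row in small_df, which fits the one-entry-per-person description only if no two people share a date of birth and gender.

```r
# Illustrative data with rows in a different order in each data frame
big_df <- data.frame(
  date_birth = c("1979-06-26", "1985-01-16", "1984-03-04", "1985-01-16"),
  gender     = c(0, 1, 0, 1),
  stringsAsFactors = FALSE
)
small_df <- data.frame(
  date_birth = c("1984-03-04", "1985-01-16", "1979-06-26"),
  gender     = c(0, 1, 1),
  stringsAsFactors = FALSE
)

# One composite key per row
big_key   <- paste(big_df$date_birth, big_df$gender)
small_key <- paste(small_df$date_birth, small_df$gender)

# For each big_df row, the index of the first matching small_df row (NA if none)
j <- match(big_key, small_key)
i <- which(!is.na(j))

# Table of matched row pairs
pairs <- data.frame(big_row = i, small_row = j[i])
print(pairs)
```

Both match() and %in% are vectorized, so this avoids the nested loop from the question while still recording which rows correspond.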