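There are plenty of "match X and Y" questions on this site, but I think I have a new one. I have two data sets: a shorter one (500 rows) with one entry per person, and a larger one (about 20,000 rows) in which a person can have multiple entries. Both have columns for date of birth and gender. My goal is to find the people who are represented in both data sets, starting with the rows where date of birth and gender match. My Python-addled brain came up with this very slow solution: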
library(tibble)  # tibble() replaces the deprecated data_frame()

dob_big <- c('1975-05-04','1968-02-16','1985-02-28','1980-12-12','1976-06-06','1979-06-24','1981-01-28',
             '1985-01-16','1984-03-04','1979-06-26','1988-12-22','1975-10-02','1968-02-04','1972-02-01',
             '1981-08-06','1989-01-21','1956-06-25','1986-01-19','1980-03-24','1965-08-16')
gender_big <- c(0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0)
big_df <- tibble(date_birth = dob_big, gender = gender_big)
dob_small <- c('1985-01-16','1984-03-04','1979-06-26')
gender_small <- c(1,0,1)
small_df <- tibble(date_birth = dob_small, gender = gender_small)

matches <- integer(0)  # must exist before the loop appends to it
for (i in seq_len(nrow(big_df))) {
  save_row <- FALSE
  for (j in seq_len(nrow(small_df))) {
    # && is the scalar logical AND, which is what an if() condition expects
    if (big_df$date_birth[i] == small_df$date_birth[j] &&
        big_df$gender[i] == small_df$gender[j]) {
      print(paste("Match found at ", i, ",", j))
      save_row <- TRUE
    }
  }
  if (save_row) {
    matches <- c(matches, i)  # record the big_df row index once per row
  }
}
Is there a more functional solution in R that runs faster?
Answer 0: (score: 0)
If you only want the rows that are represented in both data sets, you can use merge.
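The answer does not show its exact call, so the following is only a minimal sketch of what such a merge might look like. It assumes the example data from the question, rebuilt with base data.frame so the block is self-contained. The by columns and the helper columns big_row and small_row are my own additions, used only so the original row numbers survive the join.

```r
# Example data from the question, rebuilt with base R.
big_df <- data.frame(
  date_birth = c('1975-05-04','1968-02-16','1985-02-28','1980-12-12','1976-06-06',
                 '1979-06-24','1981-01-28','1985-01-16','1984-03-04','1979-06-26',
                 '1988-12-22','1975-10-02','1968-02-04','1972-02-01','1981-08-06',
                 '1989-01-21','1956-06-25','1986-01-19','1980-03-24','1965-08-16'),
  gender = c(0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0),
  stringsAsFactors = FALSE
)
small_df <- data.frame(
  date_birth = c('1985-01-16','1984-03-04','1979-06-26'),
  gender = c(1,0,1),
  stringsAsFactors = FALSE
)

# Hypothetical helper columns: remember each row's original position.
big_df$big_row <- seq_len(nrow(big_df))
small_df$small_row <- seq_len(nrow(small_df))

# merge() defaults to an inner join: only rows whose date_birth AND gender
# appear in both data frames are kept.
both <- merge(big_df, small_df, by = c("date_birth", "gender"))
both
```

With the example data, both should contain two rows: big_df rows 8 and 9, paired with small_df rows 1 and 2. If a person appears several times in big_df, each of those entries gets its own row in the result.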
Answer 1: (score: 0)
Alternatively, which() can be used:
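# Build a "date gender" key per row, then find the positions of keys
# that also occur in the other data set.
# Note: the two which() results are pasted element-wise, so the pairs
# line up only if the matches occur in the same order in both data sets.
paste0("Match found at ",
       which(paste(big_df$date_birth, big_df$gender) %in%
               paste(small_df$date_birth, small_df$gender)),
       ", ",
       which(paste(small_df$date_birth, small_df$gender) %in%
               paste(big_df$date_birth, big_df$gender)),
       collapse = "; ")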
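If each big_df row should be reported together with the specific small_df row it matches, base R's match() is one option. The sketch below is not part of the original answer. It assumes the question's example data, and the names key_big, key_small and hit are illustrative. match() returns only the first matching position, so it assumes that a date of birth and gender combination occurs at most once in small_df.

```r
# Example data from the question, rebuilt with base R.
big_df <- data.frame(
  date_birth = c('1975-05-04','1968-02-16','1985-02-28','1980-12-12','1976-06-06',
                 '1979-06-24','1981-01-28','1985-01-16','1984-03-04','1979-06-26',
                 '1988-12-22','1975-10-02','1968-02-04','1972-02-01','1981-08-06',
                 '1989-01-21','1956-06-25','1986-01-19','1980-03-24','1965-08-16'),
  gender = c(0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0),
  stringsAsFactors = FALSE
)
small_df <- data.frame(
  date_birth = c('1985-01-16','1984-03-04','1979-06-26'),
  gender = c(1,0,1),
  stringsAsFactors = FALSE
)

# One combined key per row, e.g. "1985-01-16 1".
key_big <- paste(big_df$date_birth, big_df$gender)
key_small <- paste(small_df$date_birth, small_df$gender)

# For each big_df row: index of the first matching small_df row, or NA.
hit <- match(key_big, key_small)

# Rows of big_df that have a match, and a message pairing i with j.
matches <- which(!is.na(hit))
msg <- paste0("Match found at ", matches, ", ", hit[matches], collapse = "; ")
msg
# "Match found at 8, 1; Match found at 9, 2"
```

Both match() and %in% are vectorized, so this avoids the nested loop over every pair of rows.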