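I have two data frames, df_1 and df_2. They share three columns: permno, cusip and ticker. Each row of df_1 is a unique stock. The permno, cusip and ticker in df_1 are used to look up stock returns in df_2. Sometimes one or two of these variables are unavailable, but at least one is available in every row, and I will use that value to look up the return in df_2.
Could you suggest a (fast) way to merge df_1 and df_2 whenever at least one of the three columns permno, cusip or ticker matches?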
df_1
id permno cusip ticker
1 1 11 AA
2 NA 12 NA
3 2 13 NA
4 5 NA NA
df_2
permno cusip ticker return date
1 11 NA 100 date_1
7 15 BX 102 date_2
2 NA CU 103 date_3
Desired result
id permno cusip ticker return date
1 1 11 AA 100 date_1
1 1 11 NA 100 date_1
3 2 13 NA 103 date_3
3 2 NA CU 103 date_3
Answer 0 (score: 1)
This should work.
# define the columns common to both data frames
colmatch <- c("permno", "cusip", "ticker")

# Trim data frame df1 down to the rows that match df2 on at least one
# common column, and append the columns of df2 that are not in `col`
simplify <- function(df1, df2, col = colmatch) {
  # for each common column, position in df2 of the first match (NA never matches)
  idx <- sapply(col, function(x)
    match(df1[[x]], df2[[x]], incomparables = NA)
  )
  # rows of the first data frame with at least one match
  idx1 <- which(apply(idx, 1, function(x) !all(is.na(x))))
  # corresponding rows of the second data frame (first non-NA match per row);
  # drop = FALSE keeps idx a matrix even when only one row matches
  idx2 <- apply(idx[idx1, , drop = FALSE], 1, function(x) x[min(which(!is.na(x)))])
  # copy the non-key columns of the second data frame onto the matching rows
  cbind(df1[idx1, ], df2[idx2, !(names(df2) %in% col), drop = FALSE])
}

# assemble the final output
df_final <- rbind(simplify(df_1, df_2),  # df_1 rows with matches in df_2
                  simplify(df_2, df_1))  # and vice versa
Final output (sorted by id, if you prefer):
> df_final[order(df_final$id), ]
id permno cusip ticker return date
1 1 1 11 AA 100 date_1
11 1 1 11 <NA> 100 date_1
3 3 2 13 <NA> 103 date_3
31 3 2 NA CU 103 date_3
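To reproduce the output above, the question's sample data can be rebuilt roughly as follows. This is only a sketch: the column types (numeric identifiers, character ticker and date) are an assumption, since the question shows only printed tables, and `matched_rows` is a name introduced here for illustration. The last lines build the match matrix that simplify() relies on; `incomparables = NA` stops a missing value in df_1 from being matched to a missing value in df_2.

```r
# Sample data from the question; column types are assumed
df_1 <- data.frame(
  id     = 1:4,
  permno = c(1, NA, 2, 5),
  cusip  = c(11, 12, 13, NA),
  ticker = c("AA", NA, NA, NA),
  stringsAsFactors = FALSE
)
df_2 <- data.frame(
  permno = c(1, 7, 2),
  cusip  = c(11, 15, NA),
  ticker = c(NA, "BX", "CU"),
  return = c(100, 102, 103),
  date   = c("date_1", "date_2", "date_3"),
  stringsAsFactors = FALSE
)

colmatch <- c("permno", "cusip", "ticker")

# One column per key: position in df_2 of the first match for each df_1 row
idx <- sapply(colmatch, function(x)
  match(df_1[[x]], df_2[[x]], incomparables = NA)
)

# Rows of df_1 with at least one non-NA match (hypothetical helper variable)
matched_rows <- which(rowSums(!is.na(idx)) > 0)
```

With this data, rows 1 and 3 of df_1 are the only ones with a match, which agrees with the ids in the output above.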