I have an R data frame df_big:
Candidate Status
A 1
B 10
C 12
D 15
E 25
and so on.
I have a second data frame, df_small:
Candidate_1 Candidate_2
A C
B E
C D
I want to merge df_small and df_big into df_final, which should look like this:
Candidate_1 Candidate_2 Status_1 Status_2
A C 1 12
B E 10 25
C D 12 15
I tried something like this:
df_small_1 = merge(x=df_small,y = df_big,by.x = "Candidate_1",by.y="Candidate")
df_small_2 = merge(x=df_small,y = df_big,by.x = "Candidate_2",by.y="Candidate")
But I don't know how to combine df_small_1 and df_small_2 into df_small.
Answer 0 (score: 2)
You need to join twice, once for each candidate's status:
df_result <- merge(x=df_small, y=df_big, by.x="Candidate_1", by.y="Candidate")
df_result <- merge(x=df_result, y=df_big, by.x="Candidate_2", by.y="Candidate")
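As written, the two merges leave the status columns named Status.x and Status.y, and merge() sorts the rows by the join key. A minimal sketch of one way to get the exact layout from the question follows, assuming the sample data shown in the question; the `suffixes` argument and the final reordering step are additions for illustration, not part of the original answer.

```r
# Sample data copied from the question
df_big <- data.frame(Candidate = c("A", "B", "C", "D", "E"),
                     Status = c(1, 10, 12, 15, 25),
                     stringsAsFactors = FALSE)
df_small <- data.frame(Candidate_1 = c("A", "B", "C"),
                       Candidate_2 = c("C", "E", "D"),
                       stringsAsFactors = FALSE)

# First join: attach the status of Candidate_1
df_result <- merge(x = df_small, y = df_big,
                   by.x = "Candidate_1", by.y = "Candidate")

# Second join: attach the status of Candidate_2.
# Both inputs now have a "Status" column, so suffixes names them
# Status_1 (from the first join) and Status_2 (from this join).
df_result <- merge(x = df_result, y = df_big,
                   by.x = "Candidate_2", by.y = "Candidate",
                   suffixes = c("_1", "_2"))

# merge() sorted the rows by Candidate_2 and moved that column first;
# restore the row and column order shown in the question.
df_final <- df_result[order(df_result$Candidate_1),
                      c("Candidate_1", "Candidate_2", "Status_1", "Status_2")]
rownames(df_final) <- NULL
print(df_final)
```

Note that merge() performs an inner join by default, so any row of df_small whose candidate is missing from df_big would be dropped; passing `all.x = TRUE` would keep such rows with NA statuses.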
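Answer 1 (score: 0)
Merging is an expensive operation. You can do this more cheaply without merge() by using indexing to look up the statuses and combining the columns yourself. I benchmarked the merge and non-merge solutions; in the results below, the non-merge version ran in roughly half the median time of the merge version on this small example. This answer also returns the columns in exactly the requested order.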
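doit <- function(df_small, df_big)
{
  # Find the row of df_big for each candidate, in df_small's row order.
  # match() is used rather than %in% because %in% returns rows in
  # df_big's order, which can pair a candidate with the wrong status.
  indx1 <- match(df_small[["Candidate_1"]], df_big[["Candidate"]])
  indx2 <- match(df_small[["Candidate_2"]], df_big[["Candidate"]])
  # Copy the matching statuses and assemble the result
  data.frame(Candidate_1 = df_small[["Candidate_1"]], Candidate_2 = df_small[["Candidate_2"]],
             Status_1 = df_big[indx1, "Status"], Status_2 = df_big[indx2, "Status"])
}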
# Merge twice, once per candidate column
doit_merge <- function(df_small, df_big)
{
  df_result <- merge(x=df_small, y=df_big, by.x="Candidate_1", by.y="Candidate")
  merge(x=df_result, y=df_big, by.x="Candidate_2", by.y="Candidate")
}
library(microbenchmark)
# benchmark results
microbenchmark(
doit(df_small, df_big) ,
doit_merge(df_small, df_big)
)
Results
Unit: microseconds
                         expr      min       lq     mean    median        uq      max neval cld
       doit(df_small, df_big)  676.570  758.472 1077.203  834.0115  978.9315 4591.473   100   a
 doit_merge(df_small, df_big) 1329.327 1449.205 1986.995 1612.3940 2021.9070 5966.780   100   b
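The lookup-by-indexing idea in this answer can also be sketched with a named vector, which keeps df_small's row order without any merge. This is a minimal illustration rather than the answer's own code: it assumes the sample data from the question, that each candidate appears only once in df_big, and that the candidate columns are character vectors rather than factors. The name `status_by_candidate` is hypothetical.

```r
# Sample data copied from the question
df_big <- data.frame(Candidate = c("A", "B", "C", "D", "E"),
                     Status = c(1, 10, 12, 15, 25),
                     stringsAsFactors = FALSE)
df_small <- data.frame(Candidate_1 = c("A", "B", "C"),
                       Candidate_2 = c("C", "E", "D"),
                       stringsAsFactors = FALSE)

# Named vector: names are candidates, values are their statuses
status_by_candidate <- setNames(df_big$Status, df_big$Candidate)

# Indexing by name returns one status per row of df_small, in order
df_final <- df_small
df_final$Status_1 <- unname(status_by_candidate[df_small$Candidate_1])
df_final$Status_2 <- unname(status_by_candidate[df_small$Candidate_2])
print(df_final)
```

A candidate that is missing from df_big yields NA here instead of dropping the row, which differs from the default inner-join behaviour of merge().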