Question

I have an R data frame, df_big:

Candidate  Status
A           1
B           10
C           12
D           15
E           25

and so on.

I have a second data frame, df_small:

Candidate_1    Candidate_2
A                C   
B                E    
C                D

I want to merge df_small and df_big into df_final, which should look like this:

Candidate_1    Candidate_2       Status_1     Status_2
A                C                  1           12   
B                E                  10          25
C                D                  12           15

I tried something along these lines:

df_small_1 = merge(x=df_small,y = df_big,by.x = "Candidate_1",by.y="Candidate") 

df_small_2 = merge(x=df_small,y = df_big,by.x = "Candidate_2",by.y="Candidate")

But I don't know how to combine df_small_1 and df_small_2 into the result I want.

Answer 1

You need to join twice, once for each candidate's status:

df_result <- merge(x=df_small,  y=df_big, by.x="Candidate_1", by.y="Candidate") 
df_result <- merge(x=df_result, y=df_big, by.x="Candidate_2", by.y="Candidate")
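Because both merges bring in a column named Status, merge() disambiguates them with its default suffixes (Status.x and Status.y), puts the key column first, and sorts the rows by the key. A minimal sketch of one way to get the column names and order shown in the question, assuming the sample data from the question; lookup_1 and lookup_2 are hypothetical names, not part of the original answer:

```r
df_big <- data.frame(Candidate = c("A", "B", "C", "D", "E"),
                     Status = c(1, 10, 12, 15, 25),
                     stringsAsFactors = FALSE)
df_small <- data.frame(Candidate_1 = c("A", "B", "C"),
                       Candidate_2 = c("C", "E", "D"),
                       stringsAsFactors = FALSE)

# Rename df_big's columns for each lookup so the merged
# columns come out as Status_1 and Status_2 directly
lookup_1 <- setNames(df_big, c("Candidate_1", "Status_1"))
lookup_2 <- setNames(df_big, c("Candidate_2", "Status_2"))

df_final <- merge(df_small, lookup_1, by = "Candidate_1")
df_final <- merge(df_final, lookup_2, by = "Candidate_2")

# merge() moves the key column to the front and sorts by it,
# so restore the column order and sort the rows by Candidate_1
df_final <- df_final[order(df_final$Candidate_1),
                     c("Candidate_1", "Candidate_2", "Status_1", "Status_2")]
rownames(df_final) <- NULL
```

Sorting by Candidate_1 reproduces the original row order here only because the sample df_small is already sorted that way; for arbitrary input you would need to carry a row id through the merges.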

Answer 2

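Merging is an expensive operation. You can do better by skipping merge altogether: look up the rows you need with indexing and combine the columns directly. I have benchmarked the merge and non-merge solutions. This answer also returns the columns in exactly the requested order.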

doit <- function(df_small, df_big) 
{

  # Row positions in df_big for each candidate, in df_small's row order.
  # match() keeps each pair together; %in% would return rows in df_big's order.
  indx1 <- match(df_small[["Candidate_1"]], df_big[["Candidate"]])

  indx2 <- match(df_small[["Candidate_2"]], df_big[["Candidate"]])

  # Copy the statuses next to the original pairs
  data.frame(Candidate_1 = df_small[["Candidate_1"]], Candidate_2 = df_small[["Candidate_2"]],
             Status_1 = df_big[["Status"]][indx1], Status_2 = df_big[["Status"]][indx2])

}

#merge two times
doit_merge <- function(df_small, df_big) 
{
  df_result <- merge(x=df_small,  y=df_big, by.x="Candidate_1", by.y="Candidate") 
  df_result <- merge(x=df_result, y=df_big, by.x="Candidate_2", by.y="Candidate") 
}

library(microbenchmark)

# benchmark results
microbenchmark(
  doit(df_small, df_big) ,
  doit_merge(df_small, df_big) 
)

Results

Unit: microseconds
expr                              min       lq     mean    median        uq      max   neval cld
doit(df_small, df_big)        676.570  758.472 1077.203  834.0115  978.9315 4591.473   100    a 
doit_merge(df_small, df_big) 1329.327 1449.205 1986.995 1612.3940 2021.9070 5966.780   100    b
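A timing comparison only means something if both approaches return the same table, so it is worth checking that first. The sketch below does this on the question's sample data. lookup_status and merge_status are hypothetical helper names, and the match()-based lookup is one assumed way to do the indexing:

```r
df_big <- data.frame(Candidate = c("A", "B", "C", "D", "E"),
                     Status = c(1, 10, 12, 15, 25),
                     stringsAsFactors = FALSE)
df_small <- data.frame(Candidate_1 = c("A", "B", "C"),
                       Candidate_2 = c("C", "E", "D"),
                       stringsAsFactors = FALSE)

# Hypothetical helper: look up statuses by row position with match()
lookup_status <- function(df_small, df_big) {
  i1 <- match(df_small$Candidate_1, df_big$Candidate)
  i2 <- match(df_small$Candidate_2, df_big$Candidate)
  data.frame(Candidate_1 = df_small$Candidate_1,
             Candidate_2 = df_small$Candidate_2,
             Status_1 = df_big$Status[i1],
             Status_2 = df_big$Status[i2],
             stringsAsFactors = FALSE)
}

# Hypothetical helper: the same lookup done with two merges
merge_status <- function(df_small, df_big) {
  res <- merge(df_small, df_big, by.x = "Candidate_1", by.y = "Candidate")
  res <- merge(res, df_big, by.x = "Candidate_2", by.y = "Candidate")
  # The duplicated Status column gets merge()'s default suffixes
  names(res)[names(res) == "Status.x"] <- "Status_1"
  names(res)[names(res) == "Status.y"] <- "Status_2"
  res <- res[order(res$Candidate_1, res$Candidate_2),
             c("Candidate_1", "Candidate_2", "Status_1", "Status_2")]
  rownames(res) <- NULL
  res
}

by_index <- lookup_status(df_small, df_big)
by_merge <- merge_status(df_small, df_big)

# Sort both results the same way before comparing
by_index <- by_index[order(by_index$Candidate_1, by_index$Candidate_2), ]
rownames(by_index) <- NULL
same <- isTRUE(all.equal(by_index, by_merge))
```

If same is FALSE, the two functions are not doing the same job and the benchmark is comparing different things. Note also that merge() drops pairs with no match in df_big by default, while the match() lookup keeps them with NA statuses.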

Merging R data frames on multiple columns
