Merge two data frames to return all rows with matching measurement names

Time: 2017-03-06 20:09:16

Tags: r dataframe merge

I have two data frames like these:

ID <- c("A","B","C")
Type <- c("PASS","PASS","FAIL")
Measurement <- c("Length","Height","Breadth")
Function <- c("Volume","Area","Circumference")
df1 <- data.frame(ID,Type,Measurement,Function)

ID <- c("A","B","C","C")
Type <- c("PASS","PASS","FAIL","FAIL")
Measurement <- c("Length","Height","Breadth","Breadth_DSPT")
df2 <- data.frame(ID,Type,Measurement)

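I am trying to merge these two data frames so that the result contains the rows whose measurement names match exactly, as well as the rows whose measurement name is a matching name with another string appended to it.

Desired output: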

  ID Type  Measurement      Function
   A PASS       Length        Volume
   B PASS       Height          Area
   C FAIL      Breadth Circumference
   C FAIL Breadth_DSPT Circumference

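I am using merge like this, which gets me the first 3 rows. How can I match on the measurement names in the data frames so that all matching rows are returned?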

df <- merge(df1, df2, by = c("ID", "Type", "Measurement"), all.x = TRUE)

3 answers:

Answer 0: (score: 3)

One way to do this is with the sqldf package:

library(sqldf)

sqldf("select df1.ID, df1.Type, df2.Measurement, df1.Function
      from df1 left join df2 on (df1.ID = df2.ID and 
                                 df1.Type = df2.Type and 
                                 df2.Measurement like df1.Measurement||'%')")

#   ID Type  Measurement      Function
# 1  A PASS       Length        Volume
# 2  B PASS       Height          Area
# 3  C FAIL      Breadth Circumference
# 4  C FAIL Breadth_DSPT Circumference

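The last clause of the join (df2.Measurement like df1.Measurement||'%') means that df2$Measurement must equal df1$Measurement followed by any string, but you can specify more flexible conditions using SQL's % and _ wildcards.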
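For comparison, the same "starts with" condition can be sketched in base R without sqldf. This is only an illustration under a few assumptions: the suffixes `.df1` and `.df2` and the names `joined`, `keep`, and `result` are made up for this example, and because it uses a plain `merge` it behaves like an inner join, so rows of df1 with no match in df2 are dropped rather than kept as in the left join above.

```r
# Data from the question, with character (not factor) columns
df1 <- data.frame(
  ID = c("A", "B", "C"),
  Type = c("PASS", "PASS", "FAIL"),
  Measurement = c("Length", "Height", "Breadth"),
  Function = c("Volume", "Area", "Circumference"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID = c("A", "B", "C", "C"),
  Type = c("PASS", "PASS", "FAIL", "FAIL"),
  Measurement = c("Length", "Height", "Breadth", "Breadth_DSPT"),
  stringsAsFactors = FALSE
)

# Join on ID and Type only, so both Measurement columns survive with suffixes
joined <- merge(df1, df2, by = c("ID", "Type"), suffixes = c(".df1", ".df2"))

# Keep rows where the df2 name starts with the df1 name (like LIKE 'name%')
keep <- startsWith(joined$Measurement.df2, joined$Measurement.df1)

# Return the df2 measurement name alongside the df1 Function
result <- joined[keep, c("ID", "Type", "Measurement.df2", "Function")]
names(result)[3] <- "Measurement"
rownames(result) <- NULL
```

With the sample data, `result` should hold the four rows from the desired output, although the order of the two C rows is not something this sketch controls.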

Answer 1: (score: 2)

If the extra string is only ever appended to the end of the name, you can do the following:

merge(
  transform(df2, tmpmeas = sub("_.+$", "", Measurement)),
  df1,
  by.x=c("ID","Type","tmpmeas"), by.y=c("ID","Type","Measurement")
)[-3]
#  ID Type  Measurement      Function
#1  A PASS       Length        Volume
#2  B PASS       Height          Area
#3  C FAIL      Breadth Circumference
#4  C FAIL Breadth_DSPT Circumference
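To see what the temporary key looks like, here is a small sketch of the `sub()` call on its own. The vector `x` and the value "Side_Length_DSPT" are hypothetical and only there to show a limitation: the pattern removes everything from the first underscore onward, so a base name that itself contains an underscore would be cut too short.

```r
# Hypothetical measurement names; only the first two appear in the question
x <- c("Length", "Breadth_DSPT", "Side_Length_DSPT")

# Strip from the first underscore to the end of the string
stripped <- sub("_.+$", "", x)

# Names without an underscore are returned unchanged
print(stripped)
```

If base names can contain underscores, a prefix comparison such as the LIKE condition in the sqldf answer may be the safer choice.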

Answer 2: (score: -1)

You can use the data.table library to do this. First convert your data frames to data.tables, set the key of each table with setkey, and then merge:

library(data.table)

dt1 <- data.table(df1)
dt2 <- data.table(df2)
setkey(dt1,ID)
setkey(dt2,ID)
merge(dt1,dt2)

#    ID Type.x Measurement.x      Function Type.y Measurement.y
# 1:  A   PASS        Length        Volume   PASS        Length
# 2:  B   PASS        Height          Area   PASS        Height
# 3:  C   FAIL       Breadth Circumference   FAIL       Breadth
# 4:  C   FAIL       Breadth Circumference   FAIL  Breadth_DSPT