I have two data frames like this:
ID <- c("A","B","C")
Type <- c("PASS","PASS","FAIL")
Measurement <- c("Length","Height","Breadth")
Function <- c("Volume","Area","Circumference")
df1 <- data.frame(ID,Type,Measurement,Function)
ID <- c("A","B","C","C")
Type <- c("PASS","PASS","FAIL","FAIL")
Measurement <- c("Length","Height","Breadth","Breadth_DSPT")
df2 <- data.frame(ID,Type,Measurement)
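I'm trying to merge these two data frames so that the result contains the rows whose Measurement matches exactly, and also the rows whose Measurement is a matching measurement with another string appended to it (such as Breadth_DSPT).
The desired output is: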
ID Type Measurement Function
A PASS Length Volume
B PASS Height Area
C FAIL Breadth Circumference
C FAIL Breadth_DSPT Circumference
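Using merge like this I get the first 3 rows, but how can I match the measurement names across the data frames so that all matching rows are returned?
df <- merge(df1, df2, by = c("ID","Type","Measurement"), all.x = TRUE)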
Answer 0 (score: 3)
One way to do it is with the sqldf package:
library(sqldf)
sqldf("select df1.ID, df1.Type, df2.Measurement, df1.Function
from df1 left join df2 on (df1.ID = df2.ID and
df1.Type = df2.Type and
df2.Measurement like df1.Measurement||'%')")
# ID Type Measurement Function
# 1 A PASS Length Volume
# 2 B PASS Height Area
# 3 C FAIL Breadth Circumference
# 4 C FAIL Breadth_DSPT Circumference
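The last clause of the join, df2.Measurement like df1.Measurement||'%', means that df2$Measurement must equal df1$Measurement followed by any string (possibly an empty one). You can write more flexible conditions with SQL's LIKE wildcards: % matches any sequence of characters and _ matches exactly one character.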
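For comparison, here is a minimal base-R sketch of the same prefix condition without sqldf. It is an illustration under stated assumptions, not part of the answer above: it uses startsWith() (available since R 3.3.0) in place of LIKE, it is an inner join (rows of df1 with no match are dropped, unlike the left join above), and it is case-sensitive, whereas SQLite's LIKE ignores ASCII case by default. Because merge() first pairs every row sharing an ID and Type, it can produce a large intermediate result on big data.

```r
df1 <- data.frame(ID = c("A","B","C"),
                  Type = c("PASS","PASS","FAIL"),
                  Measurement = c("Length","Height","Breadth"),
                  Function = c("Volume","Area","Circumference"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c("A","B","C","C"),
                  Type = c("PASS","PASS","FAIL","FAIL"),
                  Measurement = c("Length","Height","Breadth","Breadth_DSPT"),
                  stringsAsFactors = FALSE)

# Pair rows that agree on ID and Type; the two Measurement columns
# get the suffixes .x (from df1) and .y (from df2)
m <- merge(df1, df2, by = c("ID", "Type"))

# Keep pairs where df2's measurement starts with df1's (the LIKE 'x%' test)
res <- m[startsWith(m$Measurement.y, m$Measurement.x),
         c("ID", "Type", "Measurement.y", "Function")]
names(res)[3] <- "Measurement"
res
```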
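Answer 1 (score: 2)
If the extra string is only ever appended to the end of the measurement name after an underscore, you can strip it off and merge on the result: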
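merge(
  # helper key: Measurement with any "_suffix" removed
  transform(df2, tmpmeas = sub("_.+$", "", Measurement)),
  df1,
  by.x = c("ID","Type","tmpmeas"), by.y = c("ID","Type","Measurement")
)[-3]  # drop the helper key column (the third column of the result)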
# ID Type Measurement Function
#1 A PASS Length Volume
#2 B PASS Height Area
#3 C FAIL Breadth Circumference
#4 C FAIL Breadth_DSPT Circumference
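The whole approach rests on the sub() call, so it may help to see what it produces on its own. The pattern "_.+$" matches the first underscore and everything after it, so each name is reduced to its stem. The name Inner_Length below is hypothetical and only illustrates a limitation: if a df1 measurement itself contains an underscore, its stem is cut short and it would no longer match.

```r
meas <- c("Length", "Height", "Breadth", "Breadth_DSPT")

# Remove the first "_" and everything after it
stem <- sub("_.+$", "", meas)
stem  # c("Length", "Height", "Breadth", "Breadth")

# Hypothetical name containing an underscore: the stem becomes "Inner",
# so it would not join back to a df1 row named "Inner_Length"
hypothetical <- sub("_.+$", "", "Inner_Length")
```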
Answer 2 (score: -1)
You can do this with the data.table library. First convert your data frames to data.tables, set the key of each table with setkey, and then merge.
library(data.table)
dt1 <- data.table(df1)
dt2 <- data.table(df2)
setkey(dt1, ID)
setkey(dt2, ID)
merge(dt1, dt2)
# ID Type.x Measurement.x Function Type.y Measurement.y
# 1: A PASS Length Volume PASS Length
# 2: B PASS Height Area PASS Height
# 3: C FAIL Breadth Circumference FAIL Breadth
# 4: C FAIL Breadth Circumference FAIL Breadth_DSPT
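Keying and merging on ID alone pairs every dt1 row with every dt2 row that shares its ID, so the output above matches the desired result only because each ID appears once in df1. A sketch that stays within data.table but joins on both ID and Type and then keeps only prefix matches (mirroring the LIKE condition of the first answer) could look like this; it assumes the data.table package is installed and R 3.3.0 or later for startsWith().

```r
library(data.table)

dt1 <- data.table(ID = c("A","B","C"),
                  Type = c("PASS","PASS","FAIL"),
                  Measurement = c("Length","Height","Breadth"),
                  Function = c("Volume","Area","Circumference"))
dt2 <- data.table(ID = c("A","B","C","C"),
                  Type = c("PASS","PASS","FAIL","FAIL"),
                  Measurement = c("Length","Height","Breadth","Breadth_DSPT"))

# Join on ID and Type; the Measurement columns become Measurement.x / Measurement.y
m <- merge(dt1, dt2, by = c("ID", "Type"))

# Keep pairs where dt2's measurement begins with dt1's, then tidy the columns
res <- m[startsWith(Measurement.y, Measurement.x),
         .(ID, Type, Measurement = Measurement.y, Function)]
res
```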