I have two data frames, df1 and df2.
df1 <- data.frame(x1=c("A35", "A41", "A49"),
x2=c(8, 24, 33),
x3=c(15, 63, 54))
df2 <- data.frame(y1=c("A35", "A38", "A41", "A41", "A49"),
y2 = c(9, 20, 24, 32, 84))
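I want to select rows from df2 according to the following three criteria:
(1) y1 of df2 matches x1 of df1;
(2) y2 of df2 >= x2 of df1;
(3) y2 of df2 <= x3 of df1.
The rows that meet these criteria should be added to df1 as new columns. If a row of df1 has more than one match, the additional matches are added as further columns.
Expected result: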
data.frame(x1=c("A35", "A41", "A49"),
x2=c(8, 24, 33),
x3=c(15, 63, 54),
z1 = c("A35", "A41", ""),
z2 = c(9, 24,""),
z3 = c("", "A41", ""),
z4 = c("", 32, ""))
x1 x2 x3 z1 z2 z3 z4
A35 8 15 A35 9
A41 24 63 A41 24 A41 32
A49 33 54
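Thanks in advance!
Answer 0 (score: 0)
I would advise against a data frame whose rows have unequal lengths; a list is better suited to this purpose.
I wrote some code that does the job, although I am not sure it is the most efficient way.
First, you need to compare the rows of the two data frames pairwise. This can be done with an apply call nested inside another apply call (basically, every row of one data frame is compared with every row of the other), returning the matching values together with their index.
The result is stored in a messy list that contains empty elements for the non-matches. After cleaning up that list, sapply can be used to append the matches to each row of df1.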
df1 <- data.frame(x1=c("A35", "A41", "A49"),
x2=c(8, 24, 33),
x3=c(15, 63, 54))
df2 <- data.frame(y1=c("A35", "A38", "A41", "A41", "A49"),
y2 = c(9, 20, 24, 32, 84))
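# Note: apply() coerces each row to a character vector, so the comparisons
# below are string comparisons. That happens to work for this data; wrap the
# values in as.numeric() for general use.
matches <- apply(df2,1,function(x) apply(df1,1,function(y)
if(x[1]==y[1]&&x[2]>=y[2]&&x[2]<=y[3]){
# which() gives the row of df1 whose x1 equals the key (x1 is the first column)
c(which(df1==x[1]),x[1:2])
}))
# Flatten the matches into a 3-column matrix: df1 row index, y1, y2
addedelem <- t(array(unlist(matches), dim=c(3,length(unlist(matches))/3)))
# For each row of df1, append its matched (y1, y2) pairs
result <- sapply(1:length(df1$x1), function(x) (c(as.matrix(df1[x,]),t(addedelem[which(addedelem[,1]==x),2:3]))))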
The resulting list is what you are looking for. If necessary, you can convert it back to a data frame.
> result
[[1]]
[1] "A35" "8" "15" "A35" " 9"
[[2]]
[1] "A41" "24" "63" "A41" "24" "A41" "32"
[[3]]
[1] "A49" "33" "54"
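One possible way to convert the list back to a data frame is sketched below. It is only an illustration, not part of the original answer: the list is retyped from the output printed above, the names max_len, padded and wide are made up for this example, and padding with NA (rather than empty strings) is an assumption.

```r
# Ragged list, copied from the printed output above.
result <- list(
  c("A35", "8", "15", "A35", " 9"),
  c("A41", "24", "63", "A41", "24", "A41", "32"),
  c("A49", "33", "54")
)

# Pad every element with NA up to the longest length, then bind as rows.
max_len <- max(lengths(result))
padded <- lapply(result, function(v) c(v, rep(NA, max_len - length(v))))
wide <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)

# Column names follow the layout in the question: x1..x3, then z1, z2, ...
names(wide) <- c("x1", "x2", "x3", paste0("z", seq_len(max_len - 3)))
wide
```

All columns come out as character, because the list elements are character vectors; convert individual columns with as.numeric() if numbers are needed.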
Answer 1 (score: 0)
If I understand your question correctly, this should work:
library(plyr)  # provides rbind.fill(), used below
### we use the matches to pick our values from df1
### we use our conditions to pick our values from df2
matches <- match(df2$y1,df1$x1)
matches <- matches[!is.na(matches)]
condition1 <- df2$y1 %in% df1$x1
condition2 <- df2$y2[condition1] >= df1$x2[matches]
condition3 <- df2$y2[condition1] <= df1$x3[matches]
### i create these tmp variables so you can see step by step
### what each line of code is doing
### here i am finding the values that meet all the conditions
### then i am pulling the associated y2 values
tmp <- data.frame(x1=df1$x1[matches],y2=df2$y2[condition1])
tmp <- tmp[condition2&condition3,]
tmp <- droplevels(tmp)
### now that we have the values we want
### we are organizing the data in the desired output you
### specified.
x <- split(tmp[-1], tmp[[1]])
tmp2 <- data.frame()
for(i in 1:length(x)){
df <- data.frame(t(unlist(x[[i]], use.names=FALSE)))
colnames(df) <- seq(1,nrow(x[[i]]))
tmp2 <- rbind.fill(tmp2,df)
}
colnames(tmp2) <- paste(rep("z",ncol(tmp2)),1:ncol(tmp2),sep="")
res <- data.frame(df1[df1$x1 %in% names(x),],tmp2)
res <- rbind.fill(res,df1[!df1$x1 %in% names(x),])
> res
x1 x2 x3 z1 z2
1 A35 8 15 9 NA
2 A41 24 63 24 32
3 A49 33 54 NA NA
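As a cross-check on the three conditions, the same filter can be sketched more compactly in base R with merge(). This is an alternative illustration under stated assumptions, not the method used above: the names joined, hits and by_key are made up for the example, and it stops at a per-key list of matching y2 values instead of building the wide z1, z2 layout.

```r
df1 <- data.frame(x1 = c("A35", "A41", "A49"),
                  x2 = c(8, 24, 33),
                  x3 = c(15, 63, 54))
df2 <- data.frame(y1 = c("A35", "A38", "A41", "A41", "A49"),
                  y2 = c(9, 20, 24, 32, 84))

# Condition (1): pair every df2 row with the df1 row that has the same key.
joined <- merge(df1, df2, by.x = "x1", by.y = "y1")

# Conditions (2) and (3): keep the pairs whose y2 lies between x2 and x3.
hits <- joined[joined$y2 >= joined$x2 & joined$y2 <= joined$x3, ]

# Collect the matching y2 values per key; keys without a match are absent.
by_key <- split(hits$y2, as.character(hits$x1))
by_key
```

Because merge() keeps every pairing of rows with equal keys, a key that appears several times in df2 (such as "A41") contributes one candidate row per occurrence before the range filter is applied.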