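R: How to select data from a data frame according to specific rules and add it as new columns to an existing data frame

Asked: 2015-12-09 04:18:24

Tags: r dataframe pattern-matching

I have two data frames, df1 and df2.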

df1 <- data.frame(x1=c("A35", "A41", "A49"),
                  x2=c(8, 24, 33),
                  x3=c(15, 63, 54))

df2 <- data.frame(y1=c("A35", "A38", "A41", "A41", "A49"),
                  y2 = c(9, 20, 24, 32, 84))

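I want to select rows from df2 based on the following three criteria:

(1) y1 of df2 matches x1 of df1;

(2) y2 of df2 >= x2 of df1;

(3) y2 of df2 <= x3 of df1.

The rows that meet these criteria should be added to df1 as new columns. If a row of df1 has more than one match, the additional matches should also be added as new columns.

Expected result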

data.frame(x1=c("A35", "A41", "A49"),
           x2=c(8, 24, 33),
           x3=c(15, 63, 54),
           z1 = c("A35", "A41", ""),
           z2 = c(9, 24,""),
           z3 = c("", "A41", ""),
           z4 = c("", 32, ""))

 x1 x2 x3  z1 z2  z3 z4
A35  8 15 A35  9
A41 24 63 A41 24 A41 32
A49 33 54

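Thanks in advance!

2 answers:

Answer 0 (score: 0)

A data frame is a poor fit for rows that carry different numbers of values; a list should serve this purpose better.

I wrote some code that does the job, though I am not sure it is the most efficient way.

First, you need to compare every row of one data frame with every row of the other. This can be done with an apply call nested inside another apply call (basically: for each row of df2, compare against each row of df1), returning the matching values together with their index.

The result is stored in a messy list that contains empty elements for the non-matches. So, after cleaning up the list, sapply can be used to append the resulting matches to each row of df1.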

df1 <- data.frame(x1=c("A35", "A41", "A49"),
                  x2=c(8, 24, 33),
                  x3=c(15, 63, 54))

df2 <- data.frame(y1=c("A35", "A38", "A41", "A41", "A49"),
                  y2 = c(9, 20, 24, 32, 84))

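# For each row of df2 (x), scan each row of df1 (y). On a match, keep the
# position of the key in df1 (equal to the row index because x1 is the first
# column) plus the matching y1/y2 values. Note: apply() coerces every row to
# character, so the >= and <= tests below compare strings, not numbers.
matches <- apply(df2,1,function(x) apply(df1,1,function(y) 
  if(x[1]==y[1]&&x[2]>=y[2]&&x[2]<=y[3]){
    c(which(df1==x[1]),x[1:2])
  }))
# Flatten the matches into a 3-column matrix: row index, y1, y2.
addedelem <- t(array(unlist(matches), dim=c(3,length(unlist(matches))/3)))
# Append each row's matches to that row of df1.
result <- sapply(1:length(df1$x1), function(x) (c(as.matrix(df1[x,]),t(addedelem[which(addedelem[,1]==x),2:3]))))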

The resulting list is what you are looking for. If necessary, you can convert it back into a data frame.

> result
[[1]]
[1] "A35" "8"   "15"  "A35" " 9" 

[[2]]
[1] "A41" "24"  "63"  "A41" "24"  "A41" "32" 

[[3]]
[1] "A49" "33"  "54" 
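
The conversion back to a data frame mentioned above is not shown in the answer. A minimal sketch, assuming `result` has exactly the shape printed above (it is re-created by hand here so the snippet runs on its own) and that missing matches should become NA; the names `width`, `padded`, `out` and `num_cols` are illustrative only:

```r
# Hypothetical stand-in for the `result` list printed above.
result <- list(c("A35", "8", "15", "A35", " 9"),
               c("A41", "24", "63", "A41", "24", "A41", "32"),
               c("A49", "33", "54"))

# Pad every element with NA up to the longest length, then bind as rows.
width  <- max(lengths(result))
padded <- lapply(result, function(v) c(v, rep(NA_character_, width - length(v))))
out    <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(out) <- c("x1", "x2", "x3", paste0("z", seq_len(width - 3)))

# All columns are character because apply() turned the rows into strings.
# Convert the numeric ones back (assumed here to be x2, x3, z2 and z4);
# trimws() drops the padding in " 9".
num_cols <- c("x2", "x3", "z2", "z4")
out[num_cols] <- lapply(out[num_cols], function(v) as.numeric(trimws(v)))

print(out)
```

The numeric columns are hard-coded for this example; with a different maximum number of matches the list of z columns would need to be adjusted.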

Answer 1 (score: 0)

If I understand your question correctly, this should work:

library(plyr)  # provides rbind.fill(), used further down

### we use the matches to pick our values from df1
### we use our conditions to pick our values from df2
matches <- match(df2$y1,df1$x1)
matches <- matches[!is.na(matches)]
condition1 <- df2$y1 %in% df1$x1
condition2 <- df2$y2[condition1] >= df1$x2[matches]
condition3 <- df2$y2[condition1] <= df1$x3[matches]

### I create these tmp variables so you can see step by step
### what each line of code is doing
### here I am finding the values that meet all the conditions
### then I am pulling the associated y2 values
tmp <- data.frame(x1=df1$x1[matches],y2=df2$y2[condition1])
tmp <- tmp[condition2&condition3,]
tmp <- droplevels(tmp)

### now that we have the values we want
### we are organizing the data in the desired output you 
### specified. 
x <- split(tmp[-1], tmp[[1]])
tmp2 <- data.frame()
for(i in 1:length(x)){

  df <- data.frame(t(unlist(x[[i]], use.names=FALSE)))
  colnames(df) <- seq(1,nrow(x[[i]]))
  tmp2 <- rbind.fill(tmp2,df)

}
colnames(tmp2) <- paste(rep("z",ncol(tmp2)),1:ncol(tmp2),sep="")
res <- data.frame(df1[df1$x1 %in% names(x),],tmp2)
res <- rbind.fill(res,df1[!df1$x1 %in% names(x),])

> res
   x1 x2 x3 z1 z2
1 A35  8 15  9 NA
2 A41 24 63 24 32
3 A49 33 54 NA NA
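
For comparison, the same selection can be sketched in base R alone, without plyr. This is not the answerer's method: it swaps in `merge()` for the key match and a plain logical filter for the range test, and it assumes at least one row of df2 satisfies all three criteria. The names `joined`, `hits`, `per_key`, `width` and `wide` are illustrative only.

```r
df1 <- data.frame(x1 = c("A35", "A41", "A49"),
                  x2 = c(8, 24, 33),
                  x3 = c(15, 63, 54),
                  stringsAsFactors = FALSE)
df2 <- data.frame(y1 = c("A35", "A38", "A41", "A41", "A49"),
                  y2 = c(9, 20, 24, 32, 84),
                  stringsAsFactors = FALSE)

# Criterion 1: inner join on the key (x1 == y1).
joined <- merge(df1, df2, by.x = "x1", by.y = "y1")

# Criteria 2 and 3: keep rows whose y2 lies in [x2, x3].
hits <- joined[joined$y2 >= joined$x2 & joined$y2 <= joined$x3, ]

# Collect the surviving y2 values per key and pad with NA to equal width.
per_key <- split(hits$y2, hits$x1)
width   <- max(lengths(per_key))
wide    <- do.call(rbind, lapply(per_key, function(v) c(v, rep(NA, width - length(v)))))
wide    <- data.frame(x1 = rownames(wide), wide, stringsAsFactors = FALSE)
names(wide) <- c("x1", paste0("z", seq_len(width)))

# Left join back onto df1 so keys without a match keep a row of NAs.
res <- merge(df1, wide, by = "x1", all.x = TRUE)
print(res)
```

On the sample data this gives the same table as above (z1 = 9, 24, NA and z2 = NA, 32, NA). Because the comparison happens on numeric columns after the join, it does not depend on how numbers sort as strings.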