Question

Hi, I have two datasets. The first one is a set of indices:

ind1<-rep(c("E","W"), times=20)
ind2<-sample(100:150, 40)
y<-c(1:40)
index<-data.frame(cbind(ind1, ind2, y))

The second dataset is the one that needs to be indexed:

x1<-sample(c("E","W","N"), 40, replace=TRUE)
x2<-sample(100:150, 40)
x3<-rep(0, times=40)
data<-data.frame(cbind(x1,x2,x3))

In x3, I want to record where x1 and x2 in data match ind1 and ind2 in index, respectively, and return the corresponding y.

index1<-split(index, index$ind1)
data1<-split(data, data$x1)
data1$E$x3<-match(data1$E$x2, index1$E$ind2)
data1$W$x3<-match(data1$W$x2, index1$W$ind2)

It sort of works the way I want, but it doesn't return y correctly. Which part am I doing wrong? Thanks.

Also, is there a faster/smarter way to do this? I may have more conditions to match on. I originally tried an if/else statement, but it didn't work.

Answer 1

merge(data, index, by.x=c("x1", "x2"), by.y=c("ind1", "ind2"), all.x=TRUE, all.y=FALSE)

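This gives you the x3 and y values for every matching combination of x1/x2 and ind1/ind2. All combinations of x1 and x2 in data are kept (with y set to NA when that combination does not appear as ind1 and ind2 in index), but combinations of ind1 and ind2 that do not appear in data are dropped. As written, the solution keeps both the x3 and y values, but if you prefer, you can drop the x3 values, as @Ferdinand.kraft suggested.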
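Here is a minimal sketch of the merge call above on small hand-made data. The values in index and data are hypothetical and were chosen so the result is easy to check. They are not the question's random sample. by.x names the key columns of the first argument (data), and by.y names the key columns of the second (index).

```r
# Hypothetical toy data, small enough to check the result by eye
index <- data.frame(ind1 = c("E", "W", "E"),
                    ind2 = c(101, 102, 103),
                    y    = c(10, 20, 30))
data  <- data.frame(x1 = c("E", "W", "N", "E"),
                    x2 = c(103, 102, 101, 999),
                    x3 = 0)

# Left join: keep every row of `data`; rows without an (ind1, ind2) match get y = NA
res <- merge(data, index, by.x = c("x1", "x2"), by.y = c("ind1", "ind2"),
             all.x = TRUE, all.y = FALSE)

# Drop the placeholder x3 column, since y now carries the matched value
res$x3 <- NULL
print(res)
```

By default merge() sorts the result on the key columns (sort = TRUE), so the rows generally do not come back in data's original order. If that order matters, you have to restore it yourself.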

Answer 2

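There are many ways to approach this, and the best one really depends on the characteristics of your data. Here is the most direct matching approach: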

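Paste: the 'paste' function lets you build a single string from several pieces of data. If you are matching datasets on columns that hold the same kinds of values, you can simply paste the columns together and compare them directly with a 'match' statement, like this: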

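new_data <- data

# Position of each (x1, x2) pair within the (ind1, ind2) pairs; NA when there is no match
m <- match(paste(data$x1, data$x2), paste(index$ind1, index$ind2))

# Use that position to look up y; write 0 where there is no match
new_data$x3 <- ifelse(is.na(m), 0, index$y[m])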

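The match statement here looks for exact matches between the x1 + x2 pairs and the ind1 + ind2 pairs. For each row of data, it returns an integer giving the position of the matching pair in index, or NA if no match is found. By checking for NA in the 'ifelse' statement, we write a zero for unmatched rows and return the corresponding y value for every match.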
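The key point is that match() returns positions in index, not y values, so those positions must be used to subscript index$y. Below is a minimal sketch on hypothetical toy data. The "\r" separator passed to paste() is an assumption of this sketch; any string that cannot occur in the data would work. It guards against two different pairs collapsing into the same key. That can happen with paste()'s default single-space separator if the values themselves contain spaces.

```r
# Hypothetical toy data, chosen so the expected result is easy to verify
index <- data.frame(ind1 = c("E", "W", "E"),
                    ind2 = c(101, 102, 103),
                    y    = c(10, 20, 30))
data  <- data.frame(x1 = c("E", "W", "N", "E"),
                    x2 = c(103, 102, 101, 999),
                    x3 = 0)

# One composite key per row; "\r" is an arbitrary separator assumed absent from the data
key_data  <- paste(data$x1, data$x2, sep = "\r")
key_index <- paste(index$ind1, index$ind2, sep = "\r")

# m[i] is the row of `index` whose key equals row i of `data`, or NA if there is none
m <- match(key_data, key_index)

# Use the positions to pull out y; unmatched rows get 0
data$x3 <- ifelse(is.na(m), 0, index$y[m])
print(data$x3)
```

The same fix applies to the split-based attempt in the question. match(data1$E$x2, index1$E$ind2) also returns row positions, here within index1$E, so index1$E$y[match(data1$E$x2, index1$E$ind2)] would return the y values instead. In the question's example y is 1:40, but after split() the positions restart at 1 within each group. That is why the raw positions do not equal y.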

Answer 3

You can also use left_join() from the dplyr package:

library(dplyr)
left_join(data, index, by = c("x1" = "ind1", "x2" = "ind2"))

Learn more here.

Answer 4

This question is related to "match two data.frames based on multiple columns".

You can use interaction, or paste as suggested by Dinre, to match on multiple columns.

#Write the row number of index in x3 which matches
data$x3 <- match(interaction(data[c("x1", "x2")]), interaction(index[c("ind1","ind2")]))

#In case you want to return 0 instead of NA for nomatch
data$x3 <- match(interaction(data[c("x1", "x2")]), interaction(index[c("ind1","ind2")]), nomatch=0)

#Instead of >interaction< you could also use paste as already suggested by Dinre
data$x3 <- match(paste(data$x1, data$x2), paste(index$ind1, index$ind2))
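The question mentions that there may be more conditions to match on. Here is a hedged sketch of generalising the paste idea to any number of key columns. make_key() is a hypothetical helper written for this example, not part of base R. interaction(df[cols]), as used above, also accepts any number of columns and could serve the same purpose. The data below are made up.

```r
# Hypothetical helper: one composite string key per row, built from the columns in `cols`.
# unname() stops the column names from being passed to paste() as argument names.
make_key <- function(df, cols) {
  do.call(paste, c(unname(as.list(df[cols])), sep = "\r"))
}

# Hypothetical toy data with three key columns k1, k2, k3
index <- data.frame(k1 = c("E", "W", "E"),
                    k2 = c(1, 2, 1),
                    k3 = c("p", "q", "r"),
                    y  = c(10, 20, 30))
data  <- data.frame(k1 = c("E", "E", "W"),
                    k2 = c(1, 1, 9),
                    k3 = c("r", "p", "q"))

keys <- c("k1", "k2", "k3")

# Row of `index` matching each row of `data` on all three columns (NA if none)
m <- match(make_key(data, keys), make_key(index, keys))

# Look up y by position; unmatched rows stay NA
data$y <- index$y[m]
print(data)
```

With this approach, adding a condition only means adding a column name to keys. You don't need another if/else branch or another split().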

R: match on more than 2 conditions and return the response value
