Question

Suppose I have a complete pattern file, for example:

pattern<-data.frame(x1=c(0,0,0,1,0,1,1,1),
                 x2=c(0,0,1,0,1,1,0,1),
                 x3=c(0,1,0,0,1,0,1,1),
                 y=c(11,14, 12, 14, 16, 18, 19, 20))
pattern
  x1 x2 x3  y
1  0  0  0 11
2  0  0  1 14
3  0  1  0 12
4  1  0  0 14
5  0  1  1 16
6  1  1  0 18
7  1  0  1 19
8  1  1  1 20

And a data file:

set.seed(123)
df<-data.frame(a=rbinom(100, 1, 0.5), 
               b=rbinom(100, 1, 0.2), 
               c=rbinom(100, 1, 0.6))
head(df)
  a b c
1 0 0 1
2 1 0 0
3 0 0 0
4 1 1 1
5 1 0 1
6 0 1 0

What I want is to look up each row of df in pattern and fill in the corresponding value of y.

Is there an easy way to do this in R?

Answer 1

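One approach is to paste the columns together with interaction, look the resulting keys up with match, and then use the numeric index that match returns to pick the y values.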

 indx1 <- as.character(interaction(df, sep=''))
 indx2 <- as.character(interaction(pattern[,-4], sep=''))

 df$y <- pattern$y[match(indx1, indx2)]


  head(df)
  #  a b c  y
  #1 0 0 1 14
  #2 1 0 0 14
  #3 0 0 0 11
  #4 1 1 1 20
  #5 1 0 1 19
  #6 0 1 0 12
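
The same key-and-match idea can be sketched with paste in place of interaction. This is only an illustration under stated assumptions: the three-row data frame `small` and the `_` separator are made up for the example and are not part of the question. The last row of `small` deliberately has no counterpart in pattern, to show that match then returns NA.

```r
# Pattern table from the question.
pattern <- data.frame(x1 = c(0, 0, 0, 1, 0, 1, 1, 1),
                      x2 = c(0, 0, 1, 0, 1, 1, 0, 1),
                      x3 = c(0, 1, 0, 0, 1, 0, 1, 1),
                      y  = c(11, 14, 12, 14, 16, 18, 19, 20))

# Hypothetical data; the third row (2, 0, 0) matches no pattern row.
small <- data.frame(a = c(0, 1, 2), b = c(0, 1, 0), c = c(1, 1, 0))

# Build one key per row; a separator keeps multi-digit values unambiguous.
key_small   <- do.call(paste, c(small, sep = "_"))
key_pattern <- do.call(paste, c(pattern[, c("x1", "x2", "x3")], sep = "_"))

# match() returns NA where a key is not found, so y is NA for that row.
small$y <- pattern$y[match(key_small, key_pattern)]
small
```

With a non-empty separator, values such as 1 and 11 in adjacent columns cannot collapse into the same key, which could happen with sep = '' once the columns are no longer strictly 0/1.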

Or you could use left_join from dplyr:

 library(dplyr)
 res <- left_join(df, pattern, by=c('a'='x1', 'b'='x2', 'c'='x3'))
 #the `by` part is contributed by @Henrik

 head(res)
 #  a b c  y
 #1 0 0 1 14
 #2 1 0 0 14
 #3 0 0 0 11
 #4 1 1 1 20
 #5 1 0 1 19
 #6 0 1 0 12

Or, for a faster approach, use data.table:

 library(data.table)
 res1 <- setkey(setDT(pattern))[df] #suggested by @Arun
 head(res1)
 #   x1 x2 x3  y
 #1:  0  0  1 14
 #2:  1  0  0 14
 #3:  0  0  0 11
 #4:  1  1  1 20
 #5:  1  0  1 19
 #6:  0  1  0 12

Answer 2

1) merge Using merge in base R:

merge(df, pattern, by = 1:3, all.x = TRUE, all.y = FALSE)

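all.x=TRUE means keep all rows of df even if they have no match, and all.y=FALSE means do not keep rows of pattern that do not match df. This combination of all.x and all.y is a left join.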
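
One thing to watch for is that merge sorts its result on the key columns by default, so the rows may not come back in the order of df. A minimal sketch of one way to keep the original order follows. It assumes a small made-up df and a helper column named `row_id`, neither of which is part of the answer above.

```r
# Pattern table from the question.
pattern <- data.frame(x1 = c(0, 0, 0, 1, 0, 1, 1, 1),
                      x2 = c(0, 0, 1, 0, 1, 1, 0, 1),
                      x3 = c(0, 1, 0, 0, 1, 0, 1, 1),
                      y  = c(11, 14, 12, 14, 16, 18, 19, 20))

# Hypothetical three-row data frame, used only for illustration.
df <- data.frame(a = c(1, 0, 1), b = c(1, 0, 0), c = c(1, 1, 1))

# Tag each row with its original position before merging.
df$row_id <- seq_len(nrow(df))

# Left join on the three key columns, naming them explicitly.
merged <- merge(df, pattern,
                by.x = c("a", "b", "c"),
                by.y = c("x1", "x2", "x3"),
                all.x = TRUE)

# Restore the original row order and drop the helper column.
merged <- merged[order(merged$row_id), c("a", "b", "c", "y")]
rownames(merged) <- NULL
merged
```

Naming the key columns through by.x and by.y instead of by = 1:3 is a stylistic choice here: it keeps the join correct even if the column order of either data frame changes.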

2) sqldf Using SQL:

library(sqldf)
sqldf("select df.*, pattern.y
       from df
       left join pattern
       on df.a = pattern.x1 and df.b = pattern.x2 and df.c = pattern.x3")

A left join keeps all rows of the left-hand data frame (df) but not all rows of the right-hand one (pattern).
