R: Match rows by two columns

Time: 2019-03-23 17:45:20

Tags: r match

I'm currently trying to find a vectorized way to match two values in the same row. I have the following two simplified data frames:

# Dataframe 1: Displaying all my observations
df1 <- data.frame(c(1, 2, 3, 4, 5, 6, 7, 8),
                  c("A", "B", "C", "D", "A", "B", "A", "C"), 
                  c("B", "E", "D", "A", "C", "A", "D", "A"))
colnames(df1) <- c("ID", "Number1", "Number2")

> df1
  ID Number1 Number2
1  1       A       B
2  2       B       E
3  3       C       D
4  4       D       A
5  5       A       C
6  6       B       A
7  7       A       D
8  8       C       A

# Dataframe 2: Matrix of observations I am interested in
df2 <- matrix(c("A", "B",
                "D", "A",
                "C", "B",
                "E", "D"),
              ncol = 2,
              byrow = TRUE)

> df2
     [,1] [,2]
[1,] "A"  "B" 
[2,] "D"  "A" 
[3,] "C"  "B" 
[4,] "E"  "D" 

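What I want to do is create a new column in df1 that is TRUE only when the exact combination exists in df2 (e.g. ID = 1 matches the first row of df2, because both consist of A and B). In addition, if there is a shortcut, I would also like the status to be TRUE when the values are reversed, i.e. df1$Number1 matches df2[i,2] and df1$Number2 matches df2[i,1] (e.g. for ID = 7 the combination in df1 is A, D while in df2 it is D, A -> TRUE).

My desired output is: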

> df1
  ID Number1 Number2 Status
1  1       A       B   TRUE
2  2       B       E  FALSE
3  3       C       D  FALSE
4  4       D       A   TRUE
5  5       A       C  FALSE
6  6       B       A   TRUE
7  7       A       D  TRUE
8  8       C       A  FALSE

What I have so far:

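# Result of every df1-row vs df2-row comparison; must exist before it is filled in
StatusComb <- matrix(FALSE, nrow = nrow(df1), ncol = nrow(df2))
for (i in 1:nrow(df1)) {
  for (j in 1:nrow(df2)) {
    # TRUE only if row i of df1 holds exactly the pair in row j of df2 (same order)
    StatusComb[i, j] <- df1$Number1[i] == df2[j, 1] &&
                        df1$Number2[i] == df2[j, 2]
  }
  df1$Status[i] <- any(StatusComb[i, ])
}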

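This is really inefficient (you can clearly tell I'm new to R), and it doesn't look very nice either. Any help would be much appreciated!

2 Answers:

Answer 0 (score: 0)

One approach is to merge the two together.

To handle the reversed labels in your data, I stack df2 on top of a column-swapped copy of itself and then do a lookup: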

df2 <- rbind.data.frame(df2, df2[,c(2,1)])
colnames(df2) <- c("Number1", "Number2")
df2$a <- TRUE
df2
#   Number1 Number2    a
# 1       A       B TRUE
# 2       D       A TRUE
# 3       C       B TRUE
# 4       E       D TRUE
# 5       B       A TRUE
# 6       A       D TRUE
# 7       B       C TRUE
# 8       D       E TRUE

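I added the column a so that the merge has something to carry over: matched rows get TRUE, unmatched rows get NA.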

df3 <- merge(df1, df2, all.x = TRUE)
df3$a <- !is.na(df3$a)
df3[ order(df3$ID), ]
#   Number1 Number2 ID     a
# 1       A       B  1  TRUE
# 5       B       E  2 FALSE
# 7       C       D  3 FALSE
# 8       D       A  4  TRUE
# 2       A       C  5 FALSE
# 4       B       A  6  TRUE
# 3       A       D  7  TRUE
# 6       C       A  8 FALSE

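If you look at column a before the !is.na(df3$a) step, you'll see it contains only TRUE or NA; if that is enough for your needs, you can skip that intermediate step. The order step is only there because merge does not guarantee that the result keeps the row order of its inputs (in fact, I find it is almost always inconveniently different). Since df1 was originally ordered by ID, I restore that order, but this is purely cosmetic, to match your desired output.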
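For comparison, the same lookup idea (the pairs from df2 in both orders) can be sketched without merge, which avoids the row-order issue altogether: %in% returns one TRUE/FALSE per row of df1, in df1's own order. This is a base-R alternative rather than part of the answer above, and the "_" separator is an arbitrary choice that assumes "_" never appears in the values.

```r
# Example data from the question, with character (not factor) columns
df1 <- data.frame(ID = 1:8,
                  Number1 = c("A", "B", "C", "D", "A", "B", "A", "C"),
                  Number2 = c("B", "E", "D", "A", "C", "A", "D", "A"),
                  stringsAsFactors = FALSE)
df2 <- matrix(c("A", "B",
                "D", "A",
                "C", "B",
                "E", "D"),
              ncol = 2, byrow = TRUE)

# One string key per row of df1; "_" (assumed absent from the data) separates
# the two values so that e.g. ("AB", "C") and ("A", "BC") give different keys
row_keys <- paste(df1$Number1, df1$Number2, sep = "_")

# Keys for the df2 pairs in both orders, so reversed pairs also count as a match
lookup_keys <- c(paste(df2[, 1], df2[, 2], sep = "_"),
                 paste(df2[, 2], df2[, 1], sep = "_"))

# Vectorized membership test; the result is aligned with the rows of df1
df1$Status <- row_keys %in% lookup_keys
df1
```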

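Answer 1 (score: 0)

You can define the combinations you are searching for in a combination variable, with the two values of each pair sorted alphabetically: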

combination <- apply(df2, 1, function(x) {
  paste(sort(x), collapse = '')
})
combination
[1] "AB" "AD" "BC" "DE"

Then set the Status field based on the sorted concatenation of the Number fields:

library(dplyr)
df1 %>%
  rowwise() %>%
  mutate(S = paste(sort(c(Number1, Number2)), collapse = "")) %>%
  mutate(Status = S %in% combination)  # %in% already returns TRUE/FALSE
# A tibble: 8 x 5
     ID Number1 Number2 S     Status
  <dbl> <chr>   <chr>   <chr> <lgl> 
1     1 A       B       AB    TRUE  
2     2 B       E       BE    FALSE 
3     3 C       D       CD    FALSE 
4     4 D       A       AD    TRUE  
5     5 A       C       AC    FALSE 
6     6 B       A       AB    TRUE  
7     7 A       D       AD    TRUE  
8     8 C       A       AC    FALSE 

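Data:

I set stringsAsFactors = FALSE in the data frame so that Number1 and Number2 are character columns rather than factors (TRUE was the default before R 4.0.0):

df1 <- data.frame(c(1, 2, 3, 4, 5, 6, 7, 8),
                  c("A", "B", "C", "D", "A", "B", "A", "C"),
                  c("B", "E", "D", "A", "C", "A", "D", "A"),
                  stringsAsFactors = FALSE)
colnames(df1) <- c("ID", "Number1", "Number2")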
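The rowwise() step calls sort() and paste() once per row, which can be slow on large data. A vectorized variant of the same sorted-key idea is sketched below using pmin()/pmax(), which compare two vectors element-wise and also work on character vectors. It assumes character (not factor) columns and that "_" never occurs in the values; the separator keeps multi-character values such as "AB" + "C" and "A" + "BC" from producing the same key, which collapse = "" would not. The helper pair_key() is a hypothetical name introduced only for this sketch.

```r
# Example data from the question, with character (not factor) columns
df1 <- data.frame(ID = c(1, 2, 3, 4, 5, 6, 7, 8),
                  Number1 = c("A", "B", "C", "D", "A", "B", "A", "C"),
                  Number2 = c("B", "E", "D", "A", "C", "A", "D", "A"),
                  stringsAsFactors = FALSE)
df2 <- matrix(c("A", "B",
                "D", "A",
                "C", "B",
                "E", "D"),
              ncol = 2, byrow = TRUE)

# Hypothetical helper: order-insensitive key for whole columns at once.
# pmin()/pmax() put the alphabetically smaller value first in every position,
# so no rowwise() or apply() is needed; "_" is assumed absent from the data.
pair_key <- function(x, y) paste(pmin(x, y), pmax(x, y), sep = "_")

combination <- pair_key(df2[, 1], df2[, 2])   # "A_B" "A_D" "B_C" "D_E"
df1$Status <- pair_key(df1$Number1, df1$Number2) %in% combination
df1
```

The same expression also fits into the dplyr pipeline without rowwise(), e.g. mutate(Status = pair_key(Number1, Number2) %in% combination).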