I'm trying to find a vectorized way to match two values that sit in the same row. I have the following two simplified data frames:
# Dataframe 1: Displaying all my observations
df1 <- data.frame(c(1, 2, 3, 4, 5, 6, 7, 8),
c("A", "B", "C", "D", "A", "B", "A", "C"),
c("B", "E", "D", "A", "C", "A", "D", "A"))
colnames(df1) <- c("ID", "Number1", "Number2")
> df1
ID Number1 Number2
1 1 A B
2 2 B E
3 3 C D
4 4 D A
5 5 A C
6 6 B A
7 7 A D
8 8 C A
# Dataframe 2: Matrix of observations I am interested in
df2 <- matrix(c("A", "B",
"D", "A",
"C", "B",
"E", "D"),
ncol = 2,
byrow = TRUE)
> df2
[,1] [,2]
[1,] "A" "B"
[2,] "D" "A"
[3,] "C" "B"
[4,] "E" "D"
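What I want is a new column in df1 that is TRUE only when the exact combination exists in df2 (e.g. ID = 1 equals the first row of df2, since both consist of A and B). In addition, if there is a shortcut, I would also like the status to be TRUE when the values are reversed, i.e. df1$Number1 matches df2[i,2] and df1$Number2 matches df2[i,1] (e.g. for ID = 7 the combination in df1 is A, D while the combination in df2 is D, A -> TRUE).
My desired output is as follows: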
> df1
ID Number1 Number2 Status
1 1 A B TRUE
2 2 B E FALSE
3 3 C D FALSE
4 4 D A TRUE
5 5 A C FALSE
6 6 B A TRUE
7 7 A D TRUE
8 8 C A FALSE
What I have so far is:
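# Match matrix: one row per df1 row, one column per df2 row
StatusComb <- matrix(FALSE, nrow = nrow(df1), ncol = nrow(df2))
for (i in 1:nrow(df1)) {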
for (j in 1:nrow(df2)) {
Status <- ifelse(df1$Number1[i] %in% df2[j,1] &&
df1$Number2[i] %in% df2[j,2], TRUE, FALSE)
StatusComb[i,j] <- Status
}
df1$Status[i] <- ifelse(any(StatusComb[i,]) == TRUE, TRUE, FALSE)
}
This is really inefficient (as you can clearly tell, I'm new to R) and it doesn't look great either. I would appreciate any help!
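Answer 0 (score: 0)
One way is to merge things together.
To adapt this to your data and handle the reversed labels, I'll append a column-swapped copy of df2 to itself and do the lookup against that: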
df2 <- rbind.data.frame(df2, df2[,c(2,1)])
colnames(df2) <- c("Number1", "Number2")
df2$a <- TRUE
df2
# Number1 Number2 a
# 1 A B TRUE
# 2 D A TRUE
# 3 C B TRUE
# 4 E D TRUE
# 5 B A TRUE
# 6 A D TRUE
# 7 B C TRUE
# 8 D E TRUE
I added the column a so that the merge has something to carry over for matched rows.
df3 <- merge(df1, df2, all.x = TRUE)
df3$a <- !is.na(df3$a)
df3[ order(df3$ID), ]
# Number1 Number2 ID a
# 1 A B 1 TRUE
# 5 B E 2 FALSE
# 7 C D 3 FALSE
# 8 D A 4 TRUE
# 2 A C 5 FALSE
# 4 B A 6 TRUE
# 3 A D 7 TRUE
# 6 C A 8 FALSE
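If you look at this column before the !is.na(df3$a) step, you'll see that it consists entirely of TRUE and NA (it never contains FALSE); if that is sufficient for your needs, you can omit that intermediate step. The order step is only there because row order is not guaranteed with merge (in fact, I find that it is always inconveniently different). Since the data was previously ordered by ID, I restored that order, but this is purely cosmetic, to match your desired output.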
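If the reordering is a nuisance, the same lookup can be sketched without merge. This is not the approach above but a swapped-in alternative: build one string key per row with paste and test membership with %in%. The names pairs and wanted are only illustrative, and the "|" separator assumes that character never occurs in the values themselves.

```r
# Self-contained copy of the question's data.
df1 <- data.frame(ID = 1:8,
                  Number1 = c("A", "B", "C", "D", "A", "B", "A", "C"),
                  Number2 = c("B", "E", "D", "A", "C", "A", "D", "A"),
                  stringsAsFactors = FALSE)
pairs <- matrix(c("A", "B",
                  "D", "A",
                  "C", "B",
                  "E", "D"),
                ncol = 2, byrow = TRUE)

# Keys for the pairs of interest, in both orientations
# (assumption: "|" does not appear in any value).
wanted <- c(paste(pairs[, 1], pairs[, 2], sep = "|"),
            paste(pairs[, 2], pairs[, 1], sep = "|"))

# df1 keeps its row order, so no order() step is needed afterwards.
df1$Status <- paste(df1$Number1, df1$Number2, sep = "|") %in% wanted
df1$Status
# [1]  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE
```

Because nothing is joined, the result is a plain logical vector with no NA to convert.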
Answer 1 (score: 0)
You can define the combination variable to search for, with each pair sorted alphabetically, as follows:
combination <- apply(df2, 1, function(x) {
paste(sort(x), collapse = '')
})
combination
[1] "AB" "AD" "BC" "DE"
Then set the Status field based on the concatenation of the Number fields:
library(dplyr)
df1 %>%
rowwise() %>%
mutate(S = paste(sort(c(Number1, Number2)), collapse = "")) %>%
mutate(Status = S %in% combination)
Source: local data frame [8 x 5]
Groups: <by row>
# A tibble: 8 x 5
ID Number1 Number2 S Status
<dbl> <chr> <chr> <chr> <lgl>
1 1 A B AB TRUE
2 2 B E BE FALSE
3 3 C D CD FALSE
4 4 D A AD TRUE
5 5 A C AC FALSE
6 6 B A AB TRUE
7 7 A D AD TRUE
8 8 C A AC FALSE
Note that I set stringsAsFactors = FALSE when creating the data frame:
df1 <- data.frame(c(1, 2, 3, 4, 5, 6, 7, 8),
c("A", "B", "C", "D", "A", "B", "A", "C"),
c("B", "E", "D", "A", "C", "A", "D", "A"), stringsAsFactors = FALSE)
colnames(df1) <- c("ID", "Number1", "Number2")
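Since the question asked for something vectorized, here is a rough base-R sketch of the same sorted-key idea that avoids rowwise() and apply(). It relies on pmin() and pmax() working elementwise on character vectors; pair_key is a hypothetical helper name, and joining without a separator assumes single-character values, as in the example data.

```r
# Self-contained copy of the question's data.
df1 <- data.frame(ID = 1:8,
                  Number1 = c("A", "B", "C", "D", "A", "B", "A", "C"),
                  Number2 = c("B", "E", "D", "A", "C", "A", "D", "A"),
                  stringsAsFactors = FALSE)
df2 <- matrix(c("A", "B",
                "D", "A",
                "C", "B",
                "E", "D"),
              ncol = 2, byrow = TRUE)

# Hypothetical helper: order-independent key with the alphabetically
# smaller value first, so "D","A" and "A","D" both become "AD".
pair_key <- function(a, b) paste0(pmin(a, b), pmax(a, b))

combination <- pair_key(df2[, 1], df2[, 2])
combination
# [1] "AB" "AD" "BC" "DE"

# One vectorized membership test over all rows of df1.
df1$Status <- pair_key(df1$Number1, df1$Number2) %in% combination
df1$Status
# [1]  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE
```

For values longer than one character, a separator inside paste() would be needed to keep keys such as "AB" + "C" and "A" + "BC" distinct.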