Question

I want to compare two string vectors as follows:

Test1<-c("Everything is normal","It is all sunny","Its raining cats and dogs","Mild")

Test2<-c("Everything is normal","It is thundering","Its raining cats and dogs","Cloudy")

Filtered<-data.frame(Test1,Test2)

Expected output:

Number the same: 2
Number present in Test1 and not in Test2: 2
Number present in Test2 and not in Test1: 2

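I would also like to see which strings differ, so the other expected output should look as follows (and should also be part of the original data frame):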

Same<-c("Everything is normal","Its raining cats and dogs")
OnlyInA<-c("It is all sunny","Mild")
OnlyInB<-c("It is thundering","Cloudy")

What I have tried:

Filtered$Same<-intersect(Filtered$Test1,Filtered$Test2)
Filtered$InAButNotB<-setdiff(Filtered$Test1,Filtered$Test2)

But when I run the last line, I get the error "replacement has 127 rows, data has 400" (when I use a longer data set).

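I assume this is because only the rows that differ are returned, so the column lengths do not match. How can I get NA for the rows where setdiff finds no difference, so that I can keep the result in the original data frame?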

Answer 1

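The base R outer function applies a function to every combination of elements from two vectors. Using outer with '==' therefore compares each element of one vector with each element of the other: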

Test1<-c("Everything is normal","It is all sunny","Its raining cats and dogs")
Test2<-c("Everything is normal","It is thundering","Its raining cats and dogs","Cloudy")

# test each element in Test1 for equality with each element in Test2
compare <- outer(Test1, Test2, '==') 

# calculate overlaps and uniques
overlaps <- sum(compare) # number of overlaps: 2
unique.test1 <- (rowSums(compare) == 0) # in Test1 but not Test2
unique.test2 <- (colSums(compare) == 0) # in Test2 but not Test1

# return uniques
OnlyInA <- Test1[unique.test1]
OnlyInB <- Test2[unique.test2]
same <- Test1[rowSums(compare) > 0] # in both; > 0 also covers values repeated in Test2

# counts
n.unique.a <- sum(unique.test1)
n.unique.b <- sum(unique.test2)
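
To make the shape of the comparison matrix easier to picture, here is a minimal sketch on small made-up vectors (a and b are hypothetical and not part of the question's data). It also illustrates one caveat: sum() over the matrix counts matching pairs, so a value repeated in one vector is counted more than once.

```r
# Hypothetical short vectors, chosen only to keep the matrix small
a <- c("x", "y", "z")
b <- c("x", "z", "w", "z")   # note the repeated "z"

# outer() evaluates "==" on all length(a) * length(b) pairs
m <- outer(a, b, "==")
dimnames(m) <- list(a, b)    # label rows with a, columns with b
print(m)                     # a 3 x 4 logical matrix

# Each row answers "does this element of a occur anywhere in b?"
in_b <- rowSums(m) > 0
only_a <- a[!in_b]           # elements of a with no match in b

# sum(m) counts matching pairs, so the repeated "z" is counted twice
n_pairs <- sum(m)
n_shared <- sum(in_b)
```

With unique values in both vectors the two counts agree, which is the case for the Test1 and Test2 data above.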

Alternatively, the %in% operator also works well for this kind of task:

Test1[Test1 %in% Test2]
[1] "Everything is normal"      "Its raining cats and dogs"

Test1[!(Test1 %in% Test2)]
[1] "It is all sunny"

Test2[!(Test2 %in% Test1)]
[1] "It is thundering" "Cloudy"
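
The question also asks how to keep the result in the original data frame, with NA where a row does not apply. One possible sketch, assuming the four-element vectors from the question, combines %in% with ifelse() so that every new column has one entry per row. The column names Same, OnlyInA and OnlyInB are taken from the question's expected output; in_both is just an illustrative helper name.

```r
Test1 <- c("Everything is normal", "It is all sunny", "Its raining cats and dogs", "Mild")
Test2 <- c("Everything is normal", "It is thundering", "Its raining cats and dogs", "Cloudy")
Filtered <- data.frame(Test1, Test2, stringsAsFactors = FALSE)

# One logical value per row, so the new columns always match nrow(Filtered)
in_both <- Filtered$Test1 %in% Filtered$Test2

# Keep the string where the condition holds, NA otherwise
Filtered$Same    <- ifelse(in_both, Filtered$Test1, NA)
Filtered$OnlyInA <- ifelse(in_both, NA, Filtered$Test1)
Filtered$OnlyInB <- ifelse(Filtered$Test2 %in% Filtered$Test1, NA, Filtered$Test2)

print(Filtered)
```

Because ifelse() returns a vector as long as its test, this avoids the "replacement has ... rows" error that intersect() and setdiff() trigger when they return fewer elements than the data frame has rows.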

Answer 2

Using tidyverse functions, you could try something like:

library(dplyr)  # provides %>% and summarise()

Filtered %>%
  summarise(comm = sum(Test1 %in% Test2),
            InA = sum(!(Test1 %in% Test2)),
            InB = sum(!(Test2 %in% Test1)))

That said, when working with plain vectors, if you are only interested in the aggregate counts, you could also try the following:

length(intersect(Test1,Test2))
length(setdiff(Test1,Test2))
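
Two details may be worth keeping in mind with the set functions: setdiff() is not symmetric, so each direction needs its own call, and intersect() and setdiff() drop duplicates, so their counts can differ from the %in% counts when a value repeats. A small sketch on made-up vectors (a and b are hypothetical, not the question's data):

```r
# Hypothetical vectors; "rain" appears twice in a
a <- c("rain", "rain", "sun", "fog")
b <- c("rain", "snow")

# %in% counts elements, so both "rain" entries are counted
n_elements <- sum(a %in% b)

# intersect() works on distinct values, so "rain" is counted once
n_distinct <- length(intersect(a, b))

# setdiff() is directional: call it once per direction
only_a <- length(setdiff(a, b))   # distinct values in a but not in b
only_b <- length(setdiff(b, a))   # distinct values in b but not in a
```

For vectors without repeated values, such as Test1 and Test2 above, both approaches give the same counts.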

How to compare the number of matching statements between two string vectors
