I want to compare two string vectors as follows:
Test1<-c("Everything is normal","It is all sunny","Its raining cats and dogs","Mild")
Test2<-c("Everything is normal","It is thundering","Its raining cats and dogs","Cloudy")
Filtered<-data.frame(Test1,Test2)
Expected output:
Number the same: 2
Number present in Test1 and not in Test2: 2
Number present in Test2 and not in Test1: 2
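I would also like to see which strings differ, so the other expected output should look as follows (and should also be part of the original data frame):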
Same<-c("Everything is normal","Its raining cats and dogs")
OnlyInA<-c("It is all sunny","Mild")
OnlyInB<-c("It is thundering","Cloudy")
What I have tried:
Filtered$Same<-intersect(Filtered$Test1,Filtered$Test2)
Filtered$InAButNotB<-setdiff(Filtered$Test1,Filtered$Test2)
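But when I run the last line I get the error "replacement has 127 rows, data has 400" (when I use a longer dataset).
I think this is because only the rows that differ are returned, so the column lengths do not match. How can I fill the rows where setdiff finds no difference with NA, so that I can keep the result in the original data frame?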
Answer 0 (score: 1)
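The base R function outer applies a function to every combination of elements from two vectors. Using outer with '==' therefore compares every element of one vector with every element of the other: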
Test1<-c("Everything is normal","It is all sunny","Its raining cats and dogs")
Test2<-c("Everything is normal","It is thundering","Its raining cats and dogs","Cloudy")
# test each element in Test1 for equality with each element in Test2
compare <- outer(Test1, Test2, '==')
# calculate overlaps and uniques
overlaps <- sum(compare) # number of overlaps: 2
unique.test1 <- (rowSums(compare) == 0) # in Test1 but not Test2
unique.test2 <- (colSums(compare) == 0) # in Test2 but not Test1
# return uniques
OnlyInA <- Test1[unique.test1]
OnlyInB <- Test2[unique.test2]
same <- Test1[rowSums(compare) > 0] # in both; > 0 also covers strings repeated in Test2
# counts
n.unique.a <- sum(unique.test1)
n.unique.b <- sum(unique.test2)
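One thing to keep in mind with the outer approach is that the matrix holds one cell per pair of elements, so repeated strings are counted once per matching pair. A minimal sketch, using made-up vectors a and b (not from the question) that contain duplicates:

```r
# Hypothetical vectors with a repeated string, for illustration only
a <- c("sun", "rain", "rain")
b <- c("rain", "rain", "fog")

# 3 x 3 logical matrix: m[i, j] is TRUE when a[i] == b[j]
m <- outer(a, b, "==")
dimnames(m) <- list(a, b)
m

pair.matches   <- sum(m)                  # counts matching pairs, not distinct strings
distinct.same  <- length(intersect(a, b)) # counts distinct shared strings
found.in.b     <- a[rowSums(m) > 0]       # every element of a that occurs in b
exactly.one    <- a[rowSums(m) == 1]      # misses "rain", which matches twice
```

If the data can contain duplicates, the row and column sums are safer to test with > 0 than with == 1, and intersect gives the number of distinct shared strings.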
Alternatively, the %in% operator also works well for this kind of task:
Test1[Test1 %in% Test2]
[1] "Everything is normal" "Its raining cats and dogs"
Test1[!(Test1 %in% Test2)]
[1] "It is all sunny"
Test2[!(Test2 %in% Test1)]
[1] "It is thundering" "Cloudy"
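To keep the results in the original data frame, as the question asks, each new column needs one value per row. One possible sketch is to combine %in% with ifelse, so that rows that do not qualify get NA instead of being dropped. The column name InBButNotA is my own addition, and stringsAsFactors = FALSE is set on the assumption that the columns should stay character (older R versions would otherwise create factors):

```r
Test1 <- c("Everything is normal", "It is all sunny", "Its raining cats and dogs", "Mild")
Test2 <- c("Everything is normal", "It is thundering", "Its raining cats and dogs", "Cloudy")
Filtered <- data.frame(Test1, Test2, stringsAsFactors = FALSE)

# TRUE for each row whose string also occurs anywhere in the other column
in.both.1 <- Filtered$Test1 %in% Filtered$Test2
in.both.2 <- Filtered$Test2 %in% Filtered$Test1

# One value per row: the string where the condition holds, NA elsewhere
Filtered$Same       <- ifelse(in.both.1, Filtered$Test1, NA)
Filtered$InAButNotB <- ifelse(in.both.1, NA, Filtered$Test1)
Filtered$InBButNotA <- ifelse(in.both.2, NA, Filtered$Test2)

Filtered
```

Because every new column has the same length as the data frame, the "replacement has ... rows" error does not occur. Note that Same is aligned with the rows of Test1; a shared string can sit in a different row of Test2.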
Answer 1 (score: 0)
Using tidyverse functions, you could try something like:
# %>% and summarise() come from dplyr (part of the tidyverse)
library(dplyr)
Filtered %>%
  summarise(comm = sum(Test1 %in% Test2),
            InA = sum(!(Test1 %in% Test2)),
            InB = sum(!(Test2 %in% Test1)))
That said, if you are working with plain vectors and are only interested in the aggregate counts, you can also try the following:
length(intersect(Test1,Test2))
length(setdiff(Test1,Test2))
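For completeness, here is a small sketch that collects all three counts from the question in one named vector, using the four-element vectors from the question. The names in counts are my own. Keep in mind that intersect and setdiff drop duplicates, whereas sum(x %in% y) counts every occurrence, so the two approaches can disagree when a string is repeated:

```r
Test1 <- c("Everything is normal", "It is all sunny", "Its raining cats and dogs", "Mild")
Test2 <- c("Everything is normal", "It is thundering", "Its raining cats and dogs", "Cloudy")

# Set-based counts of distinct strings
counts <- c(
  same        = length(intersect(Test1, Test2)),
  only.in.one = length(setdiff(Test1, Test2)),  # in Test1 but not in Test2
  only.in.two = length(setdiff(Test2, Test1))   # in Test2 but not in Test1
)
counts
```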