Which variables differ between two different data sets?

Time: 2020-04-03 17:32:17

Tags: r variables difference

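So I have two different data sets (df_test and df_test_2), and I would like to be able to tell what differs between them. I know I can use setdiff to tell that they differ, but I need to know what exactly differs. For example, I want to see the rows where var_1 is different and everything else is the same, and I want to run the same analysis for var_2 and all the other variables. Is there any way I can set a flag for each of these variables?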

df_test <- data.frame(id = c(0, 1, 1, 1, 2, 3, 4, 5, 5),
                      var_1 = c("h", "a", "b", "b", "c", "d", "e", "f", "m"),
                      var_2 = c("companyf", "companyA", "companyB", "companyc", "companyD", "companyf", "companyg", "companyh", "companyi"),
                      var_3 = c(100, 10, 10, 11, 20, 30, 40, 50, 5))

df_test_2 <- data.frame(id = c(1, 1, 1, 2, 3, 4, 5, 5, 6, 6),
                        var_1 = c("a", "b", "b", "c", "d", "e", "f", "g", "h", "i"),
                        var_2 = c("companyA", "companyBB", "companyc", "companyD", "companyf", "companyg", "companyh", "companyi", "companyii", "companyff"),
                        var_3 = c(10, 10, 11, 200, 30, 40, 50, 5, 40, 20))

1 Answer:

Answer 0 (score: 0)

We can try adapting your approach by using setdiff together with which.

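To find the var_1 values that are present in df_test but not in df_test_2 (%in% is used rather than == so the filter still works when setdiff returns more than one value):

df_test[which(df_test$var_1 %in% setdiff(df_test$var_1, df_test_2$var_1)), ]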

Output for var_1:

  id var_1    var_2 var_3
9  5     m companyi     5
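
The lookup above only tells whether a var_1 value appears anywhere in the other data frame. To find rows where var_1 differs while id, var_2 and var_3 all match, which is what the question describes, one possible approach is to pair the rows with merge and then compare the two var_1 columns. This is a minimal sketch using merge rather than setdiff; the names paired and only_var_1_differs and the suffixes are chosen here for illustration, and rows that share the same key several times would be paired in every combination.

```r
# Data from the question; stringsAsFactors = FALSE keeps text columns as
# character so the comparison below behaves the same on older R versions.
df_test <- data.frame(
  id    = c(0, 1, 1, 1, 2, 3, 4, 5, 5),
  var_1 = c("h", "a", "b", "b", "c", "d", "e", "f", "m"),
  var_2 = c("companyf", "companyA", "companyB", "companyc", "companyD",
            "companyf", "companyg", "companyh", "companyi"),
  var_3 = c(100, 10, 10, 11, 20, 30, 40, 50, 5),
  stringsAsFactors = FALSE
)
df_test_2 <- data.frame(
  id    = c(1, 1, 1, 2, 3, 4, 5, 5, 6, 6),
  var_1 = c("a", "b", "b", "c", "d", "e", "f", "g", "h", "i"),
  var_2 = c("companyA", "companyBB", "companyc", "companyD", "companyf",
            "companyg", "companyh", "companyi", "companyii", "companyff"),
  var_3 = c(10, 10, 11, 200, 30, 40, 50, 5, 40, 20),
  stringsAsFactors = FALSE
)

# Pair up rows that agree on every column except var_1.
# The non-key column var_1 becomes var_1_test and var_1_test_2.
paired <- merge(df_test, df_test_2,
                by = c("id", "var_2", "var_3"),
                suffixes = c("_test", "_test_2"))

# Keep only the pairs whose var_1 values differ.
only_var_1_differs <- paired[paired$var_1_test != paired$var_1_test_2, ]
print(only_var_1_differs)
```

With the question's data this leaves a single pair: id 5, where var_1 is "m" in df_test and "g" in df_test_2. Swapping var_1 with another column in the by argument gives the same check for var_2 or var_3.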

To find the var_2 values that are present in df_test but not in df_test_2:

df_test[which(df_test$var_2 %in% setdiff(df_test$var_2, df_test_2$var_2)), ]

Output for var_2:

  id var_1    var_2 var_3
3  1     b companyB    10
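
Since the question asks for a flag per variable, the same membership check can be repeated for every column in a loop. The following is only a sketch under that reading: the flag column names (the suffix _only_in_test) are made up for illustration, and each flag says that the value does not occur anywhere in the matching column of df_test_2, not that the rest of the row matches.

```r
# Data from the question; stringsAsFactors = FALSE keeps text columns as character.
df_test <- data.frame(
  id    = c(0, 1, 1, 1, 2, 3, 4, 5, 5),
  var_1 = c("h", "a", "b", "b", "c", "d", "e", "f", "m"),
  var_2 = c("companyf", "companyA", "companyB", "companyc", "companyD",
            "companyf", "companyg", "companyh", "companyi"),
  var_3 = c(100, 10, 10, 11, 20, 30, 40, 50, 5),
  stringsAsFactors = FALSE
)
df_test_2 <- data.frame(
  id    = c(1, 1, 1, 2, 3, 4, 5, 5, 6, 6),
  var_1 = c("a", "b", "b", "c", "d", "e", "f", "g", "h", "i"),
  var_2 = c("companyA", "companyBB", "companyc", "companyD", "companyf",
            "companyg", "companyh", "companyi", "companyii", "companyff"),
  var_3 = c(10, 10, 11, 200, 30, 40, 50, 5, 40, 20),
  stringsAsFactors = FALSE
)

vars  <- c("var_1", "var_2", "var_3")
flags <- df_test
for (v in vars) {
  # TRUE when this row's value of v never appears in df_test_2's column v
  flags[[paste0(v, "_only_in_test")]] <- !(df_test[[v]] %in% df_test_2[[v]])
}
print(flags)
```

Filtering on a flag, for example flags[flags$var_1_only_in_test, ], reproduces the var_1 lookup shown earlier in this answer.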