Question

I have a question about a problem I ran into with the "setdiff" command in R.

I loaded two Excel files into R with the following commands:

library(readxl)  # provides read_excel()
data.x <- read_excel("c:/Users/User/Dropbox/excel til R/X.xlsx", col_names = FALSE)
data.y <- read_excel("c:/Users/User/Dropbox/excel til R/Y.xlsx", col_names = FALSE)

Then I ran the following command:

setdiff(data.y, data.x)

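I expected it to tell me which variables are not in "data.x". Instead, it just shows me the data that is in data.y, exactly as if I had typed the command "data.y".

Am I doing something wrong, or am I missing something?

Any help is much appreciated.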

Answer 1

Try using dplyr::setdiff() instead of setdiff(). There is another setdiff() function in base R.
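As a rough illustration of what dplyr::setdiff() does with data frames, here is a minimal sketch. The data frames x and y are made up for this example and are not the asker's files. It assumes both data frames have the same columns, since dplyr's set operations compare whole rows.

```r
# Hypothetical data frames with identical columns (not the asker's data)
x <- data.frame(id = c(1, 2, 3), value = c("a", "b", "c"))
y <- data.frame(id = c(2, 3, 4), value = c("b", "c", "d"))

# Row-wise difference: rows of y that do not appear in x
rows_only_in_y <- dplyr::setdiff(y, x)
print(rows_only_in_y)
```

The base R function is designed for vectors, which is why calling it on two data frames tends to give a surprising result, while the dplyr method works row by row.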

Answer 2

Since I don't have access to the files mentioned in the question, I'll provide an example using mtcars:

mtcars_1 <- mtcars[-1, ][, -1]
mtcars_2 <- mtcars[-2, ][, -2]

# common column names
intersect(names(mtcars_1), names(mtcars_2))

# column names only in mtcars_1 and not in mtcars_2
setdiff(names(mtcars_1), names(mtcars_2))

# columns only in mtcars_2 and not in mtcars_1 
setdiff(names(mtcars_2), names(mtcars_1))

To get the variables that are in data.y but not in data.x, compare the column names in the same way: setdiff(names(data.y), names(data.x))
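A small sketch of the same idea applied to data.x and data.y. The two data frames below are hypothetical stand-ins with invented column names, since the real Excel files are not available. Note that files read with col_names = FALSE get automatically generated column names, so comparing names is only meaningful if the columns are actually named.

```r
# Hypothetical stand-ins for the data frames read from the Excel files
data.x <- data.frame(a = 1:3, b = 4:6)
data.y <- data.frame(b = 4:6, c = 7:9, d = 10:12)

# Column names present in data.y but missing from data.x
only_in_y <- setdiff(names(data.y), names(data.x))
print(only_in_y)
```

Because names() returns a plain character vector, base R's setdiff() behaves as expected here.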

If you are looking for the rows of data.y that are not in data.x, consider using the anti_join and/or semi_join functions from the dplyr package.

‘semi_join()’ return all rows from ‘x’ where there are matching
     values in ‘y’, keeping just columns from ‘x’.

     A semi join differs from an inner join because an inner join
     will return one row of ‘x’ for each matching row of ‘y’, where a
     semi join will never duplicate rows of ‘x’.

‘anti_join()’ return all rows from ‘x’ where there are not matching
     values in ‘y’, keeping just columns from ‘x’.

dplyr::semi_join(mtcars_1, mtcars_2)
dplyr::semi_join(mtcars_2, mtcars_1)

dplyr::anti_join(mtcars_1, mtcars_2)
#   cyl disp  hp drat    wt  qsec vs am gear carb
# 1   6  160 110  3.9 2.875 17.02  0  1    4    4

dplyr::anti_join(mtcars_2, mtcars_1)
#   mpg disp  hp drat   wt  qsec vs am gear carb
# 1  21  160 110  3.9 2.62 16.46  0  1    4    4
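The calls above let dplyr pick the join columns automatically. As a hedged sketch, the example below names the key explicitly through the by argument. The data frames x and y and the id column are invented for illustration.

```r
# Hypothetical data frames sharing a single key column, id
x <- data.frame(id = c(1, 2, 3), value = c("a", "b", "c"))
y <- data.frame(id = c(2, 3, 4), score = c(10, 20, 30))

# Rows of x that have a matching id in y (only columns of x are kept)
in_both <- dplyr::semi_join(x, y, by = "id")

# Rows of x that have no matching id in y
only_in_x <- dplyr::anti_join(x, y, by = "id")

print(in_both)
print(only_in_x)
```

Stating by explicitly makes it clear which columns define a match, which matters when the two data frames share columns that should not take part in the comparison.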
