How to compare two CSV files and write the non-shared items to a CSV file in R?

Time: 2014-02-28 19:07:15

Tags: r csv

I have two CSV files, for example:

CSVfile1.csv

Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand

CSVfile2.csv

Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida

Desired output:

Name,Identity,Location
Coconut,87,Wisconsin
Orange,48,Florida

Is there a direct function for this in R? I'm new to R, so any help is appreciated.

3 answers:

Answer 0 (score: 3)

You have many options in R. In base R, we usually use merge() or match().
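A minimal base R sketch of both routes, assuming the sample data from the question (read here with read.csv(text = ...); with the real files you would pass the file names instead). Treating every column as part of the row key and using "|" as the key separator are assumptions of this sketch, not part of the answer above; the output name CSVfileDiff.csv is borrowed from Answer 1.

```r
## Sample data from the question; for the real files use
## read.csv("CSVfile1.csv") and read.csv("CSVfile2.csv") instead
xx <- read.csv(text = "Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand")
yy <- read.csv(text = "Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida")

## Option 1: match(). Collapse each row into one string key
## (assumes "|" never occurs in the data), then keep the rows of yy
## whose key has no match among the keys of xx
row_key <- function(df) do.call(paste, c(df, sep = "|"))
only_in_yy <- yy[is.na(match(row_key(yy), row_key(xx))), ]

## Option 2: merge(). Left-join yy to a flagged copy of xx on all shared
## columns; rows of yy without a partner in xx get NA in the flag column
flagged <- merge(yy, cbind(xx, in_xx = TRUE), all.x = TRUE)
only_in_yy2 <- flagged[is.na(flagged$in_xx), names(yy)]

## Write the non-shared rows, leaving out R's row names
write.csv(only_in_yy, "CSVfileDiff.csv", row.names = FALSE)
```

Both options select the Coconut/Wisconsin and Orange/Florida rows; note that merge() sorts its result by the key columns by default, so its row order can differ from the input order.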

Another way is to use the dplyr package.

library(dplyr)
## tbl_df() is deprecated in current dplyr; the join functions accept data frames directly
## to get shared items (rows present in both files)
inner_join(xx, yy)
    Name Identity    Location
1  Apple       45 Los Angeles
2 Banana       78    Kingston

## to get non-shared items: rows of yy (file 2) that are not in xx (file 1)
anti_join(yy, xx)
     Name Identity  Location
1 Coconut       87 Wisconsin
2  Orange       48   Florida

where:

xx <- read.table(text="Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand",header=TRUE,sep=',')

yy <- read.table(text="Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida",header=TRUE,sep=',')

Answer 1 (score: 1)

Try this:

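Lines1 <- readLines("CSVfile1.csv")
Lines2 <- readLines("CSVfile2.csv")
## lines of file 2 that do not appear verbatim in file 1 (the shared header drops out)
LinesDiff <- setdiff(Lines2, Lines1)
## put the header line back in front and write the result
writeLines(c(Lines2[1], LinesDiff), "CSVfileDiff.csv")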

This gives:

> readLines("CSVfileDiff.csv")
[1] "Name,Identity,Location" "Coconut,87,Wisconsin"   "Orange,48,Florida"
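Because this approach compares raw text, two rows that differ only in leading or trailing whitespace count as different. A minimal, self-contained sketch of one way to guard against that, assuming trimming such whitespace is acceptable for your data: it writes the sample files to tempdir() (the temporary paths are illustrative only), strips whitespace with trimws() before setdiff(), and compares data rows only. Spaces around commas are not touched by trimws().

```r
## Write the two sample files to a temporary directory so the sketch runs anywhere
dir <- tempdir()
f1 <- file.path(dir, "CSVfile1.csv")
f2 <- file.path(dir, "CSVfile2.csv")
writeLines(c("Name,Identity,Location", "Apple,45,Los Angeles",
             "Banana,78,Kingston", "Coconut,87,Thailand"), f1)
## Trailing spaces on purpose: an untrimmed setdiff() would treat this Apple row as new
writeLines(c("Name,Identity,Location", "Apple,45,Los Angeles  ",
             "Banana,78,Kingston", "Coconut,87,Wisconsin", "Orange,48,Florida"), f2)

## Strip leading/trailing whitespace from every line before comparing
Lines1 <- trimws(readLines(f1))
Lines2 <- trimws(readLines(f2))

## Compare data rows only; setdiff() also drops repeated lines within file 2
LinesDiff <- setdiff(Lines2[-1], Lines1[-1])

## Header first, then the rows that exist only in file 2
out_file <- file.path(dir, "CSVfileDiff.csv")
writeLines(c(Lines2[1], LinesDiff), out_file)
```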

Answer 2 (score: 0)

xx <- read.table(text="Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand",header=TRUE,sep=',')

yy <- read.table(text="Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida",header=TRUE,sep=',')


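## stack yy on top of xx; rows 1..nrow(yy) of x come from yy
x <- rbind(yy, xx)
## keep rows of yy with no identical row further down, i.e. no copy in xx
x[! duplicated(x, fromLast=TRUE) & seq(nrow(x)) <= nrow(yy), ]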

Output:

         Name Identity  Location
3     Coconut       87 Wisconsin
4      Orange       48   Florida
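This answer stops at printing the rows, while the question also asks to write them to a CSV file. A minimal sketch of that last step, reusing the same rbind()/duplicated() idea; the read.csv(text = ...) inputs stand in for the real files, and the output name CSVfileDiff.csv is borrowed from Answer 1. row.names = FALSE keeps the leftover row names (3 and 4) from being written as an extra column.

```r
xx <- read.csv(text = "Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand")
yy <- read.csv(text = "Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida")

## Stack yy on top of xx so that rows 1..nrow(yy) come from yy
x <- rbind(yy, xx)

## TRUE for rows that reappear further down in x, i.e. rows of yy also present in xx
dup_later <- duplicated(x, fromLast = TRUE)
only_in_yy <- x[!dup_later & seq_len(nrow(x)) <= nrow(yy), ]

## Write the result without the row-name column
write.csv(only_in_yy, "CSVfileDiff.csv", row.names = FALSE)
```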

Credit to Matt: R selecting all rows from a data frame that don't appear in another