I have two CSV files, for example:
CSVfile1.csv
Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand
CSVfile2.csv
Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida
Desired output:
Name,Identity,Location
Coconut,87,Wisconsin
Orange,48,Florida
Is there a direct function for this in R? I'm new to R, so any help is appreciated.
Answer 0: (score: 3)
You have many options in R. In base R, we usually use merge or match.
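As a minimal sketch of the two base R options named above (the helper `key` and the flag column `in_xx` are illustrative names, not part of the original answer), one could flag the rows of xx before a left merge, or build a per-row key and use match:

```r
# Sample data from the question
xx <- read.csv(text = "Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand", stringsAsFactors = FALSE)
yy <- read.csv(text = "Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida", stringsAsFactors = FALSE)

# Option 1: merge. Flag every row of xx, left-join onto yy on all
# shared columns, then keep the rows of yy that picked up no flag.
flagged <- cbind(xx, in_xx = TRUE)
m <- merge(yy, flagged, all.x = TRUE)
only_yy_merge <- m[is.na(m$in_xx), names(yy)]

# Option 2: match. Collapse each row into a single string key
# (hypothetical helper) and keep rows of yy whose key is absent from xx.
key <- function(df) do.call(paste, c(df, sep = "\r"))
only_yy_match <- yy[is.na(match(key(yy), key(xx))), ]

only_yy_match
```

Both variants compare whole rows, so a row counts as new if any of its columns differs.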
Another way is to use the dplyr package.
library(dplyr)
## create sources from data frames
xx_src = tbl_df(xx)
yy_src = tbl_df(yy)
## to get shared items
inner_join(xx_src,yy_src)
Name Identity Location
1 Apple 45 Los Angeles
2 Banana 78 Kingston
## to get non shared items (rows of xx_src with no match in yy_src)
anti_join(xx_src,yy_src)
Name Identity Location
1 Coconut 87 Thailand
where:
xx <- read.table(text="Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand",header=TRUE,sep=',')
yy <- read.table(text="Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida",header=TRUE,sep=',')
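Note that anti_join(xx_src, yy_src) returns the row of xx that is missing from yy, while the question asks for the rows of the second file that are not in the first. A small sketch with the arguments swapped, assuming dplyr is installed (the explicit `by` vector and the name `only_in_yy` are my additions for illustration):

```r
library(dplyr)

xx <- data.frame(
  Name = c("Apple", "Banana", "Coconut"),
  Identity = c(45L, 78L, 87L),
  Location = c("Los Angeles", "Kingston", "Thailand"),
  stringsAsFactors = FALSE
)
yy <- data.frame(
  Name = c("Apple", "Banana", "Coconut", "Orange"),
  Identity = c(45L, 78L, 87L, 48L),
  Location = c("Los Angeles", "Kingston", "Wisconsin", "Florida"),
  stringsAsFactors = FALSE
)

# Keep the rows of yy that have no exact match in xx on all three columns
only_in_yy <- anti_join(yy, xx, by = c("Name", "Identity", "Location"))
only_in_yy
```

This should leave the Coconut/Wisconsin and Orange/Florida rows, which is the output requested in the question.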
Answer 1: (score: 1)
Try this:
Lines1 <- readLines("CSVfile1.csv")
Lines2 <- readLines("CSVfile2.csv")
LinesDiff <- setdiff(Lines2, Lines1)
writeLines(c(Lines1[1], LinesDiff), "CSVfileDiff.csv")
This gives:
> readLines("CSVfileDiff.csv")
[1] "Name,Identity,Location" "Coconut,87,Wisconsin" "Orange,48,Florida"
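For anyone who wants to try this without creating the files by hand, here is a self-contained sketch of the same steps using temporary files (the tempfile paths and the names `f1`, `f2`, `out` and `result` are assumptions for illustration). The header line is identical in both files, so setdiff drops it, which is why it has to be put back in front before writing:

```r
# Write the two sample files from the question to temporary locations
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
out <- tempfile(fileext = ".csv")
writeLines(c("Name,Identity,Location",
             "Apple,45,Los Angeles",
             "Banana,78,Kingston",
             "Coconut,87,Thailand"), f1)
writeLines(c("Name,Identity,Location",
             "Apple,45,Los Angeles",
             "Banana,78,Kingston",
             "Coconut,87,Wisconsin",
             "Orange,48,Florida"), f2)

Lines1 <- readLines(f1)
Lines2 <- readLines(f2)

# Lines of file 2 that do not occur anywhere in file 1;
# the shared header line is removed here as well
LinesDiff <- setdiff(Lines2, Lines1)

# Re-attach the header and write the result
writeLines(c(Lines1[1], LinesDiff), out)

# Read the result back as a data frame to check it parses as CSV
result <- read.csv(out, stringsAsFactors = FALSE)
result
```

Because the comparison is done on raw text, two rows that differ only in whitespace or quoting are treated as different lines.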
Answer 2: (score: 0)
xx <- read.table(text="Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand",header=TRUE,sep=',')
yy <- read.table(text="Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida",header=TRUE,sep=',')
x <- rbind(yy, xx)
x[! duplicated(x, fromLast=TRUE) & seq(nrow(x)) <= nrow(yy), ]
Output:
Name Identity Location
3 Coconut 87 Wisconsin
4 Orange 48 Florida
Credit to Matt: R selecting all rows from a data frame that don't appear in another