Given two data frames:
old.df = data.frame(SampleNo=c('A1', 'B4', 'C5', 'D4'), Result=c(rep("Successful",4)), NoUnit = c(rep(4,4)))
new.df = data.frame(SampleNo=c('A1', 'C5', 'D4', 'E4'), Result=c(rep("Successful",2),rep( "Failure",2)),State=c(rep("California",2),rep("New York",2)))
which print as follows:
> old.df
  SampleNo     Result NoUnit
1       A1 Successful      4
2       B4 Successful      4
3       C5 Successful      4
4       D4 Successful      4
> new.df
  SampleNo     Result      State
1       A1 Successful California
2       C5 Successful California
3       D4    Failure   New York
4       E4    Failure   New York
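I want to update the contents of old.df with the new data in new.df, keeping the rows already in old.df and adding the new columns from new.df. Where a sample appears in both, the value from new.df should win (see D4). The resulting data.frame would be:
  SampleNo     Result NoUnit      State
1       A1 Successful      4 California
2       B4 Successful      4       <NA>
3       C5 Successful      4 California
4       D4    Failure      4   New York
5       E4    Failure     NA   New York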
Answer 0 (score: 3)
merge(old.df,new.df,all=TRUE)
  SampleNo     Result NoUnit      State
1       A1 Successful      4 California
2       B4 Successful      4       <NA>
3       C5 Successful      4 California
4       D4    Failure      4   New York
5       E4    Failure     NA   New York
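Edit after the OP changed the rules: D4 now has a different Result in each data frame, so merge on SampleNo only and take the new Result where there is one: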
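# Merge on SampleNo only, so Result comes back as Result.x (old) and Result.y (new)
df <- merge(old.df, new.df, all=TRUE, by="SampleNo")
# Use the new Result where there is one, otherwise keep the old one
df$Result <- with(df, factor(ifelse(is.na(Result.y),
                                    as.character(Result.x), as.character(Result.y))))
# Drop the two suffixed helper columns
df$Result.x <- NULL; df$Result.y <- NULL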
  SampleNo NoUnit      State     Result
1       A1      4 California Successful
2       B4      4       <NA> Successful
3       C5      4 California Successful
4       D4      4   New York    Failure
5       E4     NA   New York    Failure
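To see why by="SampleNo" matters with the data as posted, here is a small check. It is only a sketch: the two data frames are rebuilt from the question so the snippet runs on its own, and the names plain and keyed are made up for illustration.

```r
# Data from the question, rebuilt so this snippet is self-contained
old.df <- data.frame(SampleNo = c('A1', 'B4', 'C5', 'D4'),
                     Result = rep("Successful", 4),
                     NoUnit = rep(4, 4))
new.df <- data.frame(SampleNo = c('A1', 'C5', 'D4', 'E4'),
                     Result = c(rep("Successful", 2), rep("Failure", 2)),
                     State = c(rep("California", 2), rep("New York", 2)))

# Default merge joins on every shared column (SampleNo and Result),
# so D4/Successful and D4/Failure do not match and both rows are kept.
plain <- merge(old.df, new.df, all = TRUE)
nrow(plain)                  # 6
sum(plain$SampleNo == "D4")  # 2

# Joining on SampleNo alone gives one row per sample, with the two
# Result columns suffixed .x (old) and .y (new).
keyed <- merge(old.df, new.df, all = TRUE, by = "SampleNo")
nrow(keyed)                  # 5
names(keyed)
```

The suffixed Result.x and Result.y columns in keyed are what the ifelse() step above collapses into a single Result column.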
Answer 1 (score: 1)
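merge won't do this on its own. You don't actually want to merge on the "Result" column, only on the "SampleNo" column, and then combine the "Result" values, taking the new value where one is available and the old value otherwise. The function below does this, with "SampleNo" as the default key: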
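merge.by.sample <- function(old.df, new.df, by='SampleNo') {
  # Outer join on the key only; other shared columns come back as <col>.x / <col>.y
  r <- merge(old.df, new.df, all=TRUE, by=by)
  # Collapse one .x/.y pair into a single column, preferring the new (.y) value
  merge.col <- function(r, col) {
    xname <- paste0(col, '.x')
    yname <- paste0(col, '.y')
    # Work on character vectors so factor and character columns both behave
    x <- as.character(r[[xname]])
    y <- as.character(r[[yname]])
    r[[col]] <- factor(ifelse(is.na(y), x, y))
    r[!(names(r) %in% c(xname, yname))]
  }
  # Shared columns other than the key
  i <- intersect(names(old.df), names(new.df))
  i <- i[!i %in% by]
  for (col in i) {
    r <- merge.col(r, col)
  }
  r
}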
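The function above returns every shared column as a factor. If that is not wanted, one possible variant is sketched below. The name update_by_key is hypothetical, not something from the answers. It follows the same merge-then-prefer-new idea, but leaves character and numeric columns as they are (factor columns come back as character). The question's data is rebuilt so the snippet runs on its own.

```r
# Hypothetical variant: outer join on the key, then for each other shared
# column take the new value where present and the old value otherwise.
update_by_key <- function(old.df, new.df, by = "SampleNo") {
  r <- merge(old.df, new.df, all = TRUE, by = by)
  shared <- setdiff(intersect(names(old.df), names(new.df)), by)
  for (col in shared) {
    x <- r[[paste0(col, ".x")]]  # old value
    y <- r[[paste0(col, ".y")]]  # new value
    # Factors are compared as character so differing level sets cannot clash
    if (is.factor(x)) x <- as.character(x)
    if (is.factor(y)) y <- as.character(y)
    r[[col]] <- ifelse(is.na(y), x, y)
    r[[paste0(col, ".x")]] <- NULL
    r[[paste0(col, ".y")]] <- NULL
  }
  r
}

# Data from the question
old.df <- data.frame(SampleNo = c('A1', 'B4', 'C5', 'D4'),
                     Result = rep("Successful", 4),
                     NoUnit = rep(4, 4))
new.df <- data.frame(SampleNo = c('A1', 'C5', 'D4', 'E4'),
                     Result = c(rep("Successful", 2), rep("Failure", 2)),
                     State = c(rep("California", 2), rep("New York", 2)))

res <- update_by_key(old.df, new.df)
res
```

With the question's data, res has one row per SampleNo, with D4 and E4 taking "Failure" from new.df and B4 keeping "Successful" from old.df. The merged Result column ends up last, as in the first answer's output.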