Update a data.frame from another data.frame with different columns in R

Time: 2012-11-23 08:00:02

Tags: r dataframe merging-data

Given two data frames

old.df = data.frame(SampleNo=c('A1', 'B4', 'C5', 'D4'), Result=c(rep("Successful",4)), NoUnit = c(rep(4,4)))
new.df = data.frame(SampleNo=c('A1', 'C5', 'D4', 'E4'), Result=c(rep("Successful",2),rep( "Failure",2)),State=c(rep("California",2),rep("New York",2)))

which look like this:

> old.df
  SampleNo     Result NoUnit
1       A1 Successful      4
2       B4 Successful      4
3       C5 Successful      4
4       D4 Successful      4


> new.df
  SampleNo     Result      State
1       A1 Successful California
2       C5 Successful California
3       D4    Failure   New York
4       E4    Failure   New York

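I want to update the contents of old.df with the new data from new.df, preserving the row order of old.df and adding the new columns from new.df. The resulting data.frame would be: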

  SampleNo     Result NoUnit      State
1       A1 Successful      4 California
2       B4 Successful      4       <NA>
3       C5 Successful      4 California
4       D4    Failure      4   New York
5       E4    Failure     NA   New York

2 answers:

Answer 0 (score: 3)

merge(old.df,new.df,all=TRUE)

  SampleNo     Result NoUnit      State
1       A1 Successful      4 California
2       B4 Successful      4       <NA>
3       C5 Successful      4 California
4       D4    Failure      4   New York
5       E4    Failure     NA   New York
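
A note on the call above: when `by` is not given, `merge` joins on every column name the two data frames share, which here means both SampleNo and Result. The following is a minimal sketch, using the data exactly as shown in the question, of how that differs from joining on SampleNo alone; the variable names `both` and `key.only` are made up for illustration.

```r
old.df <- data.frame(SampleNo = c('A1', 'B4', 'C5', 'D4'),
                     Result = rep("Successful", 4),
                     NoUnit = rep(4, 4))
new.df <- data.frame(SampleNo = c('A1', 'C5', 'D4', 'E4'),
                     Result = c(rep("Successful", 2), rep("Failure", 2)),
                     State = c(rep("California", 2), rep("New York", 2)))

# Default join columns: intersect(names(old.df), names(new.df)),
# i.e. SampleNo and Result together
both <- merge(old.df, new.df, all = TRUE)
nrow(both)       # 6: D4 appears once as Successful and once as Failure

# Joining on the key only keeps one row per sample and suffixes
# the other shared column with .x (old) and .y (new)
key.only <- merge(old.df, new.df, all = TRUE, by = "SampleNo")
nrow(key.only)   # 5
names(key.only)  # "SampleNo" "Result.x" "NoUnit" "Result.y" "State"
```

With D4 marked as Failure in new.df, the key-only join is the one that leaves a single D4 row, at the cost of having to reconcile Result.x and Result.y afterwards.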

Edit after the OP changed the rules:

df <- merge(old.df,new.df,all=TRUE,by="SampleNo")
df$Result <- with(df,factor(ifelse(is.na(Result.y),
                             as.character(Result.x),as.character(Result.y))))
df$Result.x <- NULL; df$Result.y <- NULL

  SampleNo NoUnit      State     Result
1       A1      4 California Successful
2       B4      4       <NA> Successful
3       C5      4 California Successful
4       D4      4   New York    Failure
5       E4     NA   New York    Failure

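Answer 1 (score: 1)

merge alone won't do this. You don't really want to merge on the "Result" column, only on the "SampleNo" column, and then combine the "Result" values, using the new value if it is available and the old value otherwise.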
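
The rule just described (new value if present, otherwise old) can be sketched on two plain vectors. The vectors `old.result` and `new.result` are hypothetical stand-ins for the two Result columns after joining on SampleNo, with NA marking a sample that is missing on that side.

```r
# Values for samples A1, B4, C5, D4, E4 after a join on SampleNo
old.result <- c("Successful", "Successful", "Successful", "Successful", NA)
new.result <- c("Successful", NA, "Successful", "Failure", "Failure")

# Take the new value unless it is missing, then fall back to the old one
combined <- ifelse(is.na(new.result), old.result, new.result)
combined
# "Successful" "Successful" "Successful" "Failure" "Failure"
```

This assumes character vectors; with factors, convert with as.character() first, since ifelse() would otherwise return the integer codes.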

Here is some code that does this for all columns in the intersection other than "SampleNo":

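merge.by.sample <- function(old.df, new.df, by = 'SampleNo') {
  # Full outer join on the key only; shared non-key columns come back
  # suffixed with .x (old) and .y (new)
  r <- merge(old.df, new.df, all = TRUE, by = by)

  merge.col <- function(r, col) {
    xname <- paste0(col, '.x')
    yname <- paste0(col, '.y')
    x <- r[[xname]]
    y <- r[[yname]]

    # Work on character values so factors with different levels combine
    # safely; this also covers character columns, for which levels() is NULL
    as.fac <- is.factor(x) || is.factor(y)
    if (as.fac) {
      x <- as.character(x)
      y <- as.character(y)
    }

    # Use the new value if available, otherwise the old one
    missing.new <- is.na(y)
    y[missing.new] <- x[missing.new]
    r[[col]] <- if (as.fac) factor(y) else y

    # Drop the suffixed helper columns
    r[!(names(r) %in% c(xname, yname))]
  }

  # Columns present in both data frames, excluding the key(s)
  i <- intersect(names(old.df), names(new.df))
  i <- i[!i %in% by]

  for (col in i) {
    r <- merge.col(r, col)
  }
  r
}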
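
Both answers rely on merge, which sorts the result by the key, and both move the combined column to the end. If the row order of old.df and the column order from the question matter, one alternative is to index with match() instead. The following is only a sketch under stated assumptions: `upsert_by_key` is a hypothetical helper (not part of either answer), it assumes a single key column with unique values, and it assumes character or numeric columns rather than factors.

```r
old.df <- data.frame(SampleNo = c('A1', 'B4', 'C5', 'D4'),
                     Result = rep("Successful", 4),
                     NoUnit = rep(4, 4),
                     stringsAsFactors = FALSE)
new.df <- data.frame(SampleNo = c('A1', 'C5', 'D4', 'E4'),
                     Result = c(rep("Successful", 2), rep("Failure", 2)),
                     State = c(rep("California", 2), rep("New York", 2)),
                     stringsAsFactors = FALSE)

# Hypothetical helper: update `old` with `new`, keeping old's row order
upsert_by_key <- function(old, new, key = "SampleNo") {
  # Keys in old's order, followed by keys that occur only in new
  keys <- union(old[[key]], new[[key]])
  i.old <- match(keys, old[[key]])   # row of each key in old (NA if absent)
  i.new <- match(keys, new[[key]])   # row of each key in new (NA if absent)

  out <- data.frame(keys, stringsAsFactors = FALSE)
  names(out) <- key

  for (col in setdiff(union(names(old), names(new)), key)) {
    # Start from the old values, NA where the column or the row is missing
    val <- if (col %in% names(old)) old[[col]][i.old] else NA
    if (col %in% names(new)) {
      new.val <- new[[col]][i.new]
      # Prefer the new value, fall back to the old one
      val <- ifelse(is.na(new.val), val, new.val)
    }
    out[[col]] <- val
  }
  out
}

res <- upsert_by_key(old.df, new.df)
res
```

For the data in the question, this gives the rows in the order A1, B4, C5, D4, E4 and the columns in the order SampleNo, Result, NoUnit, State.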