What is the best way to merge two data frames?

Asked: 2015-10-27 15:18:54

Tags: r merge

I need to merge two data.frames:

dput(data)

    structure(list(Hostname = structure(c(8L, 8L, 9L, 5L, 6L, 7L, 
1L, 2L, 3L, 4L), .Label = c("db01", "db02", "farm01", "farm02", 
"tom01", "tom02", "tom03", "web01", "web03"), class = "factor"), 
    Date = structure(c(6L, 10L, 5L, 3L, 2L, 1L, 8L, 9L, 7L, 4L
    ), .Label = c("10/5/2015 1:15", "10/5/2015 1:30", "10/5/2015 2:15", 
    "10/5/2015 4:30", "10/5/2015 8:30", "10/5/2015 8:45", "10/6/2015 8:15", 
    "10/6/2015 8:30", "9/11/2015 5:00", "9/11/2015 6:00"), class = "factor"), 
    Cpubusy = c(31L, 20L, 30L, 20L, 18L, 20L, 41L, 21L, 29L, 
    24L), UsedPercentMemory = c(99L, 98L, 95L, 99L, 99L, 99L, 
    99L, 98L, 63L, 99L)), .Names = c("Hostname", "Date", "Cpubusy", 
"UsedPercentMemory"), class = "data.frame", row.names = c(NA, 
-10L))
dput(cmdb)

structure(list(Hostname = structure(c(8L, 8L, 9L, 5L, 6L, 7L, 
1L, 2L, 3L, 4L), .Label = c("db01", "db02", "farm01", "farm02", 
"tom01", "tom02", "tom03", "web01", "web03"), class = "factor"), 
    App = structure(c(4L, 4L, 4L, 3L, 3L, 3L, 1L, 1L, 2L, 2L), .Label = c("DB", 
    "FARM", "Tom", "WEB"), class = "factor"), HA = structure(c(3L, 
    4L, 5L, 2L, 6L, 6L, 1L, 6L, 6L, 6L), .Label = c("hadb02", 
    "hatom", "haweb01", "haweb02", "haweb03", "No HA Host"), class = "factor"), 
    DR = structure(c(3L, 4L, 5L, 2L, 6L, 6L, 1L, 6L, 6L, 6L), .Label = c("drdb01", 
    "drtom", "drweb01", "drweb02", "drweb03", "No DR Host"), class = "factor"), 
    Date = structure(c(6L, 10L, 5L, 3L, 2L, 1L, 8L, 9L, 7L, 4L
    ), .Label = c("10/5/2015 1:15", "10/5/2015 1:30", "10/5/2015 2:15", 
    "10/5/2015 4:30", "10/5/2015 8:30", "10/5/2015 8:45", "10/6/2015 8:15", 
    "10/6/2015 8:30", "9/11/2015 5:00", "9/11/2015 6:00"), class = "factor"), 
    Cpubusy = c(31L, 20L, 30L, 20L, 18L, 20L, 41L, 21L, 29L, 
    24L), UsedPercentMemory = c(99L, 98L, 95L, 99L, 99L, 99L, 
    99L, 98L, 63L, 99L)), .Names = c("Hostname", "App", "HA", 
"DR", "Date", "Cpubusy", "UsedPercentMemory"), class = "data.frame", row.names = c(NA, 
-10L))

This is what I am doing:

    # Columns of cmdb that data$Hostname is matched against: "App", "HA", "DR"
    env <- colnames(cmdb)[2:4]
    for (en in env) {
        mergedData <- merge(data, cmdb, by.x = "Hostname", by.y = en, all = TRUE)
        ## do some other stuff here based on whether it is prod, ha or dr
    }

This line seems to take a very long time to complete:

mergedData <- merge(data, cmdb, by.x = "Hostname", by.y = en, all = TRUE)
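One thing that may be worth checking for the merge() call above, offered as a possibility rather than a diagnosis: when a key value occurs several times on both sides, merge() returns every pairing of those rows, so the result can grow much faster than the inputs. The toy frames `x` and `y` below are hypothetical and only illustrate that behaviour.

```r
# Hypothetical toy data: key "a" occurs 3 times on each side.
x <- data.frame(key = c("a", "a", "a", "b"), xval = 1:4)
y <- data.frame(key = c("a", "a", "a", "c"), yval = 5:8)

# Full outer join, as in the question (all = TRUE).
m <- merge(x, y, by = "key", all = TRUE)

# 3 * 3 pairings for "a", plus one unmatched row each for "b" and "c".
n_rows <- nrow(m)

# Counting key occurrences before merging shows which keys will multiply.
key_counts_x <- table(x$key)
key_counts_y <- table(y$key)
```

Running `table()` on the real key columns (for example `data$Hostname` and `cmdb$HA`) would show whether this applies to the actual data.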

Is there another way to merge the two data frames on differently named columns, keeping only these columns for each env:

App, Env, Date, Cpubusy, UsedPercentMemory
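A minimal sketch of one possible alternative, under stated assumptions: instead of one merge per column, the cmdb table is first stacked into a single host-to-environment lookup, and `data` is then merged against it once. The sketch assumes that the production host is in `Hostname` and the standby hosts are in `HA` and `DR`, that entries starting with "No " are placeholders, and that the labels "PROD", "HA" and "DR" are acceptable values for `Env`. The toy rows reuse host names from the question, but the metric values are illustrative only.

```r
# Illustrative stand-ins for the question's data (metric values are made up).
data <- data.frame(
  Hostname = c("web01", "haweb01", "drweb01", "db01", "tom02"),
  Date = c("10/5/2015 8:45", "10/5/2015 8:45", "10/5/2015 8:45",
           "10/6/2015 8:30", "10/5/2015 1:30"),
  Cpubusy = c(31L, 12L, 5L, 41L, 18L),
  UsedPercentMemory = c(99L, 40L, 20L, 99L, 99L),
  stringsAsFactors = FALSE
)
cmdb <- data.frame(
  Hostname = c("web01", "db01", "tom02"),
  App = c("WEB", "DB", "Tom"),
  HA = c("haweb01", "hadb02", "No HA Host"),
  DR = c("drweb01", "drdb01", "No DR Host"),
  stringsAsFactors = FALSE
)

# Stack the three host columns into one long lookup: Hostname, App, Env.
# The Env labels are an assumption, not something defined in the question.
make_rows <- function(hosts, env) {
  data.frame(Hostname = as.character(hosts),
             App = as.character(cmdb$App),
             Env = env,
             stringsAsFactors = FALSE)
}
lookup <- rbind(
  make_rows(cmdb$Hostname, "PROD"),
  make_rows(cmdb$HA, "HA"),
  make_rows(cmdb$DR, "DR")
)

# Drop placeholder entries such as "No HA Host" and repeated mappings.
lookup <- unique(lookup[!grepl("^No ", lookup$Hostname), ])

# One inner join on a single, identically named key column.
merged <- merge(data, lookup, by = "Hostname")
result <- merged[, c("App", "Env", "Date", "Cpubusy", "UsedPercentMemory")]
```

The per-environment work from the loop could then be done on `split(result, result$Env)`. Whether this is actually faster on the real data has not been measured here.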

0 answers:

No answers yet