R - Set column names using values from another table

Date: 2015-09-16 19:22:27

Tags: r

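I am trying to split a large dataset and then:

  1. use a loop to assign colnames to each piece, and
  2. save all the individual pieces again in a single stacked file.

I am using some sample data like the following:

[image: sample data]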

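First, I split the dataset in two according to the source number in the first column and read the pieces into a list with the following code:

    out <- split(sample, f = sample$Source)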
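
For readers unfamiliar with `split()`, here is a minimal sketch of what a call like the one above returns. The data frame `toy` and its values are made up for illustration and are much smaller than the real data.

```r
# Toy stand-in for the question's `sample` data frame (illustrative values only).
toy <- data.frame(
  Source = c("Stack 1", "Stack 1", "Stack 2"),
  hour   = c(0L, 1L, 0L),
  TEMP   = c(341L, 341L, 328L)
)

# split() returns a named list with one data frame per distinct Source value.
pieces <- split(toy, f = toy$Source)

names(pieces)         # the list names are the Source values
sapply(pieces, nrow)  # number of rows that ended up in each piece
```

Each element of `pieces` still contains the `Source` column, so the grouping value is available both as the list name and inside the data frame.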
    

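Now I am struggling to set up a loop that changes the colnames of columns 2 to 8 by matching the existing colnames with the 'info' table below and replacing them according to the source name in the first column of the 'info' table.

The info table looks like this:

[image: info table]

So the loop should change the colnames to something like this:

[image: pieces with renamed columns]

I was just wondering whether anyone who has done something similar could show me how?

When I try to join the pieces back together, I can only set the colnames by using the merge function. Is there any way to stack them so that each table keeps its own colnames, looking like this?

[image: stacked output with per-table headers]

My sample input files are: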

    > dput(sample)
    structure(list(Source = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
    2L, 2L), .Label = c("Stack 1", "Stack 2"), class = "factor"), 
        year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 
        2010L, 2010L), day = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
        ), hour = c(0L, 1L, 2L, 3L, 0L, 1L, 2L, 3L, 4L), `EXIT VEL` = c(26.2, 
        26.2, 26.2, 26.2, 22.4, 22.4, 22.4, 22.4, 22.4), TEMP = c(341L, 
        341L, 341L, 341L, 328L, 328L, 328L, 328L, 328L), `STACK DIAM` = c(1.5, 
        1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 2.5, 2.5), W = c(0L, 0L, 0L, 
        0L, 15L, 15L, 15L, 15L, 15L), Nox = c(39, 39, 39, 39, 33.3, 
        33.3, 33.3, 33.3, 33.3), Sox = c(15.5, 15.5, 15.5, 15.5, 
        17.9, 17.9, 17.9, 17.9, 17.9)), .Names = c("Source", "year", 
    "day", "hour", "EXIT VEL", "TEMP", "STACK DIAM", "W", "Nox", 
    "Sox"), class = "data.frame", row.names = c(NA, -9L))
    
    > dput(stack_info)
    structure(list(SNAME = structure(1:2, .Label = c("Stack 1", "Stack 2"
    ), class = "factor"), ISVARY = c(1L, 4L), VELVOL = c(1L, 4L), 
        TEMPDENS = c(0L, 2L), `DUM 1` = c(999L, 999L), `DUM 2` = c(999L, 
        999L), NPOL = c(2L, 2L), `EXIT VEL` = c(26.2, 22.4), TEMP = c(341L, 
        328L), `STACK DIAM` = c(1.5, 2.5), W = c(0L, 15L), Nox = c(39, 
        33.3), Sox = c(15.5, 17.9)), .Names = c("SNAME", "ISVARY", 
    "VELVOL", "TEMPDENS", "DUM 1", "DUM 2", "NPOL", "EXIT VEL", "TEMP", 
    "STACK DIAM", "W", "Nox", "Sox"), class = "data.frame", row.names = c(NA, 
    -2L))
    

Thanks in advance.

1 Answer:

Answer 0 (score: 1)

The best I could come up with is:

    out <- split(sample, f = sample$Source)  # your original step

    # Convert SNAME from factor to character so that unlist() below yields
    # the source names rather than the factor's integer codes
    stack_info[, 1] <- as.character(stack_info[, 1])

    out <- lapply(names(out), function(x) {
      df <- out[[x]]
      # Get the future names: the first 7 values of the matching stack_info row
      new_cnames <- unname(unlist(stack_info[stack_info$SNAME == x, 1:7]))
      # Replace the column names, keeping the last two (Nox, Sox)
      colnames(df) <- c("Source", new_cnames, colnames(df)[9:10])
      # Return the modified version without the first column
      df[, -1]
    })

    # Write each piece (change "" to the file name you want and sep to your
    # preferred separator; see ?write.table for more documentation).
    # invisible(lapply()) is used because only the side effect matters.
    invisible(lapply(out, write.table, append = TRUE, file = "",
                     row.names = FALSE, sep = "|"))

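The main idea is to loop over the data frames to change their colnames. I update the list and then loop over it a second time to write; you may prefer to append to the file inside the first loop instead.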
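
A single-loop variant that renames and appends in one pass could look roughly like the sketch below. It is not part of the original answer: `dat`, `info` and `out_file` are hypothetical stand-ins (cut-down versions of `sample` and `stack_info`, and a temporary file), so the column positions differ from the real data.

```r
# Small stand-ins for the question's `sample` and `stack_info` (illustrative values).
dat <- data.frame(
  Source = c("Stack 1", "Stack 1", "Stack 2"),
  year   = c(2010L, 2010L, 2010L),
  hour   = c(0L, 1L, 0L),
  Nox    = c(39, 39, 33.3),
  stringsAsFactors = FALSE
)
info <- data.frame(
  SNAME  = c("Stack 1", "Stack 2"),
  ISVARY = c(1L, 4L),
  stringsAsFactors = FALSE
)

pieces   <- split(dat, f = dat$Source)
out_file <- tempfile(fileext = ".txt")  # hypothetical output path

for (src in names(pieces)) {
  piece <- pieces[[src]]
  # The values of the matching info row become the new names of columns 2 and 3.
  new_names <- as.character(unlist(info[info$SNAME == src, 1:2]))
  colnames(piece)[2:3] <- new_names
  # Drop the Source column and append this block, header included, to the file.
  # write.table() warns when appending column names; that is intended here.
  suppressWarnings(
    write.table(piece[, -1], file = out_file, append = TRUE,
                row.names = FALSE, sep = "|")
  )
}

readLines(out_file)  # inspect the stacked result
```

With the real data, the renamed positions would be 2:8 and the info columns 1:7, as in the code above.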

I hope the comments are enough to follow the code; tell me if it needs more detail.

Screen output (warnings omitted):

    "Stack 1"|"1"|"1.1"|"0"|"999"|"999.1"|"2"|"Nox"|"Sox"
    2010|1|0|26.2|341|1.5|0|39|15.5
    2010|1|1|26.2|341|1.5|0|39|15.5
    2010|1|2|26.2|341|1.5|0|39|15.5
    2010|1|3|26.2|341|1.5|0|39|15.5
    "Stack 2"|"4"|"4.1"|"2"|"999"|"999.1"|"2.1"|"Nox"|"Sox"
    2010|1|0|22.4|328|2.5|15|33.3|17.9
    2010|1|1|22.4|328|2.5|15|33.3|17.9
    2010|1|2|22.4|328|2.5|15|33.3|17.9
    2010|1|3|22.4|328|2.5|15|33.3|17.9
    2010|1|4|22.4|328|2.5|15|33.3|17.9
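
The `.1` suffixes in the headers (`"1.1"`, `"999.1"`, `"2.1"`) appear because selecting columns of a data frame with `[` makes duplicated column names unique, in the same way `make.unique()` does. The sketch below reproduces this on a made-up data frame `df` and shows one possible workaround, which is only a suggestion and not something from the original answer: write the header line by hand, then the rows without column names.

```r
# A made-up data frame whose column names contain duplicates,
# similar to the state after the renaming step.
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(df) <- c("Source", "1", "1")

names(df)        # plain assignment keeps the duplicated names
names(df[, -1])  # column subsetting de-duplicates them

# Possible workaround (assumption): write the header manually so the
# duplicated labels survive, then append the rows without column names.
out_file <- tempfile(fileext = ".txt")  # hypothetical output path
header   <- names(df)[-1]               # taken before subsetting renames them
body     <- df[, -1]

cat(paste(paste0('"', header, '"'), collapse = "|"), "\n",
    sep = "", file = out_file)
write.table(body, file = out_file, append = TRUE,
            row.names = FALSE, col.names = FALSE, sep = "|")

readLines(out_file)  # inspect the result
```

Because `col.names = FALSE` is used for the rows, `write.table()` has no column names to append and the warning seen earlier does not arise.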