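I am trying to split a large dataset, work on the pieces in a loop, and then save all the individual pieces back into a single stacked file.
I am using some sample data like the following:
First, I split the dataset in two according to the source in the first column, and read the pieces into a list with the following code: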
out <- split( sample , f = sample$Source)
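Now I am struggling to set up a loop that changes the colnames of columns 2 to 8, by matching against the 'info' table below and replacing them according to the source name in the first column of the 'info' table.
The info table looks like this:
I was just wondering whether anyone has done something similar and could show me how?
When I try to join the pieces back together, I can only set the colnames once using the merge function. Is there any way to stack them so that I keep each table's colnames, so that it looks like this?
My sample input files are: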
> dput(sample)
structure(list(Source = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L), .Label = c("Stack 1", "Stack 2"), class = "factor"),
year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L,
2010L, 2010L), day = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), hour = c(0L, 1L, 2L, 3L, 0L, 1L, 2L, 3L, 4L), `EXIT VEL` = c(26.2,
26.2, 26.2, 26.2, 22.4, 22.4, 22.4, 22.4, 22.4), TEMP = c(341L,
341L, 341L, 341L, 328L, 328L, 328L, 328L, 328L), `STACK DIAM` = c(1.5,
1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 2.5, 2.5), W = c(0L, 0L, 0L,
0L, 15L, 15L, 15L, 15L, 15L), Nox = c(39, 39, 39, 39, 33.3,
33.3, 33.3, 33.3, 33.3), Sox = c(15.5, 15.5, 15.5, 15.5,
17.9, 17.9, 17.9, 17.9, 17.9)), .Names = c("Source", "year",
"day", "hour", "EXIT VEL", "TEMP", "STACK DIAM", "W", "Nox",
"Sox"), class = "data.frame", row.names = c(NA, -9L))
> dput(stack_info)
structure(list(SNAME = structure(1:2, .Label = c("Stack 1", "Stack 2"
), class = "factor"), ISVARY = c(1L, 4L), VELVOL = c(1L, 4L),
TEMPDENS = c(0L, 2L), `DUM 1` = c(999L, 999L), `DUM 2` = c(999L,
999L), NPOL = c(2L, 2L), `EXIT VEL` = c(26.2, 22.4), TEMP = c(341L,
328L), `STACK DIAM` = c(1.5, 2.5), W = c(0L, 15L), Nox = c(39,
33.3), Sox = c(15.5, 17.9)), .Names = c("SNAME", "ISVARY",
"VELVOL", "TEMPDENS", "DUM 1", "DUM 2", "NPOL", "EXIT VEL", "TEMP",
"STACK DIAM", "W", "Nox", "Sox"), class = "data.frame", row.names = c(NA,
-2L))
Thanks in advance
Answer 0 (score: 1)
The best I could come up with is:
out <- split( sample , f = sample$Source) # your original step
stack_info[,1] <- as.character(stack_info[,1]) # To get strings column as strings and not index number later
out <- lapply( names(out), function(x) {
# Get the future names
new_cnames <- unname(unlist(stack_info[stack_info$SNAME == x,1:7]))
# replace the column names
colnames(out[[x]]) <- c("Source",new_cnames,colnames(out[[x]])[9:10] )
# Return the modified version without first column
out[[x]][,-1] })
# Write each block in turn. Change "" to the file name you want and sep to your
# desired separator; see ?write.table for more documentation.
invisible(lapply(out, write.table, append = TRUE, file = "", row.names = FALSE, sep = "|"))
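The as.character() conversion of the first stack_info column matters because unlist() on a data frame row that mixes a factor with numeric columns returns the factor's integer codes instead of its labels. A minimal sketch of the difference, using a cut-down, hypothetical info table rather than the full stack_info:

```r
# Cut-down stand-in for stack_info (hypothetical, two columns only).
info <- data.frame(SNAME = factor(c("Stack 1", "Stack 2")),
                   ISVARY = c(1L, 4L))

# With a factor column, unlist() falls back to the factor's integer codes.
with_factor <- unname(unlist(info[info$SNAME == "Stack 2", ]))

# After converting to character the label survives
# (every value is coerced to character).
info$SNAME <- as.character(info$SNAME)
with_character <- unname(unlist(info[info$SNAME == "Stack 2", ]))

print(with_factor)
print(with_character)
```

Without the conversion, the header for the second stack would start with the code 2 rather than the name "Stack 2".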
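The main idea is to loop over the data frames to change their colnames; I update the list and then loop a second time to write. You may prefer to append to the file inside the first loop instead.
I hope the comments are enough to follow the code; let me know if it needs more detail.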
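The single-loop variant mentioned above could look roughly like the sketch below. It is not the code that produced the answer's output: it writes each header line by hand with cat() and passes col.names = FALSE to write.table(), so the repeated header does not trigger a warning. The data frames are reduced stand-ins for sample and stack_info, and out_file is a hypothetical temporary path.

```r
# Reduced stand-ins for the question's `sample` and `stack_info` (assumed shapes).
sample <- data.frame(
  Source = c("Stack 1", "Stack 1", "Stack 2"),
  year   = c(2010L, 2010L, 2010L),
  hour   = c(0L, 1L, 0L),
  Nox    = c(39, 39, 33.3),
  stringsAsFactors = FALSE
)
stack_info <- data.frame(
  SNAME  = c("Stack 1", "Stack 2"),
  ISVARY = c(1L, 4L),
  stringsAsFactors = FALSE
)

out_file <- tempfile(fileext = ".txt")  # hypothetical output path

for (src in unique(sample$Source)) {
  # Rows for this source, without the Source column.
  block <- sample[sample$Source == src, -1, drop = FALSE]
  info  <- stack_info[stack_info$SNAME == src, , drop = FALSE]
  # Header: the info values replace the leading column names,
  # the remaining columns keep their own names.
  header <- c(unlist(lapply(info, as.character), use.names = FALSE),
              colnames(block)[-seq_len(ncol(info))])
  cat(paste(header, collapse = "|"), "\n", sep = "",
      file = out_file, append = TRUE)
  write.table(block, file = out_file, append = TRUE, sep = "|",
              row.names = FALSE, col.names = FALSE, quote = FALSE)
}

result <- readLines(out_file)
print(result)
```

Because the header is written as plain text, duplicated values in an info row are written as they are, and quote = FALSE leaves the header unquoted; drop that argument if you want quoted strings in the data rows.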
Screen output (warnings omitted):
"Stack 1"|"1"|"1.1"|"0"|"999"|"999.1"|"2"|"Nox"|"Sox"
2010|1|0|26.2|341|1.5|0|39|15.5
2010|1|1|26.2|341|1.5|0|39|15.5
2010|1|2|26.2|341|1.5|0|39|15.5
2010|1|3|26.2|341|1.5|0|39|15.5
"Stack 2"|"4"|"4.1"|"2"|"999"|"999.1"|"2.1"|"Nox"|"Sox"
2010|1|0|22.4|328|2.5|15|33.3|17.9
2010|1|1|22.4|328|2.5|15|33.3|17.9
2010|1|2|22.4|328|2.5|15|33.3|17.9
2010|1|3|22.4|328|2.5|15|33.3|17.9
2010|1|4|22.4|328|2.5|15|33.3|17.9
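The suffixed entries in the two header lines ("1.1", "999.1", "2.1") are not values from stack_info. Selecting columns of a data frame with [ makes duplicated column names unique, as described in the help page for [.data.frame, so repeated info values get a suffix when the first column is dropped. A small sketch with a made-up data frame:

```r
# Made-up data frame, only to show the renaming of duplicated column names.
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
colnames(df) <- c("Source", "1", "1")  # duplicated names can be assigned

kept <- df[, -1]                       # column selection makes them unique
print(colnames(kept))
```

If the exact info values are needed in the header, writing the header line separately avoids this renaming.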