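I am trying to strip two data frames of their data.frame structure, extract the elements of each data.frame, and combine the extracted data into a single data.frame. The result should be a data.frame made up of two columns, each of which is a vector. See the desired output below.
Problem: the output contains multiple data.frame elements instead of a single data.frame containing the vectors from the input data frames.
Each data frame holds one vector.
[EDIT ^ v in response to comments.]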
So far I have tried various combinations of as() and unlist(), to no avail...
I am trying to solve this with built-in R functions and vectorization (without plyr and without loops; compare: Merge several data.frames into one data.frame with a loop, Merge many data frames from csv files, Recombining a list of Data.frames into a single data frame).
Reproducible code: I could not reproduce the error, but this is how I expect my code to work:
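df1 <- c(1, 2, 3)   # the original chained assignment also overwrote the name data.frame
df2 <- c(2, 4, 6)
output <- cbind(df1, df2)
print(output)  # note: cbind() on two plain vectors returns a matrix, not a data.frame
str(output)    # a numeric matrix built from the two vectors
# In my case, however, the result holds data.frames rather than vectors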
Returns:
     df1 df2
[1,]   1   2
[2,]   2   4
[3,]   3   6
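For comparison, here is a minimal sketch of the difference between cbind() on plain vectors and data.frame(). The names v1, v2, as_matrix, and as_df are illustrative only and do not come from the original code:

```r
# Two plain numeric vectors (hypothetical example data)
v1 <- c(1, 2, 3)
v2 <- c(2, 4, 6)

# cbind() on atomic vectors builds a matrix, printed with [1,] style row labels
as_matrix <- cbind(df1 = v1, df2 = v2)
is.matrix(as_matrix)

# data.frame() builds a data.frame whose columns are the vectors themselves
as_df <- data.frame(df1 = v1, df2 = v2)
str(as_df)
```

If a data.frame is the goal, calling data.frame() on the vectors is probably the more direct route than cbind().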
Reality:
readmultiple <- function(directory = "bigdata") {
....
....
....
output <- cbind.data.frame(filename, readmultiplesum)
# This is probably where things go wrong
return(output)
}
output <- lapply(filenames, complete.cases.sum)
assign("Global.output", output, envir = .GlobalEnv)
# There is probably a better way to do this too
if (firstoutput == 1) {
  Global.output <- merge(as(unlist(Global.output[1]), "vector"),
                         as(unlist(output[1]), "vector"))
  # as, unlist... Not sure what's needed here
} else {
  firstoutput <- 1
}
str(output)
return(Global.output)
}
The output looks like this:
[[1]]
filename result
1 142
[[2]]
filename result
1 521
[[3]]
filename result
1 324
But I want it to be:
filename result
[1,] filename[i] 142
[2,] filename[i] 521
[3,] filename[i] 324
...where filename[i] is the index of the file name.
str(output) returns:
List of 2400
$ :'data.frame': 1 obs. of 2 variables:
..$ filename : Factor w/ 1 level "bigdata/001.csv": 1
..$ sumrows: num 142
$ :'data.frame': 1 obs. of 2 variables:
..$ filename : Factor w/ 1 level "bigdata/001.csv": 1
..$ sumrows: num 521
$ :'data.frame': 1 obs. of 2 variables:
..$ filename : Factor w/ 1 level "bigdata/001.csv": 1
..$ sumrows: num 324
$ :'data.frame': 1 obs. of 2 variables:
..$ filename : Factor w/ 1 level "bigdata/001.csv": 1
.....
dput(head(output)) returns:
list(structure(list(filename = structure(1L, .Label = "bigdata/001.csv", class = "factor"),
sumrows = 142), .Names = c("filename", "sumrows"), row.names = c(NA,
-1L), class = "data.frame"), structure(list(filename = structure(1L, .Label = "bigdata/001.csv", class = "factor"),
sumrows = 521), .Names = c("filename", "sumrows"
), row.names = c(NA, -1L), class = "data.frame"), structure(list(
filename = structure(1L, .Label = "bigdata/001.csv", class = "factor"),
sumrows = 324), .Names = c("filename", "sumrows"), row.names = c(NA,
-1L), class = "data.frame"), structure(list(filename = structure(1L, .Label = "bigdata/001.csv", class = "factor"),
sumrows = 1896), .Names = c("filename", "sumrows"
), row.names = c(NA, -1L), class = "data.frame"), structure(list(
filename = structure(1L, .Label = "bigdata/001.csv", class = "factor"),
sumrows = 1608), .Names = c("filename", "sumrows"
), row.names = c(NA, -1L), class = "data.frame"), structure(list(
filename = structure(1L, .Label = "bigdata/001.csv", class = "factor"),
sumrows = 912), .Names = c("filename", "sumrows"), row.names = c(NA,
-1L), class = "data.frame"))
Answer 0 (score: 1)
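A general trick for turning a list into a single rectangular object is to use do.call with rbind. With a list of named vectors, the result is a matrix:
ll <- list(c(filename = 1, result = 142), c(filename = 2, result = 521))
do.call(rbind, ll)
     filename result
[1,]        1    142
[2,]        2    521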
When I apply this to your list (a list of one-row data frames), I get a single data.frame:
do.call(rbind, ll)
         filename sumrows
1 bigdata/001.csv     142
2 bigdata/001.csv     521
3 bigdata/001.csv     324
4 bigdata/001.csv    1896
5 bigdata/001.csv    1608
6 bigdata/001.csv     912
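A self-contained sketch of the same idea, assuming a list of one-row data frames shaped like the dput() output in the question. The name results and the three rows are made up for illustration:

```r
# Hypothetical list of one-row data frames, shaped like the dput() output
results <- list(
  data.frame(filename = "bigdata/001.csv", sumrows = 142),
  data.frame(filename = "bigdata/001.csv", sumrows = 521),
  data.frame(filename = "bigdata/001.csv", sumrows = 324)
)

# do.call() passes the list elements as separate arguments, i.e. it calls
# rbind(results[[1]], results[[2]], results[[3]])
combined <- do.call(rbind, results)

str(combined)      # a single data.frame with 3 rows and 2 columns
combined$sumrows   # the column as a plain numeric vector
```

Because every element has the same column names, rbind() can stack them row by row. Elements with differing names would make it fail.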
Unfortunately, it is not clear what exactly filename[i] is supposed to be.
Edit
This solution seems to work for the OP:
library(plyr)
ldply(ll)
In general you can use:
ldply(ll, function(x) {
  ## process the element x here and return a data.frame (or vector)
})
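Since the question asks for built-in functions only, a base R counterpart of the ldply pattern could look roughly like the sketch below. The helper count_complete and the temporary CSV files are assumptions made up for this example; the question does not show how the real files are read or summed.

```r
# Hypothetical helper: count the complete rows of one CSV file and
# return the result as a one-row data frame
count_complete <- function(path) {
  data.frame(filename = basename(path),
             sumrows  = sum(complete.cases(read.csv(path))),
             stringsAsFactors = FALSE)
}

# Two small temporary CSV files stand in for the real data
tmp <- c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
write.csv(data.frame(a = c(1, NA, 3), b = c(4, 5, 6)), tmp[1], row.names = FALSE)
write.csv(data.frame(a = c(1, 2), b = c(NA, NA)), tmp[2], row.names = FALSE)

# Apply the helper to each file, then stack the one-row results
per_file <- do.call(rbind, lapply(tmp, count_complete))
per_file
```

Here lapply() plays the role of ldply's per-element function, and do.call(rbind, ...) does the final stacking into one data.frame.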