A directory contains a number of files:
cpu_server01.csv
cpu_server02.csv
cpu_server03.csv
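etc.
I can read the contents of the files and collect them in dflist as shown below. But I also need to create another column in each data frame in dflist and put the file name there.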
path <- "C:/Server/web/"
# cpu
# the pattern is a regular expression, not a glob: match cpu_<anything>.csv
filenames <- list.files(path, pattern = "^cpu_.*\\.csv$", full.names = TRUE)
dflist <- lapply(filenames, function(i) {
  read.csv(i, header = TRUE)
})
How can I add the name of the file to each data frame as well?
Date Cpu filename
Answer 0 (score: 2)
This should work:
# seq_along() is safe even when dflist is empty, unlike 1:length(dflist)
for (i in seq_along(dflist)) {
  dflist[[i]]$file_name <- filenames[i]
}
Example:
filenames <- c("a", "b")
dflist <- list(head(mtcars, 3), head(mtcars, 3))
for (i in seq_along(dflist)) {
  dflist[[i]]$file_name <- filenames[i]
}
Output:
[[1]]
               mpg cyl disp  hp drat    wt  qsec vs am gear carb file_name
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4         a
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4         a
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1         a

[[2]]
               mpg cyl disp  hp drat    wt  qsec vs am gear carb file_name
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4         b
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4         b
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1         b
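If you would rather avoid the explicit loop, the same idea can probably be written with Map(), which walks the list and the file names in parallel. This is only a sketch using the same toy inputs as the example above; the names df and fn are made up for illustration.

```r
# Toy stand-ins for the file names and the data frames read from them
filenames <- c("a", "b")
dflist <- list(head(mtcars, 3), head(mtcars, 3))

# Map() pairs each data frame with its file name and returns a new list;
# the single file name is recycled over all rows of the data frame
dflist <- Map(function(df, fn) {
  df$file_name <- fn
  df
}, dflist, filenames)

# Check the new column in the second element
dflist[[2]]$file_name
```

Unlike the for loop, this builds a new list rather than modifying dflist in place, so the result has to be assigned back.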
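Answer 1 (score: 0)
In addition to Florian's answer, there are two alternative ways to handle this common situation.
Replicating the file name as a column of the individual data.frames only makes sense, IMHO, if you plan to rbind() the files into one large data object (see the example below).
If you want to keep each data.frame separately in the list, you can name the list elements appropriately, e.g.,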
path <- "."
# get vector of filenames, note that pattern includes the csv extension
filenames <- list.files(path, pattern = "cpu_.*csv$", full.names = TRUE)
# read files as a list of data.frames
dflist <- lapply(filenames, read.csv, header = TRUE)
# rename list element using file names without path
names(dflist) <- basename(filenames)
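Note that it is not necessary to define an anonymous function in the call to lapply(), because lapply() passes any arguments it does not recognize on to the called function. So, we can write concisely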
lapply(filenames, read.csv, header = TRUE)
instead of
lapply(filenames, function(i) read.csv(i, header = TRUE))
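To see the argument passing in isolation, here is a small sketch that needs no files. The helper scale_by() and its multiplier argument are made up purely for illustration; they stand in for read.csv() and header.

```r
# Hypothetical helper, used only to make the argument passing visible
scale_by <- function(x, multiplier = 1) x * multiplier

# Extra named arguments after the function are handed on to scale_by()
short_form <- lapply(1:3, scale_by, multiplier = 10)

# Equivalent call with an explicit anonymous function
long_form <- lapply(1:3, function(i) scale_by(i, multiplier = 10))

# Both forms produce the same list
identical(short_form, long_form)
```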
Now, dflist is properly named:
$cpu_server01.csv
  V1   V2
1  A 1001
2  B 1002
3  C 1003

$cpu_server02.csv
  V1   V2
1  A 2001
2  B 2002
3  C 2003

$cpu_server03.csv
  V1   V2
1  A 3001
2  B 3002
3  C 3003
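If the goal is to combine all data pieces into one large data object, the original source file of each row needs to be identifiable.
This can be achieved with Florian's approach and subsequent rbinding. Alternatively, we can use data.table's rbindlist() function.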
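For comparison, the first route (adding the column, then rbinding) might look roughly like this in base R. This is a sketch with small made-up data frames standing in for the files, and do.call(rbind, ...) is just one way to do the stacking.

```r
# Made-up stand-ins for the file names and the data frames read from them
filenames <- c("cpu_server01.csv", "cpu_server02.csv")
dflist <- list(
  data.frame(V1 = c("A", "B"), V2 = c(1001L, 1002L)),
  data.frame(V1 = c("A", "B"), V2 = c(2001L, 2002L))
)

# Step 1: add the file name as a column of each data frame
for (i in seq_along(dflist)) {
  dflist[[i]]$file_name <- filenames[i]
}

# Step 2: stack all data frames into a single one
combined <- do.call(rbind, dflist)
combined
```

Every row of combined then carries the name of the file it came from.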
If the list elements have been named as above, we just need to add:
combi <- data.table::rbindlist(dflist, idcol = "file.name")
combi
          file.name V1   V2
1: cpu_server01.csv  A 1001
2: cpu_server01.csv  B 1002
3: cpu_server01.csv  C 1003
4: cpu_server02.csv  A 2001
5: cpu_server02.csv  B 2002
6: cpu_server02.csv  C 2003
7: cpu_server03.csv  A 3001
8: cpu_server03.csv  B 3002
9: cpu_server03.csv  C 3003
rbindlist() has created the id column "file.name" and populated it with the names of the list elements.
Alternatively, we can call rbindlist() first and add the file names as a factor afterwards:
library(data.table)
path <- "."
# get vector of filenames, note that pattern includes the csv extension
filenames <- list.files(path, pattern = "cpu_.*csv$", full.names = TRUE)
# read files as a list of data.frames and combine immediately
combi <- rbindlist(lapply(filenames, read.csv, header = TRUE), idcol = "file.name")
# change file number to appropriately labeled factor
combi[, file.name := factor(file.name, labels = basename(filenames))][]
          file.name V1   V2
1: cpu_server01.csv  A 1001
2: cpu_server01.csv  B 1002
3: cpu_server01.csv  C 1003
4: cpu_server02.csv  A 2001
5: cpu_server02.csv  B 2002
6: cpu_server02.csv  C 2003
7: cpu_server03.csv  A 3001
8: cpu_server03.csv  B 3002
9: cpu_server03.csv  C 3003
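The relabelling step relies on how factor() matches labels to levels by position. Here is a base R sketch of just that step, with a made-up integer vector file_id standing in for the id column that rbindlist() produces for an unnamed list.

```r
# Made-up integer ids, like those an id column built from an unnamed list holds
file_id <- c(1L, 1L, 2L, 2L, 3L)
filenames <- c("cpu_server01.csv", "cpu_server02.csv", "cpu_server03.csv")

# The sorted unique ids 1, 2, 3 become the levels; labels are matched by position
file_name <- factor(file_id, labels = filenames)

as.character(file_name)
levels(file_name)
```

One caveat, as far as I can tell: if a file contributes no rows, its id never shows up, so the number of levels no longer matches the number of labels and factor() complains. Passing levels = seq_along(filenames) as well should guard against that.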
For reproducibility, the dummy files were created by
idx_vec <- 1:3
invisible(sapply(1:3, function(i) {
x <- data.frame(V1 = LETTERS[idx_vec], V2 = 1000L * i + idx_vec)
write.csv(x, sprintf("cpu_server%02i.csv", i), row.names = FALSE)
}))