I am trying to load multiple files into the R environment. I have tried something like the following:
files <- list.files(pattern = ".Rda", recursive = TRUE)
lapply(files,load,.GlobalEnv)
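This loads only one data file (incorrectly). The problem, as far as I can tell, is that the files have the same names in every year. For example, "Year1/beer/beer.Rda" also exists as "Year2/beer/beer.Rda".
I am trying to rename the data on import, so that beer1 and beer2 correspond to beer for year 1 and beer for year 2, and so on.
Does anyone have a better way to load the data? I have more than 2 years of data.
File names: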
[1] "Year1/beer/beer.Rda" "Year1/blades/blades.Rda" "Year1/carbbev/carbbev.Rda"
[4] "Year1/cigets/cigets.Rda" "Year1/coffee/coffee.Rda" "Year1/coldcer/coldcer.Rda"
[7] "Year1/deod/deod.Rda" "Year1/diapers/diapers.Rda" "Year1/factiss/factiss.Rda"
[10] "Year1/fzdinent/fzdinent.Rda" "Year1/fzpizza/fzpizza.Rda" "Year1/hhclean/hhclean.Rda"
[13] "Year1/hotdog/hotdog.Rda" "Year1/laundet/laundet.Rda" "Year1/margbutr/margbutr.Rda"
[16] "Year1/mayo/mayo.Rda" "Year1/milk/milk.Rda" "Year1/mustketc/mustketc.Rda"
[19] "Year1/paptowl/paptowl.Rda" "Year1/peanbutr/peanbutr.Rda" "Year1/photo/photo.Rda"
[22] "Year1/razors/razors.Rda" "Year1/saltsnck/saltsnck.Rda" "Year1/shamp/shamp.Rda"
[25] "Year1/soup/soup.Rda" "Year1/spagsauc/spagsauc.Rda" "Year1/sugarsub/sugarsub.Rda"
[28] "Year1/toitisu/toitisu.Rda" "Year1/toothbr/toothbr.Rda" "Year1/toothpa/toothpa.Rda"
[31] "Year1/yogurt/yogurt.Rda" "Year2/beer/beer.Rda" "Year2/blades/blades.Rda"
[34] "Year2/carbbev/carbbev.Rda" "Year2/cigets/cigets.Rda" "Year2/coffee/coffee.Rda"
[37] "Year2/coldcer/coldcer.Rda" "Year2/deod/deod.Rda" "Year2/diapers/diapers.Rda"
[40] "Year2/factiss/factiss.Rda" "Year2/fzdinent/fzdinent.Rda" "Year2/fzpizza/fzpizza.Rda"
[43] "Year2/hhclean/hhclean.Rda" "Year2/hotdog/hotdog.Rda" "Year2/laundet/laundet.Rda"
[46] "Year2/margbutr/margbutr.Rda" "Year2/mayo/mayo.Rda" "Year2/milk/milk.Rda"
[49] "Year2/mustketc/mustketc.Rda" "Year2/paptowl/paptowl.Rda" "Year2/peanbutr/peanbutr.Rda"
[52] "Year2/photo/photo.Rda" "Year2/razors/razors.Rda" "Year2/saltsnck/saltsnck.Rda"
[55] "Year2/shamp/shamp.Rda" "Year2/soup/soup.Rda" "Year2/spagsauc/spagsauc.Rda"
[58] "Year2/sugarsub/sugarsub.Rda" "Year2/toitisu/toitisu.Rda" "Year2/toothbr/toothbr.Rda"
[61] "Year2/toothpa/toothpa.Rda" "Year2/yogurt/yogurt.Rda"
Answer 0 (score: 2)
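One solution is to parse the file names and assign them as names to the elements of a list of data frames. We'll use some sample data containing monthly sales of beer brands for two years, saved as CSV files in two subdirectories, year1 and year2.
We'll use lapply() to read the files into a list of data frames, then use the names() function to name each element by prefixing year<x>. to the file name (without the .csv extension).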
fileList <- c("year1/beer.csv", "year2/beer.csv")
data <- lapply(fileList, function(x) {
  read.csv(x)
})
# generate data set names to be assigned to elements in the list
fileNameTokens <- strsplit(fileList, "/|[.]")
theNames <- unlist(lapply(fileNameTokens, function(x) {
  paste0(x[1], ".", x[2])
}))
names(data) <- theNames
# print first six rows of file 1 based on named extract
data[["year1.beer"]][1:6,]
...and the output.
> data[["year1.beer"]][1:6,]
  Month      Item Sales
1     1 Budweiser 83047
2     2 Budweiser 38374
3     3 Budweiser 47287
4     4 Budweiser 18417
5     5 Budweiser 23981
6     6 Budweiser 55471
>
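Next, we print the first few rows of the second file.
> # print first six rows of file 2 based on named extract
> data[["year2.beer"]][1:6,]
  Month      Item Sales
1     1 Budweiser 23847
2     2 Budweiser 33847
3     3 Budweiser 44400
4     4 Budweiser 35333
5     5 Budweiser 18710
6     6 Budweiser 63108
>
If you need to access the data sets directly, without relying on the names of the list() elements, you can assign them to the global environment with the assign() function inside the lapply() call, as in the other answer.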
# alternate form, assigning directly to the global environment (pos = 1)
data <- lapply(fileList, function(x) {
  # x is the filename, parse into strings to generate data set name
  fileNameTokens <- unlist(strsplit(x, "/|[.]"))
  assign(paste0(fileNameTokens[1], ".", fileNameTokens[2]), read.csv(x), pos = 1)
})
head(year1.beer)
...and the output.
> head(year1.beer)
  Month      Item Sales
1     1 Budweiser 83047
2     2 Budweiser 38374
3     3 Budweiser 47287
4     4 Budweiser 18417
5     5 Budweiser 23981
6     6 Budweiser 55471
>
The technique can also be used with RDS files, as follows.
# note: fileList must point to RDS files here, since readRDS() cannot read CSV
data <- lapply(fileList, function(x) {
  # x is the filename, parse into strings to generate data set name
  fileNameTokens <- unlist(strsplit(x, "/|[.]"))
  assign(paste0(fileNameTokens[1], ".", fileNameTokens[2]), readRDS(x), pos = 1)
})
head(year1.beer)
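As a self-contained illustration of the RDS variant, the sketch below writes two small data frames to a temporary directory and reads them back into a named list. The directory layout, the sample values, and the helper name name_from_path are assumptions made for the example, not part of the answer's data. Because readRDS() returns the stored object rather than restoring it under a saved name, the name is chosen at read time.

```r
# Hypothetical sample data written to a temporary directory (not the answer's files)
root <- file.path(tempdir(), "rds_demo")
for (yr in c("year1", "year2")) {
  dir.create(file.path(root, yr), recursive = TRUE, showWarnings = FALSE)
}
saveRDS(data.frame(Month = 1:2, Sales = c(10, 20)), file.path(root, "year1", "beer.rds"))
saveRDS(data.frame(Month = 1:2, Sales = c(30, 40)), file.path(root, "year2", "beer.rds"))

# Paths relative to root, e.g. "year1/beer.rds"
rdsFiles <- list.files(root, pattern = "\\.rds$", recursive = TRUE)

# Hypothetical helper: "year1/beer.rds" -> "year1.beer"
name_from_path <- function(p) {
  paste0(dirname(p), ".", tools::file_path_sans_ext(basename(p)))
}

# Read each file and name the list elements after their paths
rdsData <- lapply(file.path(root, rdsFiles), readRDS)
names(rdsData) <- vapply(rdsFiles, name_from_path, character(1), USE.NAMES = FALSE)

rdsData[["year2.beer"]]
```

Keeping the data frames in one named list, rather than assigning each to the global environment, makes it easy to loop over all years later.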
Answer 1 (score: 1)
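One option is to load each file into a new environment and then assign its contents to a custom-named object in the global environment.
This is adapted from https://stackoverflow.com/a/5577647/6561924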
# first create custom names for objects (e.g. add folder names)
file_names <- gsub("/", "_", files)
file_names <- gsub("\\.Rda$", "", file_names)
# function to load objects in new environ
load_obj <- function(f, f_name) {
  env <- new.env()
  nm <- load(f, env)[1] # load into new environ and capture name
  assign(f_name, env[[nm]], pos = 1) # pos = 1 is the global environment
}
# load all (invisible() keeps the loaded objects from being printed)
invisible(mapply(load_obj, files, file_names))
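To see the approach end to end with the layout from the question, here is a minimal sketch that collects the objects in a named list instead of assigning them to the global environment. The temporary files, the sample values, and the helper names make_name and load_one are assumptions made for the example. It also assumes each .Rda file holds exactly one object.

```r
# Hypothetical files mirroring the question's layout: Year<N>/<product>/<product>.Rda
root <- file.path(tempdir(), "rda_demo")
for (yr in 1:2) {
  d <- file.path(root, paste0("Year", yr), "beer")
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
  beer <- data.frame(Month = 1:3, Sales = yr * c(10, 20, 30))
  save(beer, file = file.path(d, "beer.Rda"))  # both files store an object named "beer"
}

files <- list.files(root, pattern = "\\.Rda$", recursive = TRUE)

# Hypothetical helper: "Year1/beer/beer.Rda" -> "beer1"
make_name <- function(p) {
  year <- sub("^Year", "", sub("/.*$", "", p))
  paste0(tools::file_path_sans_ext(basename(p)), year)
}

# Load one file into a throwaway environment and return its first object
load_one <- function(f) {
  env <- new.env()
  nm <- load(f, envir = env)  # load() returns the names of the restored objects
  env[[nm[1]]]
}

all_data <- lapply(file.path(root, files), load_one)
names(all_data) <- vapply(files, make_name, character(1), USE.NAMES = FALSE)

names(all_data)
```

Individual data sets are then available as all_data$beer1 and all_data$beer2. If separate objects are preferred, list2env(all_data, envir = .GlobalEnv) would create them from the list.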