I have a folder containing many datasets:
C:/path/folder
The folder contains subfolders:
/1
/2
/3
...
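Each subfolder holds 1-20 csv files.
So I need to merge all the csv files from the subfolders of the folder into a single csv file, but each observation must be marked with the subfolder it came from.
Example: if I merge the csv files from subfolder 1 and subfolder 2, I get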
newdata=structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "02.01.2018", class = "factor"),
Revenue = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Budget = c(6.25, 6.25, 5.92,
6.25, 5.92, 6.25, 5.92, 5.92, 5.92, 6.25, 6.25, 6.25, 5.92,
6.25, 6.25, 5.92, 5.92, 5.92, 6.25, 5.92)), .Names = c("Date",
"Revenue", "Budget"), class = "data.frame", row.names = c(NA,
-20L))
One small correction: I need to assign the subfolder number to each observation. So the desired output is
Date Revenue Budget subfolder
02.01.2018 0 6,25 1
02.01.2018 0 6,25 1
02.01.2018 0 5,92 1
02.01.2018 0 6,25 1
02.01.2018 0 5,92 1
02.01.2018 0 6,25 1
02.01.2018 0 5,92 1
02.01.2018 0 5,92 1
02.01.2018 0 5,92 1
02.01.2018 0 6,25 1
02.01.2018 0 6,25 1
02.01.2018 0 6,25 1
02.01.2018 0 5,92 2
02.01.2018 0 6,25 2
02.01.2018 0 6,25 2
02.01.2018 0 5,92 2
02.01.2018 0 5,92 2
02.01.2018 0 5,92 2
02.01.2018 0 6,25 2
02.01.2018 0 5,92 2
So observations 1:12 were taken from subfolder 1, and observations 13:20 were taken from subfolder 2.
Separately, subfolder 1:
C:/path/folder/subfolder1
f1=structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = "02.01.2018", class = "factor"), Revenue = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Budget = c(6.25, 6.25,
5.92, 6.25, 5.92, 6.25, 5.92, 5.92, 5.92, 6.25, 6.25)), .Names = c("Date",
"Revenue", "Budget"), class = "data.frame", row.names = c(NA,
-11L))
C:/path/folder/subfolder2
f2 =
structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = "02.01.2018", class = "factor"), Revenue = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Budget = c(6.25, 5.92, 6.25,
6.25, 5.92, 5.92, 5.92, 6.25, 5.92)), .Names = c("Date", "Revenue",
"Budget"), class = "data.frame", row.names = c(NA, -9L))
Answer 0: (score: 1)
Assuming you have the following folder structure:
master
|
+-- folder1
|   |
|   +-- file1.csv
|   +-- file2.csv
+-- folder2
    |
    +-- file1.csv
    +-- file2.csv
and your working directory is "master", then you can do the following:
# List the immediate subdirectories of master (the working directory)
dirs <- list.dirs(path = ".", full.names = FALSE, recursive = FALSE)
# This creates the data frame that will be filled
all_data <- data.frame(Date = character(),
                       Revenue = integer(),
                       Budget = numeric(),
                       dirname = character(),
                       stringsAsFactors = FALSE)
# Loops over directories
for (dirname in dirs) {
  # Get all csv files inside this directory, keeping the relative path
  all_csv <- list.files(dirname, pattern = "[.]csv$", full.names = TRUE)
  # Loops over files in the directory
  for (file in all_csv) {
    # read.csv assumes comma-separated files; use read.csv2 for
    # semicolon-separated files with a decimal comma
    tempdata <- read.csv(file, stringsAsFactors = FALSE)
    # Tag every observation with the subfolder it came from
    tempdata$dirname <- dirname
    all_data <- rbind(all_data, tempdata)
  }
}
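As a possible alternative, the same idea can be sketched without growing a data frame inside a loop: read each file into a list and bind once at the end. The function name merge_tagged_csv, the column name subfolder, and the throwaway demo directory below are hypothetical, chosen only for illustration. The sketch assumes comma-separated files with identical columns that sit exactly one level below the root folder.

```r
# Hypothetical helper: read every csv below `root` and tag each row
# with the name of the folder that directly contains the file.
merge_tagged_csv <- function(root = ".") {
  files <- list.files(root, pattern = "[.]csv$",
                      recursive = TRUE, full.names = TRUE)
  pieces <- lapply(files, function(f) {
    d <- read.csv(f, stringsAsFactors = FALSE)
    d$subfolder <- basename(dirname(f))  # e.g. "1" for root/1/file1.csv
    d
  })
  # Bind all pieces at once; returns NULL if no csv files were found
  do.call(rbind, pieces)
}

# Usage on a throwaway directory with two subfolders (made-up demo data)
root <- file.path(tempdir(), "master_demo")
for (sub in c("1", "2")) {
  dir.create(file.path(root, sub), recursive = TRUE, showWarnings = FALSE)
  demo <- data.frame(Date = "02.01.2018", Revenue = 0L,
                     Budget = c(6.25, 5.92))
  write.csv(demo, file.path(root, sub, "file1.csv"), row.names = FALSE)
}
merged <- merge_tagged_csv(root)
print(merged)
```

The tag is stored as a character string. If the subfolder names are plain numbers, as.integer(merged$subfolder) converts it, and write.csv(merged, "merged.csv", row.names = FALSE) would save the result to a single file.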