Question

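I have about 400 .csv files, and I need just one value from each of them (cell B2 if the file is opened in spreadsheet software).

Each file is an extract for a single date and is named accordingly (i.e. extract_2017-11-01.csv, extract_2018-04-05, etc.)

I know I can do something like this to loop over the files (please correct me if I have this wrong, or tell me if there is a better way):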

path <- "~/csv_files"

out.file <- ""

file.names <- dir(path, pattern =".csv")

for(i in 1:length(file.names)){
  file <- read.table(file.names[i], header = TRUE, sep = ",")
  out.file <- rbind(out.file, file)
}

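At the end of this I would like to add something that builds a data frame with two columns: the first holding the date (ideally taken from the file name) and the second holding the value from cell B2.

How can I do that?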

Answer 1

This lets you select only the second row and second column at import time:

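# sep = ";" matches my sample files; use sep = "," if yours are comma-separated, as in the question
extract_2018_11_26 <- read.table("csv_files/extract_2018-11-26.csv",
                                 sep = ";", header = TRUE, nrows = 1,
                                 colClasses = c("NULL", NA, "NULL"))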

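This works because nrows=1 means only the first row after the header is read, and in colClasses you put "NULL" for a column you want to skip and NA for a column you want to keep. Note that this colClasses vector assumes the file has exactly three columns.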
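As a small illustration of the two arguments described above, here is a minimal sketch on a throwaway file. The temporary file and the variable names (tmp, b2_df, b2, b2_alt) are made up for the demo. The second variant is an alternative that may be handier when the number of columns is not known in advance.

```r
# Hypothetical demo file standing in for one semicolon-separated extract
tmp <- tempfile(fileext = ".csv")
writeLines(c("col1;col2;col3", "1;2;3", "4;5;6"), tmp)

# Variant from the answer: keep only column 2, read only the first data row
b2_df <- read.table(tmp, sep = ";", header = TRUE, nrows = 1,
                    colClasses = c("NULL", NA, "NULL"))
b2 <- b2_df[1, 1]

# Variant that does not depend on the column count: read the row, then index
b2_alt <- read.table(tmp, sep = ";", header = TRUE, nrows = 1)[1, 2]

print(c(b2, b2_alt))
```

Both variants pick up the value that a spreadsheet would show in B2 (spreadsheet row 1 is the header, so the first data row is row 2).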

In the following code, gsub() lets you match a pattern and replace it with a string:

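# file.names must hold full paths such as "csv_files/extract_2018-11-26.csv"
file.names <- list.files("csv_files", pattern = "\\.csv$", full.names = TRUE)

out.file <- data.frame()
for (i in seq_along(file.names)) {
  file <- read.table(file.names[i],
                     sep = ";", header = TRUE, nrows = 1,
                     colClasses = c("NULL", NA, "NULL"))

  # strip the directory/prefix and the extension, leaving only the date
  date <- gsub("csv_files/extract_|\\.csv$", "", x = file.names[i])
  out.file <- rbind(out.file, data.frame(date, col = file[, 1]))
}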

out.file
#         date col
# 1 2018-11-26   2
# 2 2018-11-27   2

These are the two original .csv files:

#first file, name: extract_2018-11-26.csv
  col1 col2 col3
1    1    2    3
2    4    5    6
#second file, name: extract_2018-11-27.csv
  col1 col2 col3
1    1    2    3
2    4    5    6
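
For a few hundred files, growing a data frame with rbind() inside a loop works but copies the result on every iteration. A minimal sketch of an alternative is below, under some assumptions: the helper name collect_b2 is hypothetical, the files are assumed to be named extract_<yyyy-mm-dd>.csv, and cell B2 is assumed to be numeric. It also converts the date to a Date object, which the code above leaves as text.

```r
# Hypothetical helper: collect cell B2 from every extract_<date>.csv in `dir`
collect_b2 <- function(dir, sep = ",") {
  files <- list.files(dir, pattern = "^extract_.*\\.csv$", full.names = TRUE)

  # One value per file; vapply() fails loudly if B2 is not a single number
  values <- vapply(files, function(f) {
    # header + one data row; [1, 2] is row 2, column B in a spreadsheet
    read.table(f, sep = sep, header = TRUE, nrows = 1)[1, 2]
  }, numeric(1))

  # Capture whatever sits between "extract_" and ".csv", then parse it;
  # names that are not valid dates become NA
  date_text <- sub("^extract_(.*)\\.csv$", "\\1", basename(files))
  dates <- as.Date(date_text, format = "%Y-%m-%d")

  data.frame(date = dates, value = unname(values))
}
```

Usage would look like collect_b2("csv_files", sep = ";") for semicolon-separated files like the samples above, or collect_b2("~/csv_files") for comma-separated ones.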

Answer 2

A data.table approach
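
A minimal sketch of what a data.table approach could look like, not necessarily the answerer's own code. It assumes comma-separated files with a header row, and the helper name read_b2_dt and the column names file and value are hypothetical. It relies on fread()'s nrows and select arguments and on rbindlist()'s idcol argument.

```r
library(data.table)

# Hypothetical helper: read cell B2 of every .csv file in `dir`
# and stack the results into one data.table.
read_b2_dt <- function(dir, sep = ",") {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  names(files) <- basename(files)  # list names become the id column below

  rbindlist(
    lapply(files, function(f) {
      # nrows = 1: header plus first data row; select = 2: keep column B only
      x <- fread(f, sep = sep, header = TRUE, nrows = 1, select = 2)
      data.table(value = x[[1]])
    }),
    idcol = "file"
  )
}
```

The result has one row per file, with the file name in the file column, so the date can be derived from it afterwards.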

Extracting the date from the file name is a separate, regex-related problem; please ask about it in a separate question.
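
For reference, one way the regex part could be handled in base R is to match the date pattern directly rather than stripping the prefix and extension. This is a sketch under the assumption that every name contains exactly one yyyy-mm-dd date. The file names are the examples from the question (with .csv added to the second one).

```r
# Hypothetical file names following the pattern described in the question
file_names <- c("extract_2017-11-01.csv", "extract_2018-04-05.csv")

# Locate the first yyyy-mm-dd run in each name and pull it out
m <- regexpr("[0-9]{4}-[0-9]{2}-[0-9]{2}", file_names)
date_strings <- regmatches(file_names, m)  # names without a match are dropped

# Convert the matched text to Date objects
dates <- as.Date(date_strings, format = "%Y-%m-%d")
```

Because regmatches() silently drops names that do not match, it is worth checking that length(dates) equals length(file_names) before binding the dates to the extracted values.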

How do I extract the value of a specific cell from hundreds of .csv files?
