Question

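I'm trying to combine data from HOBOware data loggers. We collect the data every month or so and have been compiling it in Excel. I'm now trying to take the raw CSV files and process them in R, but I'm struggling to get a loop to format the CSVs correctly.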

I can format each month's file individually with:

Pool_xxx <- read_csv("Pool_xxx.csv", 
                           col_types = cols(`Date Time, GMT-05:00` = col_datetime(format = "%m/%d/%y %H:%M:%S")), 
                           skip = 1)[,2:4]

But I'd like to create a loop that does this for every CSV in a folder.

I've read a lot about how to write loops, but I can't figure out where the column specification should go.

setwd("E:/R Hobo/Conversion test/Converted HOBO files")
mydir = "Pool 6"
myfiles = list.files(path=mydir, pattern="*.csv", full.names=TRUE)
numfiles <- length(myfiles)     
for (numfiles in myfiles) {
      sample <- read.csv(numfiles,
                          header = FALSE,
                          sep = ",",
                          col_types = cols(`Date Time, GMT-05:00` = col_datetime(format = "%m/%d/%y %H:%M:%S")),  
                          skip = 1) [,2:4]
}

I keep getting the following error back, and I'm not sure where to go from here:

Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  unused argument (col_types = cols(`Date Time, GMT-05:00` = col_datetime(format = "%m/%d/%y %H:%M:%S")))

Someone suggested lapply, but R keeps saying it is not compatible with version 3.5.3.

Link to a raw CSV: https://drive.google.com/file/d/1SUf--PNznlNOlDkXeXYaRKuSHqa-EVZM/view?usp=sharing

Answer 1

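You use readr::read_csv in your first code block, but in the second you have switched to read.csv, which is part of base R and takes different arguments. Simply swapping in read_csv in place of read.csv should resolve the error.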

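I'm afraid I'm still using R 3.4.4 day to day, so I can't speak to the version compatibility problem. However, as written, your code overwrites sample on every pass through the loop: at the end of that block, sample will contain only the values from the most recently read CSV. Here is an alternative (I've kept the rest of your code where possible). We create sample from the first file, then on each pass through the loop we combine it with the data read from the next file (see ?union_all for more on the dplyr function).

library(dplyr)
library(readr)

setwd("E:/R Hobo/Conversion test/Converted HOBO files")
mydir = "Pool 6"
myfiles = list.files(path=mydir, pattern="\\.csv$", full.names=TRUE)
# numfiles <- length(myfiles)

# You want sample to exist before the loop, so you can union the new content with the existing object
sample <- read_csv(myfiles[1],
                   col_types = cols(`Date Time, GMT-05:00` = col_datetime(format = "%m/%d/%y %H:%M:%S")),
                   skip = 1) %>%
  select(2:4)

# numfiles as referenced here, and in the block below, is independent of the assignment you make above;
# myfiles[-1] removes the first element used above.
for (numfiles in myfiles[-1]) {
  sample <- union_all(sample,
                      read_csv(numfiles,
                               col_types = cols(`Date Time, GMT-05:00` = col_datetime(format = "%m/%d/%y %H:%M:%S")),
                               skip = 1) %>%
                        select(2:4))
}

I made a few other small fixes and changes here. Your pattern in list.files works, but that argument expects a regular expression, so "\\.csv$" should be closer to what you have in mind. I removed the header and sep arguments, which belong to read.csv rather than read_csv. Finally, I used dplyr::select instead of [ to grab the columns: read_csv returns a tibble, and select plays better with those. Unfortunately, the column names are a bit awkward to work with, so I still select the columns by index.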

~~You didn't provide an example I can actually test, since I don't have the files to read, so I haven't been able to test this, but it should work.~~
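For comparison, the same result can be sketched without an explicit loop. lapply is part of base R, so it should be available in R 3.5.3 without extra packages. The sketch below is not from the original answer. It assumes every file shares the layout shown in the question (one title line to skip, then a header containing `Date Time, GMT-05:00`), and the helper names read_hobo and read_hobo_dir are hypothetical.

```r
library(readr)
library(dplyr)

# Hypothetical helper: read one HOBO export, skip the title line,
# parse the timestamp column, and keep columns 2 to 4.
read_hobo <- function(path) {
  read_csv(
    path,
    col_types = cols(`Date Time, GMT-05:00` = col_datetime(format = "%m/%d/%y %H:%M:%S")),
    skip = 1
  ) %>%
    select(2:4)
}

# Hypothetical helper: read every CSV in a folder and stack the rows
# into a single tibble.
read_hobo_dir <- function(dir) {
  files <- list.files(path = dir, pattern = "\\.csv$", full.names = TRUE)
  bind_rows(lapply(files, read_hobo))
}
```

With the working directory set as in the question, a call such as pool6 <- read_hobo_dir("Pool 6") would return one combined tibble. Note that bind_rows matches columns by name, so files whose headers differ (for example, a different logger serial number in the column name) would produce extra columns filled with NA rather than an error.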

Formatting multiple CSV files in an R loop
