Question

I am running the following code to open a set of CSV files containing temperature vs. time data:

temp = list.files(pattern="*.csv")
for (i in 1:length(temp)) 
{
  assign(temp[i], read.csv(temp[i], header=FALSE, skip =20))
  colnames(as.data.frame(temp[i])) <- c("Date","Unit","Temp")
}

The data in the data frames looks like this:

                   V1 V2   V3
1 6/30/13 10:00:01 AM  C 32.5
2 6/30/13 10:20:01 AM  C 32.5
3 6/30/13 10:40:01 AM  C 33.5
4 6/30/13 11:00:01 AM  C 34.5
5 6/30/13 11:20:01 AM  C 37.0
6 6/30/13 11:40:01 AM  C 35.5

I am just trying to assign column names, but I get the following error message:

Error in `colnames<-`(`*tmp*`, value = c("Date", "Unit", "Temp")) : 
  'names' attribute [3] must be the same length as the vector [1]

I think it may have something to do with how my loop reads the CSV files. They are all stored in the same directory in R.

Thanks for your help!

Answer 1

I would take a slightly different approach, which may be easier to follow:

temp = list.files(pattern="*.csv")
for (i in 1:length(temp)) 
{
  tmp <- read.csv(temp[i], header=FALSE, skip =20)
  colnames(tmp) <- c("Date","Unit","Temp")
  # Now what do you want to do?
  # For instance, use the file name as the name of a list element containing the data?
}

Update

temp = list.files(pattern="*.csv")
stations <- vector("list", length(temp))
for (i in 1:length(temp)) {
  tmp <- read.csv(temp[i], header=FALSE, skip =20)
  colnames(tmp) <- c("Date","Unit","Temp")
  stations[[i]] <- tmp
}
names(stations) <- temp # optional; could process file names too like using basename

station1 <- stations[[1]] # etc.; station1 would be a data.frame

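The second part could also be improved, depending on how you plan to use the data and how much of it there is. A good command to know is str(some_object). It will really help you understand R's data structures.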
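As a rough illustration of what str() reports, here is a sketch on a small stand-in for the stations list. The two data frames and their values are made up for the example; only the column names come from the code above.

```r
# Hypothetical stand-in for the 'stations' list built in the loop above.
stations <- list(
  a.csv = data.frame(Date = "6/30/13 10:00:01 AM", Unit = "C", Temp = 32.5),
  b.csv = data.frame(Date = "6/30/13 10:20:01 AM", Unit = "C", Temp = 33.5)
)

str(stations)       # overview of the whole list and each data frame in it
str(stations[[1]])  # structure of a single data frame: column names and types
```

Calling str() on the whole list first, then on one element, is usually the quickest way to see whether each column was read with the type you expected.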

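Update #2:

Getting individual data frames into your workspace will be quite hard; someone cleverer than I am may know some tricks. Since you want to plot these, I would first make the names the way you want them: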

names(stations) <- paste(basename(temp), 1:length(stations), sep = "_")

Then I would iterate over the list created above, creating your plots as you go:

for (i in 1:length(stations)) {
    tmp <- stations[[i]]
    # tmp is a data frame with columns Date, Unit, Temp
    # plot your data using the plot commands you like to use, for example
    p <- qplot(x = Date, y = Temp, data = tmp, geom = "smooth", main = names(stations)[i])
    print(p)
    # this is approx code, you'll have to play with it, and watch out for Dates
    # I recommend the package lubridate if you have any troubles parsing the dates
    # qplot is in package ggplot2
}

If you want to save them to a file, use:

pdf("filename.pdf")
# then the plotting loop just above
dev.off()

This will create a multi-page PDF. Good luck!
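On the date caveat in the plotting loop above: if you would rather not add a package, base R can probably parse this timestamp format too. This is only a sketch; the format string is my reading of the sample rows in the question, and the UTC time zone is an assumption, since the question does not say which zone the loggers used.

```r
# Sample timestamps in the format shown in the question.
dates <- c("6/30/13 10:00:01 AM", "6/30/13 11:20:01 AM")

# %I is the hour on a 12-hour clock and %p is the AM/PM marker.
# %p is locale-dependent, so this may need adjusting in a non-English locale.
# tz = "UTC" is an assumption; use the zone the data was recorded in.
parsed <- as.POSIXct(dates, format = "%m/%d/%y %I:%M:%S %p", tz = "UTC")
print(parsed)
```

Any value that comes back as NA did not match the format string, which is a quick way to spot rows that need attention.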

Answer 2

Using 'assign' is generally not recommended in R. (I should find some resources explaining why.)

You can do what you are trying to do with a function like this:

read.a.file <- function (f, cnames, ...) {
  my.df <- read.csv(f, ...)
  colnames(my.df) <- cnames
  ## Here you can add more preprocessing of your files.
  my.df  # return the data frame; otherwise the function returns cnames
}

Iterate over the list of files with:

lapply(X=temp, FUN=read.a.file, cnames=c("Date", "Unit", "Temp"), skip=20, header=FALSE)
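Note that lapply returns a list, so you will want to store the result somewhere. A self-contained sketch of that, using two made-up files in a temporary directory in place of the real CSVs (so skip = 20 is left out), and a helper named read_one that is only a stand-in for the function above:

```r
# Hypothetical helper for this sketch; it returns the data frame explicitly.
read_one <- function(f, cnames, ...) {
  d <- read.csv(f, ...)
  colnames(d) <- cnames
  d
}

# Two made-up files in a temporary directory stand in for the real data.
demo_dir <- tempfile("csvdemo")
dir.create(demo_dir)
writeLines("6/30/13 10:00:01 AM,C,32.5", file.path(demo_dir, "a.csv"))
writeLines("6/30/13 10:20:01 AM,C,33.5", file.path(demo_dir, "b.csv"))

# 'pattern' is a regular expression, so "\\.csv$" matches names ending in .csv.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Extra arguments after FUN are passed on to read_one, then to read.csv.
stations <- lapply(files, read_one,
                   cnames = c("Date", "Unit", "Temp"), header = FALSE)
names(stations) <- basename(files)
print(names(stations))
```

Each data frame is then available by name, for example stations[["a.csv"]].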

Answer 3

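"read.csv" returns a data.frame, so you do not need the "as.data.frame" call;
you can use the "col.names" argument of "read.csv" to specify the column names;
I don't know which version of R you are using, but "colnames(as.data.frame(...)) <-" is simply an incorrect call, because it requires a function "as.data.frame<-" that does not exist, at least in version 2.14.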
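The second point can be sketched like this. The temporary file and its two rows are made up for the example, and skip = 20 from the question is left out because the sample file has no header block.

```r
# Write a small made-up CSV to a temporary file (hypothetical sample data).
path <- tempfile(fileext = ".csv")
writeLines(c("6/30/13 10:00:01 AM,C,32.5",
             "6/30/13 10:20:01 AM,C,32.5"), path)

# col.names sets the column names at read time,
# so no separate colnames<- call is needed afterwards.
station <- read.csv(path, header = FALSE,
                    col.names = c("Date", "Unit", "Temp"))
print(names(station))
```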

Answer 4

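A short-term fix for your predicament is below, but you really need to read more about using R, because judging from what you did above I expect you will quickly get into another mess. Maybe start by never using assign.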

lapply(list.files(pattern = "*.csv"), function (f) {
  df = read.csv(f, header = F, skip = 20)
  names(df) = c('Date', 'Unit', 'Temp')
  df
}) -> your_list_of_data.frames

Although you would probably prefer this (edited to keep the file name information):

df = do.call(rbind,
             lapply(list.files(pattern = "*.csv"), function(f)
                    cbind(f, read.csv(f, header = F, skip = 20))))
names(df) = c('Filename', 'Date', 'Unit', 'Temp')

Answer 5

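At a glance, you are missing a set of subsetting brackets [] around the element of your temp list. Your list of names has three elements, but because you wrote temp[i] rather than temp[[i]], the for loop does not actually access the element of the list, so it is treated as an element of length 1, as the error says.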
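For illustration, here is how [ and [[ differ on a list, followed by what the expression from the question evaluates to. One caveat: list.files() returns a character vector rather than a list, so the file names below are made-up stand-ins for its result.

```r
x <- list(a = 1:3, b = letters)
class(x[1])    # "list": single brackets keep the container
class(x[[1]])  # "integer": double brackets extract the element itself

# list.files() returns a character vector; these names are hypothetical.
temp <- c("station1.csv", "station2.csv")

# This converts the file NAME (a length-1 string) into a data frame,
# not the data that was read from the file.
df <- as.data.frame(temp[1])
ncol(df)  # 1: a single column, so three column names cannot be assigned
```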

Error when assigning column names to a data frame in R

5 answers: