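Plotting many csv files in one window

Asked: 2014-05-29 10:01:44

Tags: r csv plot time-series

I have a list of 701 csv files. Each file has the same number of columns (7) but a different number of rows (between 25000 and 28000).

Here is an extract of the first file: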

Date,Week,Week Day,Hour,Price,Volume,Sale/Purchase
18/03/2011,11,5,1,-3000.00,17416,Sell
18/03/2011,11,5,1,-1001.10,17427,Sell
18/03/2011,11,5,1,-1000.00,18055,Sell
18/03/2011,11,5,1,-500.10,18057,Sell
18/03/2011,11,5,1,-500.00,18064,Sell
18/03/2011,11,5,1,-400.10,18066,Sell
18/03/2011,11,5,1,-400.00,18066,Sell
18/03/2011,11,5,1,-300.10,18068,Sell
18/03/2011,11,5,1,-300.00,18118,Sell

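Now I am trying to plot Volume against Date for the rows where Price is exactly 200.00. I would then like a single window in which I can see how the volume develops over time.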

allenamen <- dir(pattern="*.csv")
alledat <- lapply(allenamen, read.csv, header = TRUE, 
   sep = ",", stringsAsFactors = FALSE)
verlauf <- function(a) {plot(Volume ~ Date, a, 
  data=subset(a, (Price=="200.00")), 
  ylim = c(15000, 45000), 
  xlim = as.Date(c("2011-12-30", "2013-01-20")), type = "l")}
lapply(alledat, verlauf)

But I get this error:

error in strsplit(log, NULL): non-character argument

How can I avoid the error?

3 Answers:

Answer 0: (score: 2)

Here are some suggestions.

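  1. Use list.files rather than dir to find the files. dir lists the files in a directory, and the way you call it, that is the current directory. (In R, dir is an alias for list.files, so both behave the same.)

  2. header = TRUE and sep = "," are default arguments of read.csv, so they are not needed in your code.

  3. Subset each file as it is read.

  4. Here is the suggested approach.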

    fnames <- list.files(pattern  = "*.csv")
    read <- lapply(fnames, function(x){
        rd <- read.csv(x, stringsAsFactors = FALSE)
        subset(rd, Price == 200)
        })
    dat <- do.call(rbind, read)
    

    You should then be able to plot dat.
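
A minimal sketch of that last plotting step, assuming the column layout shown in the question. The temporary directory `csv_demo` and the two small files in it are made up for illustration so that the code runs on its own. Note that the plot call passes only the formula and `data =`; in the question's call, `a` is additionally passed as an unnamed argument, which is probably what ends up matched to the `log` parameter of the default plot method and triggers the `strsplit(log, NULL)` error.

```r
# Hypothetical demo data: two small files in a temporary directory,
# laid out like the extract in the question.
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
header <- "Date,Week,Week Day,Hour,Price,Volume,Sale/Purchase"
writeLines(c(header,
             "18/03/2011,11,5,1,-3000.00,17416,Sell",
             "18/03/2011,11,5,1,200.00,18118,Sell"),
           file.path(demo_dir, "a.csv"))
writeLines(c(header,
             "19/03/2011,11,6,1,200.00,18300,Sell",
             "19/03/2011,11,6,1,300.00,18500,Sell"),
           file.path(demo_dir, "b.csv"))

# Read each file and keep only the rows with Price == 200
fnames <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
read <- lapply(fnames, function(x) {
  rd <- read.csv(x, stringsAsFactors = FALSE)
  subset(rd, Price == 200)
})
dat <- do.call(rbind, read)

# Date is read as text ("18/03/2011"); convert it and sort by it
dat$Date <- as.Date(dat$Date, format = "%d/%m/%Y")
dat <- dat[order(dat$Date), ]

# Only the formula and data = are passed, no extra unnamed argument
plot(Volume ~ Date, data = dat, type = "l")
```

Sorting by Date before plotting matters with type = "l", because the line connects the points in row order.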

Answer 1: (score: 2)

If you want to combine all subsets with Price==200 into one plot, you can use the following function:

plotprice <- function(x) {
  files <- list.files(pattern="*.csv")
  df <- data.frame()
  for(i in 1:length(files)){
    xx <- read.csv(as.character(files[i]))
    xx <- subset(xx, Price==x)
    df <- rbind(df, xx)
  }
  df$Date <- as.Date(as.character(df$Date), format="%d/%m/%Y")
  plot(Volume ~ Date, df, ylim = c(15000, 45000), xlim = as.Date(c("2011-12-30", "2013-01-20")), type = "l")
}

With plotprice(200) you get everything with Price==200 in one plot.

If you want a plot for each csv file instead, you can use:

ploteach <- function(x) {
  files <- list.files(pattern="*.csv")
  for(i in 1:length(files)){
    df <- read.csv(as.character(files[i]))
    df <- subset(df, Price==x)
    df$Date <- as.Date(as.character(df$Date), format="%d/%m/%Y")
    plot(Volume ~ Date, df, ylim = c(15000, 45000), xlim = as.Date(c("2011-12-30", "2013-01-20")), type = "l")
  }
}

ploteach(200)
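
A third option, sketched here under assumptions, sits between the two functions above: one window, but a separate line per file. The helper name `plotoverlay`, its `files` argument, and the demo files are hypothetical and not part of the answer. The idea is to set up empty axes that cover all files with type = "n" and then add each file with lines().

```r
# Hypothetical helper: one line per file, all drawn on the same axes.
plotoverlay <- function(x, files) {
  # read and subset every file, converting Date as in the functions above
  parts <- lapply(files, function(f) {
    df <- read.csv(f, stringsAsFactors = FALSE)
    df <- subset(df, Price == x)
    df$Date <- as.Date(df$Date, format = "%d/%m/%Y")
    df[order(df$Date), ]
  })
  # drop files that contain no row at this price
  parts <- Filter(function(df) nrow(df) > 0, parts)
  if (length(parts) == 0) stop("no rows with Price == ", x)
  all_rows <- do.call(rbind, parts)
  # empty axes spanning all files, then one line per file
  plot(Volume ~ Date, data = all_rows, type = "n")
  for (i in seq_along(parts)) {
    lines(parts[[i]]$Date, parts[[i]]$Volume, col = i)
  }
  invisible(length(parts))  # number of lines drawn
}

# Made-up demo files in a temporary directory
overlay_dir <- file.path(tempdir(), "overlay_demo")
dir.create(overlay_dir, showWarnings = FALSE)
header <- "Date,Week,Week Day,Hour,Price,Volume,Sale/Purchase"
writeLines(c(header,
             "18/03/2011,11,5,1,200.00,18118,Sell",
             "19/03/2011,11,6,1,200.00,18200,Sell"),
           file.path(overlay_dir, "first.csv"))
writeLines(c(header,
             "18/03/2011,11,5,1,-3000.00,17416,Sell",
             "18/03/2011,11,5,1,200.00,19000,Sell",
             "19/03/2011,11,6,1,200.00,19100,Sell"),
           file.path(overlay_dir, "second.csv"))

n_lines <- plotoverlay(200, list.files(overlay_dir, pattern = "\\.csv$",
                                       full.names = TRUE))
```

With 701 files the lines will overlap heavily, so this is mainly useful for a handful of files at a time.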

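Answer 2: (score: 0)

OK, first you need to turn the result of lapply with read.csv from a list of 701 data frames into a single data frame.

I added a function that reads and subsets each file, to avoid running out of RAM: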

#
# function to read and subset data to avoid running out of RAM
read.subset <- function(dateiname){
   a <- read.csv(file = dateiname, header = TRUE, sep = ",",
                 stringsAsFactors = FALSE)
   a <- a[a$Price == 200.00,]
   print(gc())    # monitor and clean RAM after each file is read
   return(a)
}

* Update 2: using scan

Added a faster implementation of read.subset:
# function to read and subset data to avoid running out of RAM
read.subset.fast <- function(dateiname){
   # get data from csv into a data.frame
   a <- scan(file          = dateiname,
             what          = c(list(character()),
                               rep(list(numeric()),5),
                               list(character())),
             skip          = 1,  # skip header (equivalent to header = TRUE)
             sep           = ",")
   # transform efficiently list into data.frame
   attributes(a) <- list(class      = "data.frame",
                         row.names  = c(NA_integer_, length(a[[1]])),
                         names      = scan(file          = dateiname,
                                           what          = character(),
                                           skip          = 0,  
                                           nlines        = 1,  # just read first line to extract column names
                                           sep           = ","))
   # subset data
   a <- a[a$Price == 200.00,]
   print(gc())
   return(a)
}
#
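
If the scan-based version feels too low-level, a middle ground is to stay with read.csv but declare the column types up front with colClasses, so that read.csv does not have to guess them. The sketch below is an assumption-laden variant, not part of the original answer: the name `read.subset.typed` and the demo file are hypothetical, and whether it is as fast as the scan version has not been measured here.

```r
# Hypothetical variant of read.subset: same result, but the column types
# (1 character, 5 numeric, 1 character) are declared in advance.
read.subset.typed <- function(dateiname) {
  a <- read.csv(file = dateiname,
                colClasses = c("character", rep("numeric", 5), "character"))
  a[a$Price == 200, ]
}

# Made-up demo file with the same header as in the question
demo_file <- tempfile(fileext = ".csv")
writeLines(c("Date,Week,Week Day,Hour,Price,Volume,Sale/Purchase",
             "18/03/2011,11,5,1,-3000.00,17416,Sell",
             "18/03/2011,11,5,1,200.00,18118,Sell"),
           demo_file)
res <- read.subset.typed(demo_file)
print(res)
```

Unlike the scan version, read.csv rewrites the header names into syntactic ones by default, so "Week Day" becomes Week.Day and "Sale/Purchase" becomes Sale.Purchase.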

Now let's read, subset and combine the data into one data frame:

#
allenamen <- list.files(pattern="*.csv") # updated (@Richard Scriven)
# get a single data frame, instead of a list of 701 data frames
alledat <- do.call(rbind, lapply(allenamen, read.subset.fast))
#

Convert the dates to the proper format:

# get dates in dates format
alledat$Date <- as.Date(as.character(alledat$Date), format="%d/%m/%Y")

Then you are good to go, and no function is needed. Just plot it:

plot(Volume ~ Date, 
     data = alledat,
     ylim = range(Volume),
     xlim = range(Date),
     type = "l")