I have a list of 701 CSV files. Each file has the same number of columns (7) but a different number of rows (between 25,000 and 28,000).
Here is an excerpt from the first file:
Date,Week,Week Day,Hour,Price,Volume,Sale/Purchase
18/03/2011,11,5,1,-3000.00,17416,Sell
18/03/2011,11,5,1,-1001.10,17427,Sell
18/03/2011,11,5,1,-1000.00,18055,Sell
18/03/2011,11,5,1,-500.10,18057,Sell
18/03/2011,11,5,1,-500.00,18064,Sell
18/03/2011,11,5,1,-400.10,18066,Sell
18/03/2011,11,5,1,-400.00,18066,Sell
18/03/2011,11,5,1,-300.10,18068,Sell
18/03/2011,11,5,1,-300.00,18118,Sell
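Now I am trying to plot the successive regression coefficients of the supply curve for the ninth hour over one year (in the price interval -50 to 150).
First I ran the regressions: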
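library(minpack.lm)  # provides nlsLM()
allenamen <- dir(pattern = "\\.csv$")
alledat <- lapply(allenamen, read.csv, header = TRUE, sep = ",", stringsAsFactors = FALSE)
# Arctangent model of the supply curve: Volume as a function of Price
g <- function(a, b, c, d, p) {a*atan(b*p+c)+d}
h <- list()
for(i in seq_along(alledat)){
  # Sell offers in hour 9, excluding prices between -50 and 150 (numeric comparison,
  # so non-integer prices such as -49.9 are handled too)
  sub <- subset(alledat[[i]], Hour == 9 & Sale.Purchase == "Sell" & !(Price >= -50 & Price <= 150))
  f <- nlsLM(Volume ~ g(a,b,c,d,Price), data = sub, start = list(a=4000, b=0.1, c=-5, d=32000))
  h[[i]] <- coef(f)  # one coefficient vector per file
}
h.df <- setNames(do.call(rbind.data.frame, h), names(h[[1]]))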
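For reference, the model that `g` encodes and `nlsLM` fits to each file is:

$$\text{Volume} = a \arctan(b \cdot \text{Price} + c) + d$$

With the starting values used above (a = 4000, b = 0.1, c = -5, d = 32000), the curve's inflection point lies at Price = -c/b = 50, and the volume levels off near d ± aπ/2, i.e. roughly 25,717 and 38,283.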
Then I kept only the supply-curve (Sell) rows for the ninth hour and converted the date format:
files <- list.files(pattern = "\\.csv$")
df <- data.frame()
for(i in seq_along(files)) {
  xx <- read.csv(files[i])
  # keep only sell offers for hour 9
  xx <- subset(xx, Sale.Purchase == "Sell" & Hour == 9)
  df <- rbind(df, xx)
}
df$Date <- as.Date(as.character(df$Date), format = "%d/%m/%Y")
Then I tried to plot coefficient a:
plot(h.df$a ~ Date, df, xlim = as.Date(c("2012-01-01", "2012-12-31")))
But I get this error:
Error in (function (formula, data = NULL, subset = NULL, na.action = na.fail, :
variable lengths differ (found for 'Date')
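A likely cause: the formula pairs `h.df$a`, which has one value per file (701), with `Date` from `df`, which has one value per hour-9 sell row across all files, so the two variables cannot be lined up. Below is a minimal sketch of one way to align them. It uses tiny hypothetical stand-in data (the `alledat` and `h` contents are made-up placeholders, not real fits), builds the coefficient table with `as.data.frame(do.call(rbind, h))` instead of the original `rbind.data.frame` call, and assumes each CSV file covers a single day.

```r
# Hypothetical stand-ins for the real objects: two tiny "files" with made-up numbers
alledat <- list(
  data.frame(Date = "01/01/2012", Hour = c(9, 9, 10), Price = c(-60, 200, 0),
             Volume = c(17000, 18000, 19000), Sale.Purchase = "Sell",
             stringsAsFactors = FALSE),
  data.frame(Date = "02/01/2012", Hour = c(9, 9, 10), Price = c(-60, 200, 0),
             Volume = c(17500, 18500, 19500), Sale.Purchase = "Sell",
             stringsAsFactors = FALSE)
)
# Placeholder coefficients: one named vector per file (not real regression output)
h <- list(c(a = 4000, b = 0.1, c = -5, d = 32000),
          c(a = 4100, b = 0.1, c = -5, d = 32100))
h.df <- as.data.frame(do.call(rbind, h))  # one row per file

# Row-level data as in the question: one row per hour-9 sell offer
df <- do.call(rbind, lapply(alledat, function(x) x[x$Hour == 9 & x$Sale.Purchase == "Sell", ]))
print(c(coefficients = nrow(h.df), observations = nrow(df)))  # lengths differ

# Attach one date per file so every coefficient row has its own date
# (assumption: each file holds a single day, so its first Date is representative)
h.df$Date <- as.Date(vapply(alledat, function(x) x$Date[1], character(1)),
                     format = "%d/%m/%Y")
plot(a ~ Date, data = h.df, xlim = as.Date(c("2012-01-01", "2012-12-31")))
```

Taking the dates from `alledat` rather than from `df` keeps them in the same file order as `h`, so row i of `h.df` gets the date of file i.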
Thanks for your help!