Question

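I have a list of 701 CSV files. Each file has the same number of columns (7) but a different number of rows (between 25,000 and 28,000).

Here is an excerpt from the first file: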

Date,Week,Week Day,Hour,Price,Volume,Sale/Purchase
18/03/2011,11,5,1,-3000.00,17416,Sell
18/03/2011,11,5,1,-1001.10,17427,Sell
18/03/2011,11,5,1,-1000.00,18055,Sell
18/03/2011,11,5,1,-500.10,18057,Sell
18/03/2011,11,5,1,-500.00,18064,Sell
18/03/2011,11,5,1,-400.10,18066,Sell
18/03/2011,11,5,1,-400.00,18066,Sell
18/03/2011,11,5,1,-300.10,18068,Sell
18/03/2011,11,5,1,-300.00,18118,Sell

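Now I am trying to plot the regression coefficients of the hour-9 supply curve (in the price interval from -50 to 150) over the course of one year.

First I ran the regression: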

library(minpack.lm)  # provides nlsLM()

allenamen <- dir(pattern = "\\.csv$")
alledat <- lapply(allenamen, read.csv, header = TRUE, sep = ",", stringsAsFactors = FALSE)

# Arctangent model of the supply curve: Volume as a function of Price
g <- function(a, b, c, d, p) {a * atan(b * p + c) + d}

h <- list()
for (i in seq_along(alledat)) {
  # Hour-9 sell offers from file i
  dat <- subset(alledat[[i]], (Hour == 9) & (Sale.Purchase == "Sell") & (!Price %in% as.character(-50:150)))
  f <- nlsLM(Volume ~ g(a, b, c, d, Price), data = dat, start = list(a = 4000, b = 0.1, c = -5, d = 32000))
  h[[i]] <- coef(f)  # one vector of coefficients (a, b, c, d) per file
}
h.df <- setNames(do.call(rbind.data.frame, h), names(h[[1]]))
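A side note on the price condition: `!Price %in% as.character(-50:150)` drops rows whose price is a whole number between -50 and 150, which is not the same as selecting the interval described above. A minimal sketch of an explicit range filter follows, assuming the interval -50 to 150 (inclusive) is what is intended. The rows in `sample_rows` are made up for illustration and only mimic the column layout of the CSV files.

```r
# Made-up rows in the same layout that read.csv() produces
# (the header "Sale/Purchase" becomes the column Sale.Purchase).
sample_rows <- data.frame(
  Hour = c(9, 9, 9, 9, 9, 8),
  Price = c(-300.0, -50.0, 0.1, 100.0, 150.1, 20.0),
  Volume = c(18118, 20500, 26010, 31000, 33500, 25000),
  Sale.Purchase = rep("Sell", 6),
  stringsAsFactors = FALSE
)

# Explicit numeric range: keeps hour-9 sell rows with -50 <= Price <= 150.
in_interval <- subset(
  sample_rows,
  Hour == 9 & Sale.Purchase == "Sell" & Price >= -50 & Price <= 150
)

# Condition as written in the question: keeps hour-9 sell rows whose
# price is NOT one of the whole numbers -50, -49, ..., 150.
as_written <- subset(
  sample_rows,
  Hour == 9 & Sale.Purchase == "Sell" & !Price %in% as.character(-50:150)
)

print(in_interval$Price)
print(as_written$Price)
```

On these sample rows the two filters select different prices, so it may be worth checking which subset the fit is actually meant to use.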

Then I kept only the supply-curve data for hour 9 and changed the date format:

files <- list.files(pattern = "\\.csv$")
df <- data.frame()
for (i in seq_along(files)) {
  xx <- read.csv(files[i], stringsAsFactors = FALSE)
  # Keep only the hour-9 sell offers of this file
  xx <- subset(xx, Sale.Purchase == "Sell" & Hour == 9)
  df <- rbind(df, xx)
}
# Convert dates such as "18/03/2011" to Date objects
df$Date <- as.Date(as.character(df$Date), format = "%d/%m/%Y")
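Growing `df` with `rbind` inside a loop copies the data frame on every iteration, which can get slow with 701 files. One possible alternative is sketched below: read and filter each file in a helper, then bind everything once. The helper name `read_hour9_sell` and the two tiny CSV files written to a temporary directory are hypothetical stand-ins so the sketch runs on its own.

```r
# Two small made-up CSV files in a temporary directory stand in for the
# real files; the header matches the excerpt shown earlier.
tmp <- tempfile("csv_demo")
dir.create(tmp)
header <- "Date,Week,Week Day,Hour,Price,Volume,Sale/Purchase"
writeLines(c(header,
             "18/03/2011,11,5,9,-3000.00,17416,Sell",
             "18/03/2011,11,5,9,10.00,25000,Purchase",
             "18/03/2011,11,5,1,-500.00,18064,Sell"),
           file.path(tmp, "day1.csv"))
writeLines(c(header,
             "19/03/2011,11,6,9,-1000.00,18055,Sell",
             "19/03/2011,11,6,9,20.00,26000,Sell"),
           file.path(tmp, "day2.csv"))

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical helper: read one file and keep its hour-9 sell offers.
read_hour9_sell <- function(path) {
  xx <- read.csv(path, stringsAsFactors = FALSE)
  subset(xx, Sale.Purchase == "Sell" & Hour == 9)
}

# Read every file, then bind all pieces in a single rbind call.
df <- do.call(rbind, lapply(files, read_hour9_sell))
df$Date <- as.Date(df$Date, format = "%d/%m/%Y")

print(df)
```

The result has the same columns as the loop version; only the way the pieces are combined differs.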

Then I tried to plot coefficient a:

plot(h.df$a ~ Date, df, xlim = as.Date(c("2012-01-01", "2012-12-31")))

But I got this error:

Error in (function (formula, data = NULL, subset = NULL, na.action = na.fail,  : 
variable lengths differ (found for 'Date')
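The message is consistent with a length mismatch: `h.df` holds one row of coefficients per file, whereas `df` holds every hour-9 sell row of every file, so `h.df$a` and `Date` cannot be paired up. A minimal sketch of one way to align them follows, assuming (this is not confirmed by the data shown) that each file covers exactly one day. The objects below are small made-up stand-ins for `alledat` and `h.df`, and `coef_by_day` is a hypothetical name.

```r
# Made-up stand-ins: three "files", each holding several rows for one day.
alledat <- list(
  data.frame(Date = rep("01/01/2012", 4), stringsAsFactors = FALSE),
  data.frame(Date = rep("02/01/2012", 3), stringsAsFactors = FALSE),
  data.frame(Date = rep("03/01/2012", 5), stringsAsFactors = FALSE)
)
# Made-up coefficients, one row per file, shaped like h.df.
h.df <- data.frame(a = c(4010, 3985, 4050))

# Assumption: every file covers exactly one day, so the first Date value
# of a file identifies the whole file.
file_dates <- as.Date(
  vapply(alledat, function(d) d$Date[1], character(1)),
  format = "%d/%m/%Y"
)

# One row per file: the date and the fitted coefficient a.
coef_by_day <- data.frame(Date = file_dates, a = h.df$a)

# For comparison: total number of data rows versus number of coefficients.
total_rows <- sum(vapply(alledat, nrow, integer(1)))

plot(a ~ Date, data = coef_by_day,
     xlim = as.Date(c("2012-01-01", "2012-12-31")))
```

Here both plotted columns come from the same data frame and have one entry per file, so their lengths match by construction. If a file can span several days, the single-date assumption does not hold and the dates would need to be derived differently.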

Thanks for your help!

Plotting coefficients over one year
