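I am trying to develop a forecasting method in R. Besides lagged measured power values, the predictors are weather variables. The data has one-minute resolution, and the corresponding time and date are available.
To add lagged power values to the list of predictors, my aim is to use the values from the last ten minutes. If I were sure the database was perfect and values were available for every minute of the whole year, I could simply shift the Power column. Since that may not always be the case, I use the following code to check, for each row, whether all the corresponding lagged values are available, then extract them and store them in a matrix. The problem is that this process is very time consuming and the simulation takes a long time. A reproducible example is given below.
I tried changing as.POSIXlt to as.POSIXct; the simulation time improved, but not by much. I also tried converting the data.frame to a matrix (using as.matrix), but that converts the POSIXct values to class "character".
I wonder whether any of you can suggest a better approach. Thanks.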
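library(abind)  # provides abind()
library(pracma) # assumed source of size(); the question does not say which package it uses
rm(list = ls())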
cat("\014")
st="2012/01/01"
et="2012/02/27"
st <- as.POSIXlt(as.Date(st))
et <- as.POSIXlt(as.Date(et))
time= seq(from=st, to=et,by=60)
time <- as.POSIXlt(time)
#Window is the number of lag values
#leadTime is look-ahead time (forecast horizon)
leadTime=10;
Window=15;
zzzz=time[1:8000]
Total_Zone1=abind(matrix(rnorm(4000*2),4000*2,1), matrix(rnorm(4000*2),4000*2,1), matrix(rnorm(4000*2),4000*2,1),time[1:8000])
N_Train=nrow(Total_Zone1);
lag_Power=matrix(0,N_Train,Window)
colnames(Total_Zone1) <- c( "airtemp","humidity", "Power", "time")
Total_Zone1<- as.data.frame(Total_Zone1)
for (tt in 4000:N_Train){
Statlag=Total_Zone1$time[tt]-(leadTime+Window)*60
EndLag=Total_Zone1$time[tt]-(leadTime)*60
Index_lags=which((Total_Zone1$time>Statlag)&(Total_Zone1$time<=EndLag))
if (size(Index_lags)[2]<Window) {
Statlag2=Total_Zone1$time[tt]-24*60*60
Index_lags2=which(Total_Zone1$time==Statlag2)
tem1=rep(Total_Zone1[Index_lags2,c("Power")],Window-size(Index_lags)[2])
lag_Power[tt,]=t(c(Total_Zone1[Index_lags,c("Power")],tem1))
}else{
lag_Power[tt,]=t(Total_Zone1[Index_lags,c("Power")])
}
}
Answer 0 (score: 1)
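The first thing I would suggest is separating the task of filling in missing data from the task of reformatting the data. You have combined these two tasks in your for loop, but I think the code can be both simplified and sped up by separating them.
To demonstrate, I'll use my own simplified example code and data, which is similar to yours. The demo below covers one hour of time points, from 2012-01-01 09:00 to 2012-01-01 09:59. I've generated complete data for the first 20 minutes and simulated missing data for the remaining 40 minutes: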
timeFirst20Min <- seq(as.POSIXct('2012-01-01 09:00'),as.POSIXct('2012-01-01 09:19'),60);
timeNextScattered10 <- sort(sample(seq(as.POSIXct('2012-01-01 09:20'),as.POSIXct('2012-01-01 09:59'),60),10));
timeRaw <- c(timeFirst20Min,timeNextScattered10);
raw <- data.frame(airtemp=rnorm(30), humidity=rnorm(30), power=rnorm(30), time=timeRaw );
timeFull <- seq(as.POSIXct('2012-01-01 09:00'),as.POSIXct('2012-01-01 09:59'),60);
clean <- merge(raw, data.frame(time=timeFull), all=T );
clean;
This produces the following output:
time airtemp humidity power
1 2012-01-01 09:00:00 -1.84406568 -1.51751811 -0.09391613
2 2012-01-01 09:01:00 -0.58830616 -0.02222524 -0.07865711
3 2012-01-01 09:02:00 1.63189916 -1.35274437 -1.32762005
4 2012-01-01 09:03:00 -1.96424678 0.07326323 -0.29004060
5 2012-01-01 09:04:00 -0.60856223 1.15876062 0.29804843
6 2012-01-01 09:05:00 -0.11504175 -0.43092259 0.49660688
7 2012-01-01 09:06:00 0.24671752 0.58974028 -1.08949827
8 2012-01-01 09:07:00 0.88375964 0.35821875 -0.94881659
9 2012-01-01 09:08:00 -0.70284907 -2.39016066 0.03804497
10 2012-01-01 09:09:00 1.34738493 0.64676077 0.77714355
11 2012-01-01 09:10:00 0.32271920 1.83127669 -1.44754595
12 2012-01-01 09:11:00 -0.46106830 -1.82204149 1.28886422
13 2012-01-01 09:12:00 2.27484699 -1.29816051 0.67963614
14 2012-01-01 09:13:00 -0.25799515 -0.23864259 0.33414247
15 2012-01-01 09:14:00 0.27535436 1.15491712 1.25857697
16 2012-01-01 09:15:00 0.09447671 0.72562227 0.35426141
17 2012-01-01 09:16:00 2.46141564 0.10448464 -0.04715810
18 2012-01-01 09:17:00 0.32952643 -0.51843045 -0.44380916
19 2012-01-01 09:18:00 0.34571678 1.74200137 2.05858658
20 2012-01-01 09:19:00 0.62863598 -0.38642998 1.29170691
21 2012-01-01 09:20:00 NA NA NA
22 2012-01-01 09:21:00 NA NA NA
23 2012-01-01 09:22:00 0.83148733 -0.79160256 -0.98798400
24 2012-01-01 09:23:00 NA NA NA
25 2012-01-01 09:24:00 NA NA NA
26 2012-01-01 09:25:00 NA NA NA
27 2012-01-01 09:26:00 -0.80548958 0.99318673 0.49630420
28 2012-01-01 09:27:00 NA NA NA
29 2012-01-01 09:28:00 -1.12821534 -0.64763280 0.28033445
30 2012-01-01 09:29:00 NA NA NA
31 2012-01-01 09:30:00 1.77407448 0.21064525 0.97942804
32 2012-01-01 09:31:00 NA NA NA
33 2012-01-01 09:32:00 -0.71832822 0.89608884 -0.14460970
34 2012-01-01 09:33:00 NA NA NA
35 2012-01-01 09:34:00 NA NA NA
36 2012-01-01 09:35:00 NA NA NA
37 2012-01-01 09:36:00 NA NA NA
38 2012-01-01 09:37:00 NA NA NA
39 2012-01-01 09:38:00 NA NA NA
40 2012-01-01 09:39:00 -0.17353680 1.69963228 0.10863195
41 2012-01-01 09:40:00 1.03484529 1.69734966 3.00898820
42 2012-01-01 09:41:00 NA NA NA
43 2012-01-01 09:42:00 NA NA NA
44 2012-01-01 09:43:00 -1.13983790 1.51982921 0.58583242
45 2012-01-01 09:44:00 NA NA NA
46 2012-01-01 09:45:00 NA NA NA
47 2012-01-01 09:46:00 NA NA NA
48 2012-01-01 09:47:00 NA NA NA
49 2012-01-01 09:48:00 NA NA NA
50 2012-01-01 09:49:00 NA NA NA
51 2012-01-01 09:50:00 NA NA NA
52 2012-01-01 09:51:00 NA NA NA
53 2012-01-01 09:52:00 -0.93153818 0.77736403 2.08962697
54 2012-01-01 09:53:00 NA NA NA
55 2012-01-01 09:54:00 2.26031675 -0.49847630 -1.48191078
56 2012-01-01 09:55:00 NA NA NA
57 2012-01-01 09:56:00 NA NA NA
58 2012-01-01 09:57:00 NA NA NA
59 2012-01-01 09:58:00 NA NA NA
60 2012-01-01 09:59:00 NA NA NA
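So, the first task is to interpolate the missing power values in clean. In your code, you appear to look back a full 24 hours to get the power value recorded at that time before the base time point (if it exists; it is not guaranteed to exist, is it?), and then replicate it onto the end of the vector of power measurements taken during the window. This is the code I'm referring to: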
Statlag2=Total_Zone1$time[tt]-24*60*60
Index_lags2=which(Total_Zone1$time==Statlag2)
tem1=rep(Total_Zone1[Index_lags2,c("Power")],Window-size(Index_lags)[2])
lag_Power[tt,]=t(c(Total_Zone1[Index_lags,c("Power")],tem1))
I'm not sure why you would look back 24 hours, especially given that the window is only 15 minutes long and ends just 10 minutes before the base time point. I'm also not sure why you append the interpolated value (replicated enough times to make up for the shortfall in available power values) to the end of the row being stored in lag_Power.
In any case, what I would do is fill in each missing power value from the nearest available value in time. I found a good nearest-value interpolation function that works directly on a single vector in the question "Replacing NAs in R with nearest value". Here it is, reformatted by me:
interpolateMissing <- function(dat) {
  N <- length(dat);
  missing <- which(is.na(dat));
  if (length(missing) %in% c(0,N)) return(dat);
  notMissing <- which(!is.na(dat));
  intervals <- findInterval(missing, notMissing, all.inside=T );
  leftPos <- notMissing[pmax(1,intervals)];
  rightPos <- notMissing[pmin(N,intervals+1)];
  leftDist <- missing - leftPos;
  rightDist <- rightPos - missing;
  dat[missing] <- ifelse(leftDist <= rightDist, dat[leftPos], dat[rightPos] );
  return(dat);
};
You can use this function to easily interpolate all missing power values from the nearest available value:
clean$power <- interpolateMissing(clean$power);
clean now looks like this:
clean;
time airtemp humidity power
1 2012-01-01 09:00:00 -1.84406568 -1.51751811 -0.09391613
2 2012-01-01 09:01:00 -0.58830616 -0.02222524 -0.07865711
3 2012-01-01 09:02:00 1.63189916 -1.35274437 -1.32762005
4 2012-01-01 09:03:00 -1.96424678 0.07326323 -0.29004060
5 2012-01-01 09:04:00 -0.60856223 1.15876062 0.29804843
6 2012-01-01 09:05:00 -0.11504175 -0.43092259 0.49660688
7 2012-01-01 09:06:00 0.24671752 0.58974028 -1.08949827
8 2012-01-01 09:07:00 0.88375964 0.35821875 -0.94881659
9 2012-01-01 09:08:00 -0.70284907 -2.39016066 0.03804497
10 2012-01-01 09:09:00 1.34738493 0.64676077 0.77714355
11 2012-01-01 09:10:00 0.32271920 1.83127669 -1.44754595
12 2012-01-01 09:11:00 -0.46106830 -1.82204149 1.28886422
13 2012-01-01 09:12:00 2.27484699 -1.29816051 0.67963614
14 2012-01-01 09:13:00 -0.25799515 -0.23864259 0.33414247
15 2012-01-01 09:14:00 0.27535436 1.15491712 1.25857697
16 2012-01-01 09:15:00 0.09447671 0.72562227 0.35426141
17 2012-01-01 09:16:00 2.46141564 0.10448464 -0.04715810
18 2012-01-01 09:17:00 0.32952643 -0.51843045 -0.44380916
19 2012-01-01 09:18:00 0.34571678 1.74200137 2.05858658
20 2012-01-01 09:19:00 0.62863598 -0.38642998 1.29170691
21 2012-01-01 09:20:00 NA NA 1.29170691
22 2012-01-01 09:21:00 NA NA -0.98798400
23 2012-01-01 09:22:00 0.83148733 -0.79160256 -0.98798400
24 2012-01-01 09:23:00 NA NA -0.98798400
25 2012-01-01 09:24:00 NA NA -0.98798400
26 2012-01-01 09:25:00 NA NA 0.49630420
27 2012-01-01 09:26:00 -0.80548958 0.99318673 0.49630420
28 2012-01-01 09:27:00 NA NA 0.49630420
29 2012-01-01 09:28:00 -1.12821534 -0.64763280 0.28033445
30 2012-01-01 09:29:00 NA NA 0.28033445
31 2012-01-01 09:30:00 1.77407448 0.21064525 0.97942804
32 2012-01-01 09:31:00 NA NA 0.97942804
33 2012-01-01 09:32:00 -0.71832822 0.89608884 -0.14460970
34 2012-01-01 09:33:00 NA NA -0.14460970
35 2012-01-01 09:34:00 NA NA -0.14460970
36 2012-01-01 09:35:00 NA NA -0.14460970
37 2012-01-01 09:36:00 NA NA 0.10863195
38 2012-01-01 09:37:00 NA NA 0.10863195
39 2012-01-01 09:38:00 NA NA 0.10863195
40 2012-01-01 09:39:00 -0.17353680 1.69963228 0.10863195
41 2012-01-01 09:40:00 1.03484529 1.69734966 3.00898820
42 2012-01-01 09:41:00 NA NA 3.00898820
43 2012-01-01 09:42:00 NA NA 0.58583242
44 2012-01-01 09:43:00 -1.13983790 1.51982921 0.58583242
45 2012-01-01 09:44:00 NA NA 0.58583242
46 2012-01-01 09:45:00 NA NA 0.58583242
47 2012-01-01 09:46:00 NA NA 0.58583242
48 2012-01-01 09:47:00 NA NA 0.58583242
49 2012-01-01 09:48:00 NA NA 2.08962697
50 2012-01-01 09:49:00 NA NA 2.08962697
51 2012-01-01 09:50:00 NA NA 2.08962697
52 2012-01-01 09:51:00 NA NA 2.08962697
53 2012-01-01 09:52:00 -0.93153818 0.77736403 2.08962697
54 2012-01-01 09:53:00 NA NA 2.08962697
55 2012-01-01 09:54:00 2.26031675 -0.49847630 -1.48191078
56 2012-01-01 09:55:00 NA NA -1.48191078
57 2012-01-01 09:56:00 NA NA -1.48191078
58 2012-01-01 09:57:00 NA NA -1.48191078
59 2012-01-01 09:58:00 NA NA -1.48191078
60 2012-01-01 09:59:00 NA NA -1.48191078
So the power column is now ready to use, with no need to worry about missing values.
One final point: I'm not sure what the benefit of building the lag_Power matrix is. Every window overlaps with the others, so the matrix will contain a great deal of duplicated data. Instead, I would look at how the data is used downstream of this preparatory code and see whether that code can be changed to access the window-specific values directly in clean (or, in your code, Total_Zone1). That could save both processing time and memory.
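As a rough illustration of that last point, here is a minimal sketch. It assumes the data is already a gap-free one-minute grid with power interpolated, as clean is above; it uses a toy stand-in for clean, and the helper name lagWindow is hypothetical. Because consecutive rows are exactly one minute apart, the window for a row is just an index range, so no time comparisons are needed. If the full matrix really is required, base R's embed() can build every window in one call instead of a loop.

```r
# Toy stand-in for the cleaned data: one row per minute, no missing power values.
set.seed(1)
n <- 100
clean <- data.frame(
  time  = seq(as.POSIXct("2012-01-01 09:00", tz = "UTC"), by = 60, length.out = n),
  power = rnorm(n)
)

leadTime <- 10  # forecast horizon in minutes (as in the question)
Window   <- 15  # number of lagged values (as in the question)

# Hypothetical helper: power values in the lag window of row tt, oldest first.
# The window covers rows (tt - leadTime - Window + 1) through (tt - leadTime).
lagWindow <- function(power, tt, leadTime, Window) {
  end   <- tt - leadTime
  start <- end - Window + 1
  if (start < 1) return(rep(NA_real_, Window))  # not enough history yet
  power[start:end]
}

# If a full matrix is still needed, embed() builds all windows at once.
# embed() puts the newest value first, so the columns are reversed here.
first <- leadTime + Window                # first row with a complete window
lagPower <- matrix(NA_real_, n, Window)
lagPower[first:n, ] <- embed(clean$power[1:(n - leadTime)], Window)[, Window:1]
```

With this layout, lagPower[tt, ] and lagWindow(clean$power, tt, leadTime, Window) return the same values for any row, so downstream code can call the helper on demand and skip the matrix entirely.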
Answer 1 (score: 0)
A rolling merge from the data.table package should work well here:
#Bring in data.table package
library(data.table)
setDT(Total_Zone1) #Convert data to data.table
leadTime=10
Window=15
Total_Zone1[, time2 := time-60*(leadTime)]
#Create lag table
Lag_Table <- Total_Zone1[,.(time,Lag.Power=Power)]
#Set keys and perform rolling merge
setkey(Total_Zone1,time2)
setkey(Lag_Table,time)
#Perform rolling merge in data.table
Result <- Lag_Table[Total_Zone1,roll=+(Window*60)]
Result:
Result[1:15]
time Lag.Power airtemp humidity Power i.time
1: 1325375400 NA 0.5341823 0.6114278 0.01378526 1325376000
2: 1325375460 NA -0.4063909 1.0928144 -0.63712941 1325376060
3: 1325375520 NA -1.3841581 -0.6204582 0.80553870 1325376120
4: 1325375580 NA 1.5146208 -0.4343324 -1.29370219 1325376180
5: 1325375640 NA -1.4949108 0.1416236 -0.64836292 1325376240
6: 1325375700 NA -1.7832364 -1.2018948 -1.13257989 1325376300
7: 1325375760 NA -1.2111490 0.5819049 0.22450291 1325376360
8: 1325375820 NA -0.2837945 0.3196630 0.43199908 1325376420
9: 1325375880 NA -1.2773624 0.3724754 -1.31807585 1325376480
10: 1325375940 NA 0.7710507 -1.0114118 -0.85951358 1325376540
11: 1325376000 0.01378526 0.7983597 1.7683065 0.38032568 1325376600
12: 1325376060 -0.63712941 0.3813563 -0.3905894 1.47871752 1325376660
13: 1325376120 0.80553870 0.7421624 -1.2265799 0.34509387 1325376720
14: 1325376180 -1.29370219 0.7768929 -1.2244243 -0.02769379 1325376780
15: 1325376240 -0.64836292 -0.8311352 0.8486815 0.16926006 1325376840
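To make the idea behind the rolling merge more concrete, the lookup can be roughly sketched in base R with findInterval(). This is only an illustration of the concept, not data.table's implementation, and its handling of the exact boundary case may differ. The helper name rollLookup and the toy data are hypothetical. For each query time it takes the most recent observation at or before that time, and discards it if it is older than a maximum allowed gap.

```r
# Hypothetical helper: for each query time, return the value recorded at the
# latest observation time <= the query, but only if that observation is at most
# maxGap seconds old; otherwise return NA. obsTime must be sorted increasingly.
rollLookup <- function(obsTime, obsValue, queryTime, maxGap) {
  idx <- findInterval(queryTime, obsTime)  # 0 when no observation precedes the query
  idx[idx == 0] <- NA
  out <- obsValue[idx]
  tooOld <- !is.na(idx) & (queryTime - obsTime[idx] > maxGap)
  out[tooOld] <- NA
  out
}

# Toy data: observations at minutes 0, 1, 2 and 10 (times in seconds).
obsTime  <- c(0, 60, 120, 600)
obsValue <- c(1.5, 2.5, 3.5, 4.5)

# Look up the power leadTime minutes before each target time, accepting
# values that are up to Window minutes stale.
leadTime <- 10
Window   <- 15
targetTime <- c(300, 720, 1200, 2400)
lagged <- rollLookup(obsTime, obsValue, targetTime - 60 * leadTime, 60 * Window)
print(lagged)  # NA 3.5 4.5 NA
```

The first target has no earlier observation and the last one only has an observation that is 20 minutes stale, so both come back as NA, which mirrors the leading NA values in Lag.Power above.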