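Please consider the following sample dataset.
In short, the sensors have been malfunctioning since 2000, so the measured values are not correct. We have 10 years of data containing both the measured and the actual values.
P.S. We do not have data for every combination of application and sensor type in every month.
Now we would like an algorithm to recover the actual values.
We tried XGBoost and CatBoost after creating an extra column, diff = measured - actual, and feeding it to the algorithm to identify patterns. We are not sure which algorithm is appropriate, or whether a neural network or a time-series model (ARIMA) would be viable, because we only have 10 years of monthly data.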
library(tidyverse)

# Training data: monthly readings from 01.2000 to 12.2010.
# The three placeholder rows stand for the months omitted from this example.
# They use "." in the text columns and NA in the numeric columns, so that
# measured and actual stay numeric instead of being coerced to character.
train_data <- data.frame(
  time = c(rep("01.2000", 10), rep("02.2000", 10), rep(".", 3),
           rep("11.2010", 10), rep("12.2010", 10)),
  application = c(rep("factory", 4), rep("residential", 3), rep("research", 3),
                  rep("factory", 2), rep("residential", 5), rep("research", 3),
                  rep(".", 3),
                  rep("factory", 2), rep("residential", 2), rep("research", 6),
                  rep("factory", 7), rep("residential", 1), rep("research", 2)),
  sensor = c(LETTERS[1:10], LETTERS[10:1], rep(".", 3),
             LETTERS[c(5:1, 10:6)], LETTERS[c(3:9, 2, 1, 10)]),
  measured = c(26.4, 2000, 1001, 23.9, 100000, 0, 1234, 12098, 34567, 0,
               123, 676, 12, 0, 100, 0, 0, 98, 1, 190,
               rep(NA, 3),
               3454, 0, 101, 9, 1, 0, 14, 1298, 677, 0,
               264, 20220, 1851, 3.9, 1044, 0, 1764, 0, 34, 0),
  actual = c(26.4, 2010, 1001, 23.9, 100100, 237, 1234, 12098, 34567, 19583,
             123, 706, 1112, 156, 100, 650, 109, 98, 10, 190,
             rep(NA, 3),
             3454, 10, 101, 19, 10, 40, 44, 1298, 760, 50,
             264, 20220, 1851, 39, 1048, 870, 1765, 40, 35, 1110)
)
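
For reference, here is a minimal sketch of how the diff column mentioned above could be built in base R. It is only an illustration under a few assumptions: the four toy rows are copied from train_data (including one "." placeholder row), the helper name `prepare_sensor_data` and the extra `month` column are hypothetical, and "MM.YYYY" is assumed to mean the first day of that month.

```r
# Toy subset of train_data, including one "." placeholder row.
# measured/actual are kept as character here to show the coercion step;
# the step is harmless if the columns are already numeric.
toy <- data.frame(
  time        = c("01.2000", "01.2000", ".", "12.2010"),
  application = c("factory", "factory", ".", "residential"),
  sensor      = c("A", "B", ".", "B"),
  measured    = c("26.4", "2000", ".", "0"),
  actual      = c("26.4", "2010", ".", "40"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop placeholder rows, restore numeric types,
# parse the month, and add diff = measured - actual.
prepare_sensor_data <- function(df) {
  df <- df[df$time != ".", , drop = FALSE]
  df$measured <- as.numeric(df$measured)
  df$actual   <- as.numeric(df$actual)
  # "MM.YYYY" -> Date for the first day of that month (an assumption)
  df$month <- as.Date(paste0("01.", df$time), format = "%d.%m.%Y")
  df$diff  <- df$measured - df$actual
  rownames(df) <- NULL
  df
}

prepared <- prepare_sensor_data(toy)
print(prepared)
```

With diff defined this way, a correct reading has diff equal to 0 and a reading that is too low has a negative diff.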
# Test data: actual is unknown here and is the value we want to forecast.
test_data <- data.frame(
  time = rep("01.2011", 10),
  application = c(rep("factory", 7), rep("residential", 1), rep("research", 2)),
  sensor = LETTERS[c(1, 4, 5, 9, 3, 2, 8, 6, 7, 10)],
  measured = c(26.4, 100000, 0, 0,
               123, 12,
               3454, 0, 20220, 1851)
)
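
As a point of comparison for the diff idea, a very simple baseline could look like the sketch below. This is not XGBoost or CatBoost: it only averages diff per sensor and falls back to the overall mean for sensors that do not appear in the training rows. The toy rows are the sensor A and B readings for 01.2000 and 12.2010 from train_data plus three rows of test_data, and the names `sensor_diff`, `overall_diff` and `predict_actual` are hypothetical.

```r
# Toy training rows copied from train_data (sensors A and B only).
train <- data.frame(
  sensor   = c("A", "A", "B", "B"),
  measured = c(26.4, 34, 2000, 0),
  actual   = c(26.4, 35, 2010, 40),
  stringsAsFactors = FALSE
)
train$diff <- train$measured - train$actual

# Mean diff per sensor as a plain named vector, plus an overall fallback.
sensor_diff  <- vapply(split(train$diff, train$sensor), mean, numeric(1))
overall_diff <- mean(train$diff)

# Hypothetical helper: since diff = measured - actual,
# the estimate is actual_hat = measured - diff_hat.
predict_actual <- function(new_data) {
  diff_hat <- unname(sensor_diff[as.character(new_data$sensor)])
  diff_hat[is.na(diff_hat)] <- overall_diff  # sensor not seen in training
  new_data$measured - diff_hat
}

# Three rows copied from test_data; sensor D is not in the toy training rows.
test <- data.frame(
  sensor   = c("A", "B", "D"),
  measured = c(26.4, 12, 100000),
  stringsAsFactors = FALSE
)
test$actual_hat <- predict_actual(test)
print(test)
```

Any model that predicts diff (tree-based or otherwise) could be plugged in at the `diff_hat` step. The group mean is only there to make the arithmetic of going from a predicted diff back to actual explicit.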
How can we predict or forecast the actual values for the 01.2011 data (test_data)?