I have a data set of actual observations, df1:
Date Month Year Actual
02/12/2017 December 17 4623
12/12/2017 December 17 5111
22/12/2017 December 17 4800
22/12/2017 December 17 4769
02/01/2018 January 18 4711
03/01/2018 January 18 4503
04/01/2018 January 18 4650
05/01/2018 January 18 4598
06/02/2018 February 18 4612
07/02/2018 February 18 4493
08/02/2018 February 18 4515
09/02/2018 February 18 4469
And then my monthly predictions, df2:
Month Year Prediction
December 17 4874
January 18 4626
February 18 4576
How can I subtract the prediction from the actual value, matching on month and year? I would like to get the following errors:
Error
-251
237
-74
-105
85
-123
24
-28
36
-83
-61
-107
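Answer 0 (score: 1)
Note that the expected result shown in the question is wrong for the input used in this answer (reproduced at the end): the January and February predictions carry year 17, so they do not match the 2018 actuals, which carry year 18.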
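The mismatch can be checked directly before joining. The following is a minimal sketch in base R, assuming the input of this answer (recreated inline, without the Date column, so the snippet runs on its own). The names `act_keys`, `pred_keys` and `unmatched` are illustrative only.

```r
# Actuals and predictions as in this answer's input (Date column omitted)
act <- data.frame(
  Month  = rep(c("December", "January", "February"), each = 4),
  Year   = rep(c(17, 18, 18), each = 4),
  Actual = c(4623, 5111, 4800, 4769, 4711, 4503, 4650, 4598,
             4612, 4493, 4515, 4469)
)
pred <- data.frame(
  Month      = c("December", "January", "February"),
  Year       = c(17, 17, 17),
  Prediction = c(4874, 4626, 4576)
)

# Build a "Month Year" key on each side (illustrative names)
act_keys  <- unique(paste(act$Month, act$Year))
pred_keys <- paste(pred$Month, pred$Year)

# Keys that occur in the actuals but have no prediction
unmatched <- setdiff(act_keys, pred_keys)
print(unmatched)
```

Any key listed in `unmatched` will end up with NA in the Prediction column after a left join.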
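1) Base R. Left join the data frames and perform the subtraction. Note that Diff is computed here as Prediction - Actual, so its sign is the opposite of the Error column in the question: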
transform(merge(act, pred, all.x = TRUE, sort = FALSE), Diff = Prediction - Actual)
giving:
      Month Year       Date Actual Prediction Diff
1  December   17 02/12/2017   4623       4874  251
2  December   17 12/12/2017   5111       4874 -237
3  December   17 22/12/2017   4800       4874   74
4  December   17 22/12/2017   4769       4874  105
5   January   18 02/01/2018   4711         NA   NA
6   January   18 03/01/2018   4503         NA   NA
7   January   18 04/01/2018   4650         NA   NA
8   January   18 05/01/2018   4598         NA   NA
9  February   18 06/02/2018   4612         NA   NA
10 February   18 07/02/2018   4493         NA   NA
11 February   18 08/02/2018   4515         NA   NA
12 February   18 09/02/2018   4469         NA   NA
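To see why `all.x = TRUE` matters, here is a small sketch contrasting the default (inner) merge with the left join. It assumes the same input as this answer, recreated inline without the Date column; the names `inner` and `left` are illustrative.

```r
# Same input as in this answer (Date column omitted for brevity)
act <- data.frame(
  Month  = rep(c("December", "January", "February"), each = 4),
  Year   = rep(c(17, 18, 18), each = 4),
  Actual = c(4623, 5111, 4800, 4769, 4711, 4503, 4650, 4598,
             4612, 4493, 4515, 4469)
)
pred <- data.frame(
  Month      = c("December", "January", "February"),
  Year       = c(17, 17, 17),
  Prediction = c(4874, 4626, 4576)
)

# Default merge is an inner join: rows without a matching key are dropped
inner <- merge(act, pred)

# all.x = TRUE keeps every row of act and fills missing predictions with NA
left <- merge(act, pred, all.x = TRUE)

print(nrow(inner))                  # rows that found a prediction
print(nrow(left))                   # all rows of act
print(sum(is.na(left$Prediction)))  # rows of act without a prediction
```

With the left join the unmatched rows stay visible as NA rather than silently disappearing, which makes a key mismatch easier to spot.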
2) sqldf. The same left join and subtraction expressed in SQL:
library(sqldf)
sqldf("select *, Prediction - Actual as Diff
from act left join pred using(Year, Month)")
giving:
         Date    Month Year Actual Prediction Diff
1  02/12/2017 December   17   4623       4874  251
2  12/12/2017 December   17   5111       4874 -237
3  22/12/2017 December   17   4800       4874   74
4  22/12/2017 December   17   4769       4874  105
5  02/01/2018  January   18   4711         NA   NA
6  03/01/2018  January   18   4503         NA   NA
7  04/01/2018  January   18   4650         NA   NA
8  05/01/2018  January   18   4598         NA   NA
9  06/02/2018 February   18   4612         NA   NA
10 07/02/2018 February   18   4493         NA   NA
11 08/02/2018 February   18   4515         NA   NA
12 09/02/2018 February   18   4469         NA   NA
The input in reproducible form is:
Lines1 <- "
Date Month Year Actual
02/12/2017 December 17 4623
12/12/2017 December 17 5111
22/12/2017 December 17 4800
22/12/2017 December 17 4769
02/01/2018 January 18 4711
03/01/2018 January 18 4503
04/01/2018 January 18 4650
05/01/2018 January 18 4598
06/02/2018 February 18 4612
07/02/2018 February 18 4493
08/02/2018 February 18 4515
09/02/2018 February 18 4469"
act <- read.table(text = Lines1, header = TRUE, as.is = TRUE)
Lines2 <- "
Month Year Prediction
December 17 4874
January 17 4626
February 17 4576"
pred <- read.table(text = Lines2, header = TRUE, as.is = TRUE)
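One way to reduce the risk of such a key mismatch is to derive Month and Year from the Date column instead of typing them by hand. This is only a sketch under assumptions: dates are in day/month/year format as in the data above, and the three-row data frame `obs` and the vector `d` are hypothetical names used for illustration.

```r
# Three sample rows taken from the actuals above (hypothetical frame name)
obs <- data.frame(
  Date   = c("02/12/2017", "02/01/2018", "06/02/2018"),
  Actual = c(4623, 4711, 4612),
  stringsAsFactors = FALSE
)

# Parse the day/month/year strings into Date objects
d <- as.Date(obs$Date, format = "%d/%m/%Y")

# month.name is a built-in constant, so the result does not depend on locale
obs$Month <- month.name[as.integer(format(d, "%m"))]

# Two-digit year, matching the convention used in the data above
obs$Year <- as.integer(format(d, "%y"))

print(obs)
```

The same derivation would have to be applied to whatever date information the predictions carry, so that both sides use identical keys.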
Answer 1 (score: 0)
Again using G. Grothendieck's data, and taking into account the comment about the years for January and February, you can do this with dplyr:
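library(dplyr)

act %>%
  # join on the shared key columns; with the data below every row finds a match
  full_join(pred, by = c("Month", "Year")) %>%
  # error as defined in the question: actual minus prediction
  mutate(Error = Actual - Prediction) %>%
  select(-Prediction)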
         Date    Month Year Actual Error
1  02/12/2017 December   17   4623  -251
2  12/12/2017 December   17   5111   237
3  22/12/2017 December   17   4800   -74
4  22/12/2017 December   17   4769  -105
5  02/01/2018  January   18   4711    85
6  03/01/2018  January   18   4503  -123
7  04/01/2018  January   18   4650    24
8  05/01/2018  January   18   4598   -28
9  06/02/2018 February   18   4612    36
10 07/02/2018 February   18   4493   -83
11 08/02/2018 February   18   4515   -61
12 09/02/2018 February   18   4469  -107
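For comparison, the same Error column can be obtained in base R without a join, by looking up each row's prediction with `match()`. This is a sketch rather than part of the answer above; it assumes the corrected data of this answer (recreated inline) and uses the illustrative name `idx`.

```r
# Corrected data of this answer, recreated so the snippet runs on its own
act <- read.table(text = "
Date Month Year Actual
02/12/2017 December 17 4623
12/12/2017 December 17 5111
22/12/2017 December 17 4800
22/12/2017 December 17 4769
02/01/2018 January 18 4711
03/01/2018 January 18 4503
04/01/2018 January 18 4650
05/01/2018 January 18 4598
06/02/2018 February 18 4612
07/02/2018 February 18 4493
08/02/2018 February 18 4515
09/02/2018 February 18 4469", header = TRUE, as.is = TRUE)

pred <- read.table(text = "
Month Year Prediction
December 17 4874
January 18 4626
February 18 4576", header = TRUE, as.is = TRUE)

# Row of pred that matches each actual's "Month Year" key (NA if none)
idx <- match(paste(act$Month, act$Year), paste(pred$Month, pred$Year))

# Error as defined in the question: actual minus prediction
act$Error <- act$Actual - pred$Prediction[idx]

print(act)
```

Because `match()` returns one position per row of `act`, the row order and row count of the actuals are preserved.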
Data:
Lines1 <- "
Date Month Year Actual
02/12/2017 December 17 4623
12/12/2017 December 17 5111
22/12/2017 December 17 4800
22/12/2017 December 17 4769
02/01/2018 January 18 4711
03/01/2018 January 18 4503
04/01/2018 January 18 4650
05/01/2018 January 18 4598
06/02/2018 February 18 4612
07/02/2018 February 18 4493
08/02/2018 February 18 4515
09/02/2018 February 18 4469"
act <- read.table(text = Lines1, header = TRUE, as.is = TRUE)
Lines2 <- "
Month Year Prediction
December 17 4874
January 18 4626
February 18 4576"
pred <- read.table(text = Lines2, header = TRUE, as.is = TRUE)