I have a dataset as follows:
# Define Adstock Rate
adstock_rate = 0.50
# Create Data
advertising = c(117.913, 120.112, 125.828, 115.354, 177.090, 141.647, 137.892, 0.000, 0.000, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 158.511, 109.385, 91.084, 79.253, 102.706,
78.494, 135.114, 114.549, 87.337, 107.829, 125.020, 82.956, 60.813, 83.149, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 129.515, 105.486, 111.494, 107.099, 0.000, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
134.913, 123.112, 178.828, 112.354, 100.090, 167.647, 177.892, 0.000, 0.000, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 112.511, 155.385, 123.084, 89.253, 67.706,
23.494, 122.114, 112.549, 65.337, 134.829, 123.020, 81.956, 23.813, 65.149, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 145.515, 154.486, 121.494, 117.099, 0.000, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000
)
Region = c(500, 500, 500, 500, 500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,
500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500, 500, 500,
500, 500,
501, 501, 501, 501, 501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,
501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501, 501, 501,
501, 501)
advertising_dataset<-data.frame(cbind(Region, advertising))
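Each region above contributes exactly 52 consecutive rows, which is the same as the length of each region's block in advertising. That means the Region vector could also be built with rep() instead of being typed out. This is a small sketch that assumes the fixed 52-rows-per-region layout. data.frame() also accepts the vectors directly, so the cbind() step is optional.

```r
# Assumes exactly 52 consecutive rows per region, in the order 500 then 501
Region <- rep(c(500, 501), each = 52)

table(Region)   # 52 rows for each of 500 and 501

# With the advertising vector from above, cbind() is not needed:
# advertising_dataset <- data.frame(Region, advertising)
```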
This is what the dataset looks like (first 20 rows):
Region advertising
1 500 117.913
2 500 120.112
3 500 125.828
4 500 115.354
5 500 177.090
6 500 141.647
7 500 137.892
8 500 0.000
9 500 0.000
10 500 0.000
11 500 0.000
12 500 0.000
13 500 0.000
14 500 0.000
15 500 0.000
16 500 0.000
17 500 0.000
18 500 158.511
19 500 109.385
20 500 91.084
From here, I apply a lag (adstock) function. It takes the first value as-is, and a for loop then transforms the rest of the dataset.
# Alternative Method Using Loops Proposed by Linh Tran
advertising_dataset$adstocked_advertising = numeric(length(advertising_dataset$advertising))
advertising_dataset$adstocked_advertising[1] = advertising_dataset$advertising[1]
for(i in 2:length(advertising_dataset$advertising)){
advertising_dataset$adstocked_advertising[i] = advertising_dataset$advertising[i] + adstock_rate * advertising_dataset$adstocked_advertising[i-1]}
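The loop implements the recursion A[1] = x[1] and A[t] = x[t] + r * A[t-1], where r = adstock_rate. Base R's stats::filter() with method = "recursive" computes this same recursion. Its default initial values are zero, so the first output equals the first input, which gives a loop-free equivalent. The sketch below uses the first few values of region 500. The name adstock() is a hypothetical helper and is not part of any package. The stats:: prefix is written out because dplyr::filter masks stats::filter once dplyr is loaded.

```r
adstock_rate <- 0.50

# Hypothetical helper: recursive filter y[i] = x[i] + rate * y[i - 1], with y[0] = 0,
# so y[1] = x[1] just like the loop. stats:: avoids dplyr::filter masking it.
adstock <- function(x, rate) {
  as.numeric(stats::filter(x, filter = rate, method = "recursive"))
}

# First five advertising values for region 500
x <- c(117.913, 120.112, 125.828, 115.354, 177.090)

# The question's loop, for comparison
loop_out <- numeric(length(x))
loop_out[1] <- x[1]
for (i in 2:length(x)) {
  loop_out[i] <- x[i] + adstock_rate * loop_out[i - 1]
}

adstock(x, adstock_rate)                        # same values as the sample output's first five rows
all.equal(adstock(x, adstock_rate), loop_out)   # TRUE
```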
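The problem is that my dataset is split by region. I need to apply the function above, including the step that takes the first value, separately for each region.
Is there a way to do this with the dplyr package?
I know this is wrong, but maybe something like this: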
library(dplyr)
separated_by_region <- advertising_dataset %>%
  group_by(Region) %>%
  summarise(
    advertising_dataset$adstocked_advertising =
      numeric(length(advertising_dataset$advertising))
    advertising_dataset$adstocked_advertising[1] =
      advertising_dataset$advertising[1]
    for(i in 2:length(advertising_dataset$advertising)){
      advertising_dataset$adstocked_advertising[i] =
        advertising_dataset$advertising[i] + adstock_rate *
        advertising_dataset$adstocked_advertising[i-1]})
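Something along those lines. I'm not sure how to do it.
I have a feeling I might have to use split(advertising_dataset, advertising_dataset$Region), apply a function to each piece with one of the apply functions, and then rbind the results.
Any help would be great, thanks!
Sample output of the final dataset (the function still needs to be applied by region):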
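The split() / lapply() / rbind() idea does work in base R. Here is a minimal sketch. It assumes the data frame has Region and advertising columns and that rows are already in time order within each region. A small toy data frame stands in for the full dataset, and add_adstock() is a hypothetical helper name.

```r
adstock_rate <- 0.50

# Toy data: two regions, three periods each (stand-in for advertising_dataset)
advertising_dataset <- data.frame(
  Region      = rep(c(500, 501), each = 3),
  advertising = c(117.913, 120.112, 0, 134.913, 123.112, 0)
)

# Hypothetical helper: adstock one region's rows; the first row keeps its raw value
add_adstock <- function(df_) {
  df_$adstocked_advertising <- df_$advertising
  for (i in seq_len(nrow(df_))[-1]) {
    df_$adstocked_advertising[i] <- df_$advertising[i] +
      adstock_rate * df_$adstocked_advertising[i - 1]
  }
  df_
}

# Split by region, transform each piece, then stack the pieces back together
pieces <- lapply(split(advertising_dataset, advertising_dataset$Region), add_adstock)
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

split() orders the pieces by sorted Region value. If the original rows are not grouped by region, the stacked result will come back in a different row order than the input.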
Region advertising adstocked_advertising
500 117.913 117.9130000
500 120.112 179.0685000
500 125.828 215.3622500
500 115.354 223.0351250
500 177.090 288.6075625
500 141.647 285.9507812
500 137.892 280.8673906
500 0.000 140.4336953
500 0.000 70.2168477
500 0.000 35.1084238
500 0.000 17.5542119
500 0.000 8.7771060
500 0.000 4.3885530
500 0.000 2.1942765
500 0.000 1.0971382
500 0.000 0.5485691
500 0.000 0.2742846
500 158.511 158.6481423
500 109.385 188.7090711
500 91.084 185.4385356
Answer 0 (score: 1)
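I don't think this is quite what you meant by using dplyr, and I'm not sure it is any better than a do.call(rbind, lapply(...)) approach. But you can define a function like the one you described above: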
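# Adstock one region's rows; the first row keeps its raw advertising value.
# Uses the global adstock_rate defined in the question.
foo <- function(df_) {
  df_$adstocked_advertising = df_$advertising
  # seq_len(...)[-1] is empty for a one-row group; 2:nrow(df_) would give c(2, 1) there
  for (i in seq_len(nrow(df_))[-1]) {
    df_$adstocked_advertising[i] = df_$advertising[i] + adstock_rate * df_$adstocked_advertising[i - 1]
  }
  return(df_)
}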
Then use your pipe to group_by Region and apply the function to each group:
library(dplyr)
adv_2 <- data.frame(advertising_dataset %>%
group_by(Region) %>%
do(foo(data.frame(.))))
> adv_2[1:10,]
Region advertising adstocked_advertising
1 500 117.913 117.91300
2 500 120.112 179.06850
3 500 125.828 215.36225
4 500 115.354 223.03512
5 500 177.090 288.60756
6 500 141.647 285.95078
7 500 137.892 280.86739
8 500 0.000 140.43370
9 500 0.000 70.21685
10 500 0.000 35.10842
> adv_2[50:60,]
Region advertising adstocked_advertising
50 500 0.000 0.401496
51 500 0.000 0.200748
52 500 0.000 0.100374
53 501 134.913 134.913000
54 501 123.112 190.568500
55 501 178.828 274.112250
56 501 112.354 249.410125
57 501 100.090 224.795063
58 501 167.647 280.044531
59 501 177.892 317.914266
60 501 0.000 158.957133
It definitely needs a numerical check, but it appears to match your output for the 500 group.
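As of dplyr 1.0.0, do() is marked as superseded. The adstock transformation maps one numeric vector to another vector of the same length, so it can also be written inside mutate(), which runs once per group after group_by(). The sketch below assumes dplyr 1.0.0 or later and uses the stats::filter(method = "recursive") form of the loop. adstock() is a hypothetical helper, and toy data (values from the printout above) replaces the full dataset.

```r
library(dplyr)

adstock_rate <- 0.50

# Hypothetical helper; stats:: is required because dplyr::filter masks stats::filter
adstock <- function(x, rate) {
  as.numeric(stats::filter(x, filter = rate, method = "recursive"))
}

# Toy stand-in for advertising_dataset: three periods per region
advertising_dataset <- data.frame(
  Region      = rep(c(500, 501), each = 3),
  advertising = c(117.913, 120.112, 125.828, 134.913, 123.112, 178.828)
)

# mutate() is evaluated separately within each Region group
adv_3 <- advertising_dataset %>%
  group_by(Region) %>%
  mutate(adstocked_advertising = adstock(advertising, adstock_rate)) %>%
  ungroup()

adv_3
```

Unlike split(), mutate() keeps the original row order. The result is a tibble, so wrap it in data.frame() if you want a plain data frame, as the answer does.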
Edit
Based on the comments, here is a version with an adjustable lag value.
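# Row i adds adstock_rate times the adstocked value lag_val rows earlier;
# the first lag_val rows keep their raw values. lag_val must be a positive integer.
foo <- function(df_, lag_val = 1) {
  df_$adstocked_advertising = df_$advertising
  # Empty when the group has lag_val rows or fewer, unlike (1 + lag_val):nrow(df_)
  for (i in seq_len(nrow(df_))[-seq_len(lag_val)]) {
    df_$adstocked_advertising[i] = df_$advertising[i] + adstock_rate * df_$adstocked_advertising[i - lag_val]
  }
  return(df_)
}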
The default lag is still 1, but now you can change lag_val if you want the 'adstocked' column to reach back that many rows.
adv_2 <- data.frame(advertising_dataset %>%
group_by(Region) %>%
do(foo(data.frame(.), lag_val = 3)))
> adv_2
Region advertising adstocked_advertising
1 500 117.913 117.913000
2 500 120.112 120.112000
3 500 125.828 125.828000
4 500 115.354 174.310500
5 500 177.090 237.146000
6 500 141.647 204.561000
7 500 137.892 225.047250
8 500 0.000 118.573000
9 500 0.000 102.280500
10 500 0.000 112.523625
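With a lag of k, the recursion becomes A[t] = x[t] for t <= k and A[t] = x[t] + r * A[t-k] afterwards. stats::filter() can express this as well. In recursive mode, the filter vector holds the coefficients for y[i-1], y[i-2], and so on. A vector of k - 1 zeros followed by the rate therefore picks out only y[i-k]. This is a sketch under that reading with a hypothetical adstock_lag() helper, checked against the lag_val = 3 output above.

```r
adstock_rate <- 0.50

# Hypothetical helper: y[i] = x[i] + rate * y[i - lag_val], with zero start values,
# so the first lag_val outputs equal the inputs
adstock_lag <- function(x, rate, lag_val = 1) {
  coefs <- c(rep(0, lag_val - 1), rate)
  as.numeric(stats::filter(x, filter = coefs, method = "recursive"))
}

# First seven advertising values for region 500
x <- c(117.913, 120.112, 125.828, 115.354, 177.090, 141.647, 137.892)

adstock_lag(x, adstock_rate, lag_val = 3)   # should match rows 1-7 of the lag_val = 3 output
```

To apply it per region, call it inside group_by(Region) followed by mutate(), or use ave(advertising, Region, FUN = ...).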
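I think that does what you want, but it's definitely worth verifying. Hopefully it helps with your other related problem, though I suspect it would need some modification to be more flexible.
Cheers,
Luke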
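One way to do that verification is to recompute each region with the question's plain loop and compare the result against the grouped output. check_adstock() is a hypothetical helper. It assumes the result has Region, advertising and adstocked_advertising columns, with rows in time order within each region. The excerpt below copies the first three rows of each region from the adv_2 printout above. On the full output you would call check_adstock(adv_2), or check_adstock(adv_2, lag_val = 3) for the edited version.

```r
adstock_rate <- 0.50

# Hypothetical checker: rebuild each region's series with the plain loop and compare
check_adstock <- function(result, rate = adstock_rate, lag_val = 1) {
  ok <- vapply(split(result, result$Region), function(g) {
    expected <- g$advertising
    for (i in seq_len(nrow(g))[-seq_len(lag_val)]) {
      expected[i] <- g$advertising[i] + rate * expected[i - lag_val]
    }
    isTRUE(all.equal(g$adstocked_advertising, expected))
  }, logical(1))
  all(ok)
}

# First three rows of each region, copied from the adv_2 printout
adv_2_excerpt <- data.frame(
  Region                = c(500, 500, 500, 501, 501, 501),
  advertising           = c(117.913, 120.112, 125.828, 134.913, 123.112, 178.828),
  adstocked_advertising = c(117.913, 179.0685, 215.36225, 134.913, 190.5685, 274.11225)
)

check_adstock(adv_2_excerpt)   # TRUE
```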