I am having trouble creating the scaled_assessment data.
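I have some time series data that I split into analysis and assessment sets. I want to scale the analysis data, then apply the means and sds from that scaling to the assessment data. I have added comments to the code below.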
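I am having trouble with the mutate_at function. I want to take the mean and sd extracted from the analysis data and use them to scale the assessment data, for all of the columns in the assessment data.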
Data/code:
library(rsample)
library(dplyr)  # %>%, as_tibble, summarise_at, mutate_at
library(purrr)  # map, map2
set.seed(1131)
# I create some random data
ex_data <- data.frame(row = 1:20, some_cat_var = paste("cat"), some_var = rnorm(20), some_other_var = rnorm(20))
ex_data
# I create the analysis and assessment splits - each analysis set has 10 observations, each assessment set has 1
rolled_ex_data <- rolling_origin(ex_data,
                                 initial = 10,
                                 assess = 1,
                                 cumulative = FALSE,
                                 skip = 0)
# My scaling function to apply to the analysis data
Scale_Me <- function(x){
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}
# This, I believe, "works": I collect the mean and sd of the 3rd and 4th columns of the data for each split
scale_values <- map(rolled_ex_data$splits, ~ analysis(.x) %>%
                      as_tibble(., .name_repair = "universal") %>%
                      summarise_at(.vars = 3:ncol(.), .funs = c(mean = "mean", sd = "sd")))
# I then apply the scale function to the analysis data (to columns 3 and 4) for each split
scaled_analysis <- map(rolled_ex_data$splits, ~ analysis(.x) %>%
                         as_tibble(., .name_repair = "universal") %>%
                         mutate_at(.vars = 3:ncol(.), .funs = list(Scale_Me = Scale_Me)))
# My problem is here with the mutate_at function
scaled_assessment <- map2(rolled_ex_data$splits, scale_values, ~ assessment(.x) %>%
                            as_tibble(., .name_repair = "universal") %>%
                            mutate_at(.vars = 3:ncol(.), .funs = c(scaled_col = (.vars - .y$mean) / .y$sd)))
OK. I have managed to get it working for the two variables using mutate.
scaled_assessment <- map2(rolled_ex_data$splits, scale_values, ~ assessment(.x) %>%
  # as_tibble(.x, .name_repair = "universal") %>%
  mutate(
    some_var_scaled = (some_var - .y$some_var_mean) / .y$some_var_sd,
    some_other_var_scaled = (some_other_var - .y$some_other_var_mean) / .y$some_other_var_sd
  )
)
This gives me a list of 10 elements:
scaled_assessment[[1]]
scaled_assessment[[2]]
scaled_assessment[[3]]
> scaled_assessment[[1]]
  row some_cat_var  some_var some_other_var some_var_scaled some_other_var_scaled
1  11          cat -1.350214      -0.569947       -1.603747            -0.2836588
> scaled_assessment[[2]]
  row some_cat_var some_var some_other_var some_var_scaled some_other_var_scaled
1  12          cat 2.242594      -1.195205        3.038992            -0.7670828
> scaled_assessment[[3]]
  row some_cat_var some_var some_other_var some_var_scaled some_other_var_scaled
1  13          cat 1.781132      0.9764677        1.593273              1.194117
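I would like to know how to do this with mutate_at, because I do not know in advance how many time series columns will have to be scaled. Here I use 2 columns, some_var and some_other_var, but I could have 3 or 4, which is why I tried to use .vars = 3:ncol(.).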
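One possible direction, offered as a minimal sketch rather than a definitive answer: as far as I know, mutate_at does not give the function access to the name of the column it is currently processing, so this version uses across() together with cur_column() instead, which assumes dplyr 1.0.0 or later. The helper name scale_assessment and the variables centers, spreads and cols_to_scale are made up for illustration. The means and sds are recomputed inside the helper instead of being read from scale_values, so the block runs on its own.

```r
library(rsample)
library(dplyr)
library(purrr)

set.seed(1131)
ex_data <- data.frame(
  row = 1:20,
  some_cat_var = "cat",
  some_var = rnorm(20),
  some_other_var = rnorm(20)
)

rolled_ex_data <- rolling_origin(ex_data, initial = 10, assess = 1,
                                 cumulative = FALSE, skip = 0)

# Hypothetical helper: scales the assessment rows of one split using the
# mean and sd of the matching columns in that split's analysis data.
scale_assessment <- function(split, cols) {
  train <- analysis(split)
  # Named numeric vectors, so each statistic can be looked up by column name
  centers <- map_dbl(train[cols], mean, na.rm = TRUE)
  spreads <- map_dbl(train[cols], sd, na.rm = TRUE)

  assessment(split) %>%
    mutate(across(
      all_of(cols),
      # cur_column() returns the name of the column being processed
      ~ (.x - centers[[cur_column()]]) / spreads[[cur_column()]],
      .names = "{.col}_scaled"
    ))
}

# Every column from the third onward, however many there are
cols_to_scale <- names(ex_data)[3:ncol(ex_data)]

scaled_assessment <- map(rolled_ex_data$splits, scale_assessment,
                         cols = cols_to_scale)
scaled_assessment[[1]]
```

Because the columns are selected by position from the third column onward, the same call should cover 3 or 4 time series columns without changes, provided all of those columns are numeric.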