I have a data frame with two columns: date and return. I want to mutate several new columns that depend on two parameters: a threshold parameter and a lag parameter. The function is simple. Each new column is computed as follows:
var = ifelse(lag(return, n = lag_day) > threshold, return, NA)
If lag(return) is above the threshold, give me the return value; otherwise give me NA.
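The rule for a single column (keep return where the lagged return exceeds the threshold, otherwise NA) can be sketched in base R as follows. The helper `lag_vec` is a hypothetical stand-in for `dplyr::lag`, and the five values are only an example return series, so treat this as an illustration rather than the asker's code.

```r
# Hypothetical helper (an assumption, not from the question): shift x down by
# n positions and pad the start with NA, as dplyr::lag(x, n) does.
# Assumes 1 <= n < length(x).
lag_vec <- function(x, n) c(rep(NA, n), head(x, -n))

ret <- c(1, 2.5, 2, 3, 5)  # example return values
threshold <- 2
lag_day <- 1

# Keep the current return only where the lagged return exceeds the threshold.
var <- ifelse(lag_vec(ret, lag_day) > threshold, ret, NA)
print(var)  # NA NA 2 NA 5
```

Rows where the lagged value is missing (the first `lag_day` rows) come out as NA because the comparison itself is NA.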
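Here are the values of threshold and lag_day:
threshold = c(2, 4, 6)
lag_day = c(1, 2, 3)
At the moment I am creating each of these columns manually. Is there a solution that makes this easier, perhaps with one or two apply functions?
My sample data frame, df, has the two columns date and return.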
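For anyone who wants to run the answers, a data frame with the same date and return values as the result printed in answer 1 could be rebuilt roughly like this. Only the values are taken from that printed output; the construction with `seq()` and the use of a plain `data.frame` are assumptions.

```r
# Rebuilt from the result printed in answer 1; how the original object was
# created is not known, so this construction is an assumption.
df <- data.frame(
  date = seq(as.Date("2019-05-21"), by = "day", length.out = 13),
  return = c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2, 4, 7, 2)
)
str(df)
```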
Answer 0 (score: 3)
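One option is to get all combinations of 'threshold' and 'lag_day' with crossing, then loop over the rows (pmap), using transmute to create the columns of interest, and bind them to the original dataset. The only base R function it uses is seq_along.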
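library(tidyverse)

# One row per (threshold index, lag value) pair; pmap_dfc builds one column per row
crossing(threshold = seq_along(threshold), lag_day) %>%
  pmap_dfc(~
    df %>%
      # ..1 is the threshold index, ..2 is the lag in days
      transmute(!! str_c("var_t", ..1, "_lag", ..2) :=
        case_when(lag(return, n = ..2) > threshold[..1] ~ return,
                  TRUE ~ NA_real_))) %>%
  bind_cols(df, .)  # attach the new columns to the original data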
Answer 1 (score: 2)
A base R approach using two nested apply loops, with dplyr::lag doing the lagging:
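# Column names follow var_t<threshold index>_<lag index>; the threshold index
# varies fastest, which matches the column order produced by cbind() below.
df[paste0("var_t", outer(seq_along(threshold), seq_along(lag_day),
                         FUN = paste, sep = "_"))] <- do.call(cbind,
  lapply(lag_day, function(x) sapply(threshold, function(y)
    ifelse(dplyr::lag(df$return, n = x) > y, df$return, NA))))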
#   date       return var_t1_1 var_t2_1 var_t3_1 var_t1_2 var_t2_2 var_t3_2 var_t1_3 var_t2_3 var_t3_3
#   <date>      <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
# 1 2019-05-21      1       NA       NA       NA       NA       NA       NA       NA       NA       NA
# 2 2019-05-22    2.5       NA       NA       NA       NA       NA       NA       NA       NA       NA
# 3 2019-05-23      2        2       NA       NA       NA       NA       NA       NA       NA       NA
# 4 2019-05-24      3       NA       NA       NA        3       NA       NA       NA       NA       NA
# 5 2019-05-25      5        5       NA       NA       NA       NA       NA        5       NA       NA
# 6 2019-05-26    6.5      6.5      6.5       NA      6.5       NA       NA       NA       NA       NA
# 7 2019-05-27      1        1        1        1        1        1       NA        1       NA       NA
# 8 2019-05-28      9       NA       NA       NA        9        9        9        9        9       NA
# 9 2019-05-29      3        3        3        3       NA       NA       NA        3        3        3
#10 2019-05-30      2        2       NA       NA        2        2        2       NA       NA       NA
#11 2019-05-31      4       NA       NA       NA        4       NA       NA        4        4        4
#12 2019-06-01      7        7       NA       NA       NA       NA       NA        7       NA       NA
#13 2019-06-02      2        2        2        2        2       NA       NA       NA       NA       NA
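As a variation on the same idea, here is a minimal sketch that uses only base R and creates each column name together with its values inside the loop, so names and columns cannot drift apart when threshold and lag_day have different lengths. The function `add_lag_threshold_cols`, the helper `lag_vec`, and the small example data are all hypothetical and not taken from either answer.

```r
# Hypothetical helper: base R stand-in for dplyr::lag(x, n), assuming
# 1 <= n < length(x).
lag_vec <- function(x, n) c(rep(NA, n), head(x, -n))

# Hypothetical function: adds one column per (threshold, lag) combination,
# named var_t<threshold index>_<lag index>.
add_lag_threshold_cols <- function(data, threshold, lag_day, col = "return") {
  x <- data[[col]]
  for (j in seq_along(lag_day)) {
    for (i in seq_along(threshold)) {
      # Name and values are produced in the same step, so they always match.
      name <- paste0("var_t", i, "_", j)
      data[[name]] <- ifelse(lag_vec(x, lag_day[j]) > threshold[i], x, NA)
    }
  }
  data
}

# Example with two thresholds and three lags (lengths deliberately differ).
small <- data.frame(return = c(1, 2.5, 2, 3, 5, 6.5))
res <- add_lag_threshold_cols(small, threshold = c(2, 4), lag_day = c(1, 2, 3))
print(names(res))
```

The explicit double loop trades some brevity for readability; for nine columns on a short series the performance difference is unlikely to matter.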