R: create multiple columns based on two different parameters

Time: 2019-05-21 14:12:18

Tags: r

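I have a data frame with 2 columns: date and return. I now want to add (mutate) several new columns that depend on two parameters: a threshold parameter and a lag parameter. The function is simple. The new columns are calculated as follows: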

var = ifelse(lag(return, n = lag_day) > threshold, return, NA)

If lag(return) is above the threshold, give me the return value; otherwise give me NA.
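For a single threshold and a single lag, the rule can be sketched in base R as follows. The helper `lag_n` is a hypothetical stand-in for `dplyr::lag` on a plain vector, and the five values are the first five `return` values from the output table at the end of this thread; the threshold of 2 and lag of 1 are one of the nine combinations asked about.

```r
# Hypothetical helper (not from the thread): shift x down by n positions,
# padding the start with NA, similar to dplyr::lag on a plain vector.
lag_n <- function(x, n = 1L) c(rep(NA, n), x)[seq_along(x)]

ret <- c(1, 2.5, 2, 3, 5)  # first five return values from the output table

# Keep today's return only if the return one day earlier exceeded 2.
one_col <- ifelse(lag_n(ret, 1) > 2, ret, NA)
one_col
# NA NA 2 NA 5
```

The first element is NA because there is no earlier value to compare against, and a comparison with NA propagates NA through `ifelse`.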

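Here are the values of threshold and lag_day:

threshold=c(2,4,6)
lag_day=c(1,2,3)

At the moment I am solving the problem manually, one column at a time.

Is there a solution that makes this easier, perhaps with one or two apply functions?

My sample data frame has the two columns described above, date and return.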
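The sample data can be read back from the output table printed at the end of this thread, so the sketch below is a reconstruction under that assumption rather than the asker's original code. It also shows what a manual, one-line-per-column approach might look like; `lag_n` is a hypothetical base R stand-in for `dplyr::lag`, and the column names follow the `var_t<threshold index>_<lag index>` pattern of that output table.

```r
# Reconstructed from the output table in this thread (assumption).
df <- data.frame(
  date   = seq(as.Date("2019-05-21"), by = "day", length.out = 13),
  return = c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2, 4, 7, 2)
)
threshold <- c(2, 4, 6)
lag_day   <- c(1, 2, 3)

# Hypothetical helper: shift x down by n positions, padding with NA.
lag_n <- function(x, n = 1L) c(rep(NA, n), x)[seq_along(x)]

# Manual approach: one assignment per (threshold, lag) pair.
manual <- df
manual$var_t1_1 <- ifelse(lag_n(df$return, 1) > 2, df$return, NA)
manual$var_t2_1 <- ifelse(lag_n(df$return, 1) > 4, df$return, NA)
manual$var_t1_2 <- ifelse(lag_n(df$return, 2) > 2, df$return, NA)
# ... six more lines for the remaining combinations
```

With three thresholds and three lags this already needs nine near-identical lines, which is the repetition the question is trying to avoid.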

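2 answers:

Answer 0 (score: 3)

One option is to get all combinations of 'threshold' and 'lag_day' with crossing, loop over the rows (pmap), use transmute to create the columns of interest, and bind them to the original dataset. It uses one function from base R (seq_along).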

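library(tidyverse)

# Build every (threshold index, lag_day) pair, create one column per pair,
# then bind the new columns to the original data.
crossing(threshold = seq_along(threshold), lag_day) %>%
    pmap_dfc(~  
             df %>%
               transmute(!! str_c("var_t", ..1, "_lag", ..2) := 
                  case_when(lag(return, n = ..2) > threshold[..1] ~ return, 
                            TRUE ~ NA_real_))) %>% 
   bind_cols(df, .)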
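To see which combinations are being looped over, the grid of pairs can be sketched in base R. Here `expand.grid` is used as a stand-in for `tidyr::crossing` (an assumption for illustration; the row order of the two functions may differ), and the column names `t_idx`, `lag`, and `name` are hypothetical.

```r
threshold <- c(2, 4, 6)
lag_day   <- c(1, 2, 3)

# One row per (threshold index, lag) pair; the first column varies fastest.
grid <- expand.grid(t_idx = seq_along(threshold), lag = lag_day)

# Column names in the same style as the answer: var_t<index>_lag<lag>.
grid$name <- paste0("var_t", grid$t_idx, "_lag", grid$lag)

nrow(grid)          # 9 combinations
head(grid$name, 3)  # "var_t1_lag1" "var_t2_lag1" "var_t3_lag1"
```

Using the threshold index rather than the threshold value in the name keeps the names short and valid even if a threshold is negative or fractional.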

Answer 1 (score: 2)

A base R approach using two apply loops, with dplyr::lag for the shift:

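# One column per (threshold, lag_day) pair. The values are built with lag_day
# in the outer loop and threshold in the inner loop, so the names must vary
# the threshold index fastest: var_t<threshold index>_<lag index>.
df[paste0("var_t", outer(seq_along(threshold), seq_along(lag_day),
   FUN = paste, sep = "_"))] <-  do.call(cbind, 
     lapply(lag_day, function(x) sapply(threshold, function(y) 
            ifelse(dplyr::lag(df$return, n = x) > y, df$return, NA))))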


#   date       return var_t1_1 var_t2_1 var_t3_1 var_t1_2 var_t2_2 var_t3_2 var_t1_3 var_t2_3 var_t3_3
#   <date>      <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
# 1 2019-05-21    1       NA       NA         NA     NA         NA       NA       NA       NA       NA
# 2 2019-05-22    2.5     NA       NA         NA     NA         NA       NA       NA       NA       NA
# 3 2019-05-23    2        2       NA         NA     NA         NA       NA       NA       NA       NA
# 4 2019-05-24    3       NA       NA         NA      3         NA       NA       NA       NA       NA
# 5 2019-05-25    5        5       NA         NA     NA         NA       NA        5       NA       NA
# 6 2019-05-26    6.5      6.5      6.5       NA      6.5       NA       NA       NA       NA       NA
# 7 2019-05-27    1        1        1          1      1          1       NA        1       NA       NA
# 8 2019-05-28    9       NA       NA         NA      9          9        9        9        9       NA
# 9 2019-05-29    3        3        3          3     NA         NA       NA        3        3        3
#10 2019-05-30    2        2       NA         NA      2          2        2       NA       NA       NA
#11 2019-05-31    4       NA       NA         NA      4         NA       NA        4        4        4
#12 2019-06-01    7        7       NA         NA     NA         NA       NA        7       NA       NA
#13 2019-06-02    2        2        2          2      2         NA       NA       NA       NA       NA
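The same table can also be produced with an explicit double loop and no dplyr dependency. This is a minimal sketch, not either answerer's method: `lag_n` and `add_lag_cols` are hypothetical names, the data frame is reconstructed from the output above, and the column names follow the `var_t<threshold index>_<lag index>` pattern shown there. Because each name is built next to its value, the sketch also works when `threshold` and `lag_day` have different lengths.

```r
# Hypothetical helper: shift x down by n positions, padding with NA.
lag_n <- function(x, n = 1L) c(rep(NA, n), x)[seq_along(x)]

# Hypothetical function: add one column per (threshold, lag_day) pair.
add_lag_cols <- function(df, threshold, lag_day) {
  for (j in seq_along(lag_day)) {
    for (i in seq_along(threshold)) {
      nm <- paste0("var_t", i, "_", j)
      df[[nm]] <- ifelse(lag_n(df$return, lag_day[j]) > threshold[i],
                         df$return, NA)
    }
  }
  df
}

# Data reconstructed from the output table above (assumption).
df <- data.frame(
  date   = seq(as.Date("2019-05-21"), by = "day", length.out = 13),
  return = c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2, 4, 7, 2)
)

out <- add_lag_cols(df, threshold = c(2, 4, 6), lag_day = c(1, 2, 3))
names(out)[3:5]  # "var_t1_1" "var_t2_1" "var_t3_1"
```

A loop is slower than a vectorised grid for very many combinations, but for a handful of thresholds and lags it keeps the naming and the computation visibly in sync.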