Referencing the number in data.table column names inside a loop

Asked: 2018-10-10 23:42:15

Tags: r loops indexing data.table

I have a dataset like this:

library(data.table)

customer_id <- c("1","1","1","2","2","2","2","3","3","3")
account_id <- as.character(c(11,11,11,55,55,55,55,38,38,38))
time <- c(as.Date("2017-01-01","%Y-%m-%d"), as.Date("2017-02-01","%Y-%m-%d"), as.Date("2017-03-01","%Y-%m-%d"),
              as.Date("2017-12-01","%Y-%m-%d"), as.Date("2018-01-01","%Y-%m-%d"), as.Date("2018-02-01","%Y-%m-%d"),
              as.Date("2018-03-01","%Y-%m-%d"), as.Date("2018-04-01","%Y-%m-%d"), as.Date("2018-05-01","%Y-%m-%d"),
              as.Date("2018-06-01","%Y-%m-%d"))
tenor <- c(1,2,3,1,2,3,4,1,2,3)
variable_x <- c(87,90,100,120,130,150,12,13,15,14)

my_data <- data.table(customer_id,account_id,time,tenor,variable_x)

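Now I want to create new variables "PD_Q1" through "PD_Q20", where PD_Qi holds the value of "variable_x" from the row where "tenor" equals i. That is, PD_Q1 equals the value of variable_x where tenor = 1, PD_Q2 equals the value of variable_x where tenor = 2, and so on, and I want to do this by customer_id and account_id. I have code for this, but it only works for PD_Q1. I would like to write a loop over i = 1:20 in which I change the filter tenor == i (this step is easy) and refer to the column PD_Qi, which is the part I am stuck on. The code for a single value of i is here: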

my_data[tenor == 1, PD_Q1_temp := variable_x, by = c("customer_id", "account_id")]

list_accs <- my_data[tenor == 1, c("customer_id", "account_id", "PD_Q1_temp")]

list_accs <- unique(list_accs, by = c("customer_id", "account_id"))

names(list_accs) = c("customer_id", "account_id", "PD_Q1")

my_data = merge(x = my_data, y = list_accs, by = c("customer_id", "account_id"), all.x = TRUE)

my_data$PD_Q1_temp <- NULL

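Could you suggest how to loop from 1 to 20 so that the tenor value, PD_Q1_temp and PD_Q1 change with each iteration? Specifically, I don't know how to use the index i inside the loop to refer to column names or variables.

The expected output for i = 1 and i = 2 (creating the variables PD_Q1 and PD_Q2) is here: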

> my_data
    customer_id account_id       time tenor variable_x PD_Q1 PD_Q2
 1:           1         11 2017-01-01     1         87    87    90
 2:           1         11 2017-02-01     2         90    87    90
 3:           1         11 2017-03-01     3        100    87    90
 4:           2         55 2017-12-01     1        120   120   130
 5:           2         55 2018-01-01     2        130   120   130
 6:           2         55 2018-02-01     3        150   120   130
 7:           2         55 2018-03-01     4         12   120   130
 8:           3         38 2018-04-01     1         13    13    15
 9:           3         38 2018-05-01     2         15    13    15
10:           3         38 2018-06-01     3         14    13    15

Now I want to use the code above in a loop to create PD_Q3, PD_Q4, and so on, one variable per value of i.
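For the loop asked about here, one possible sketch (not necessarily the only or best way) is to build the column name as a string with paste0() and assign to it with data.table's `(name) :=` form. The example rebuilds a reduced copy of my_data without the time column and loops over 1:4 instead of 1:20 to keep it short; both are simplifications for illustration.

```r
library(data.table)

# Reduced stand-in for my_data from the question (time column omitted)
my_data <- data.table(
  customer_id = c("1","1","1","2","2","2","2","3","3","3"),
  account_id  = as.character(c(11,11,11,55,55,55,55,38,38,38)),
  tenor       = c(1,2,3,1,2,3,4,1,2,3),
  variable_x  = c(87,90,100,120,130,150,12,13,15,14)
)

for (i in 1:4) {                  # 1:20 in the question
  new_col <- paste0("PD_Q", i)    # column name built from the loop index
  # (new_col) := ... assigns to the column whose name is stored in new_col.
  # variable_x[tenor == i][1] takes the first value at tenor i within each
  # group and yields NA when the group has no row with that tenor.
  my_data[, (new_col) := variable_x[tenor == i][1],
          by = c("customer_id", "account_id")]
}

print(my_data)
```

Wrapping the name in parentheses is what makes data.table treat new_col as a variable holding the name rather than as a literal column called "new_col". This version also avoids the temporary column and the merge, since the grouped assignment fills every row of the account directly.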

1 Answer:

Answer 0 (score: 0)

Can you show the expected output?

I think you can use tidyr::spread() to do what you want:

library(dplyr)
library(tidyr)

my_data %>%
  as_tibble() %>%   # tbl_df() is deprecated in current dplyr
  select(-time) %>%
  mutate(tenor = paste0("PD_Q", tenor)) %>%
  spread(tenor, variable_x)

# # A tibble: 3 x 6
#   customer_id account_id PD_Q1 PD_Q2 PD_Q3 PD_Q4
#   <chr>       <chr>      <dbl> <dbl> <dbl> <dbl>
# 1 1           11            87    90   100    NA
# 2 2           55           120   130   150    12
# 3 3           38            13    15    14    NA
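The result above has one row per account, whereas the expected output in the question keeps all ten original rows. A possible way to get there while staying in data.table, offered as a sketch rather than as the answerer's method, is to reshape with dcast() and merge the wide table back onto the original rows. The helper column pd_name and the objects long, wide and result are names introduced here for illustration, and the time column is again omitted.

```r
library(data.table)

# Reduced stand-in for my_data from the question (time column omitted)
my_data <- data.table(
  customer_id = c("1","1","1","2","2","2","2","3","3","3"),
  account_id  = as.character(c(11,11,11,55,55,55,55,38,38,38)),
  tenor       = c(1,2,3,1,2,3,4,1,2,3),
  variable_x  = c(87,90,100,120,130,150,12,13,15,14)
)

# Work on a copy so my_data itself does not gain the helper column
long <- copy(my_data)[, pd_name := paste0("PD_Q", tenor)]

# One row per account, one column per distinct pd_name value
wide <- dcast(long, customer_id + account_id ~ pd_name,
              value.var = "variable_x")

# Attach the wide columns to every original row of the same account
result <- merge(my_data, wide,
                by = c("customer_id", "account_id"), all.x = TRUE)

print(result)
```

Note that the new columns come out in alphabetical order of their names, so with 20 tenors PD_Q10 would sort before PD_Q2; setcolorder() can rearrange them if the order matters. Only tenors that actually occur in the data get a column.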