I have a data.table like this:
customer_id account_id time count
1: 1 AAA 2000-01-01 0
2: 1 AAA 2000-02-01 1
3: 1 AAA 2000-03-01 2
4: 1 AAA 2000-04-01 3
5: 1 AAA 2000-05-01 4
6: 1 AAA 2000-06-01 5
7: 1 AAA 2000-07-01 6
8: 1 AAA 2000-08-01 7
9: 2 BBB 2008-01-01 0
10: 2 BBB 2008-02-01 1
11: 2 BBB 2008-03-01 2
12: 2 BBB 2008-04-01 3
13: 2 BBB 2008-05-01 4
14: 2 BBB 2008-06-01 5
15: 2 BBB 2008-07-01 6
16: 2 BBB 2008-08-01 7
17: 2 BBB 2008-09-01 8
18: 2 BBB 2008-10-01 9
19: 2 BBB 2008-11-01 10
20: 2 BBB 2008-12-01 11
21: 2 BBB 2009-01-01 12
22: 2 BBB 2009-02-01 13
23: 2 BBB 2009-03-01 14
24: 2 BBB 2009-04-01 15
The code used to create this data.table is here:
library(data.table)
customer_id <- c(rep(1,8), rep(2,16))
account_id <- c(rep("AAA",8), rep("BBB",16))
time <- c(seq(as.Date("2000/1/1"), by = "month", length.out = 8),
seq(as.Date("2008/1/1"), by = "month", length.out = 16))
count <- c(seq(from = 0, to = 7), seq(from = 0, to = 15))
my_data <- data.table(customer_id,account_id,time,count)
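I would like to generate a new variable called new_var that equals 0 if count is between 1 and 4, 1 if count is between 5 and 8, 2 if count is between 9 and 12, and so on. That is, by customer_id and account_id, I want to create a new variable that starts at 0 and increases by 1 after every 4 values.
For rows where count equals 0, this new variable can be NA, for example; it doesn't matter. Is there a way to build this sequence (0,0,0,0,1,1,1,1,2,2,2,2,...) by group in this data.table?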
Answer 0 (score: 2)
Here is a dplyr solution. group_by your customer_id (and account_id), then simply use an ifelse statement inside mutate to generate the new variable.
library(dplyr)
my_data %>% group_by(customer_id,account_id) %>% mutate(new_var = ifelse(count==0,NA,floor((count-1)/4)))
# A tibble: 24 x 5
# Groups: customer_id [2], account_id [1]
# customer_id account_id time count new_var
# <dbl> <chr> <date> <int> <dbl>
# 1 1 AAA 2000-01-01 0 NA
# 2 1 AAA 2000-02-01 1 0
# 3 1 AAA 2000-03-01 2 0
# 4 1 AAA 2000-04-01 3 0
# 5 1 AAA 2000-05-01 4 0
# 6 1 AAA 2000-06-01 5 1
# 7 1 AAA 2000-07-01 6 1
# 8 1 AAA 2000-08-01 7 1
# 9 2 BBB 2008-01-01 0 NA
#10 2 BBB 2008-02-01 1 0
# ... with 14 more rows
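The formula works because subtracting 1 shifts counts 1 to 4 onto 0 to 3, so dividing by 4 and rounding down gives 0 for the first block of four, 1 for the next, and so on. A minimal base R check of that arithmetic, using an illustrative vector rather than the question's table, and integer division (`%/%`) as an equivalent of `floor(x / 4)`:

```r
# Illustrative counts 0..12 (an assumption for this sketch, not the real data)
count <- 0:12

# Shift by 1 so that 1..4 maps to 0..3, then integer-divide by 4;
# a count of 0 is mapped to NA, as the question allows
new_var <- ifelse(count == 0, NA, (count - 1) %/% 4)

print(new_var)
```

For counts 1 to 12 this yields four 0s, four 1s and four 2s, preceded by a single NA for the count of 0.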
Answer 1 (score: 0)
This is a solution in pure 'data.table' syntax:
my_data[, new_var:=ifelse(count==0, NA, floor((count-1)/4)), by=.(customer_id, account_id)]
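As a possible variation, the same assignment can be written with data.table's `fifelse()`, which is stricter about types than `ifelse()`. The sketch below rebuilds the question's table so that it runs on its own; using `NA_real_` is a choice made here so that both branches are doubles, not something taken from the answer above.

```r
library(data.table)

# Rebuild the example table from the question
my_data <- data.table(
  customer_id = c(rep(1, 8), rep(2, 16)),
  account_id  = c(rep("AAA", 8), rep("BBB", 16)),
  time        = c(seq(as.Date("2000/1/1"), by = "month", length.out = 8),
                  seq(as.Date("2008/1/1"), by = "month", length.out = 16)),
  count       = c(0:7, 0:15)
)

# fifelse() requires both branches to have the same type, so NA_real_
# is paired with the double returned by floor()
my_data[, new_var := fifelse(count == 0, NA_real_, floor((count - 1) / 4)),
        by = .(customer_id, account_id)]

# Inspect the second group: NA, then 0,0,0,0, 1,1,1,1, 2,2,2,2, 3,3,3
my_data[account_id == "BBB", new_var]
```

Because count already restarts at 0 within each group, the `by` clause does not change the result here; it is kept to mirror the answer.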