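The dataset I'm working with is billing data by customer and month. Ultimately, I'd like to build a data frame with customer IDs as rows and months as column names, just like the original dataset. However, I'd like this new dataset to contain dummy variables that flag whether a customer was "gained" that month, i.e. they had never been billed before and that month was the first time they were charged.
Here is a reproducible example, along with the loop I have written so far: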
set.seed(24)
example.data <- data.frame(
ID = sample(11:20),
Jan = sample(0:5, 10, replace = TRUE),
Feb = sample(0:5, 10, replace = TRUE),
Mar = sample(0:5, 10, replace = TRUE),
Apr = sample(0:5, 10, replace = TRUE)
)
gained.df.ex <- data.frame(example.data$ID)
## customers can't be gained in the first month
## there's no previous data to verify that this is the first time they've been billed, so all values are 0
gained.df.ex$Jan <- rep(0, length(example.data$ID))
## here's the loop that isn't working
for (i in 3:5) {
  new.month.dummy <- for (x in 1:length(gained.df.ex$example.data.ID)) {
    ifelse(example.data[x, i] == 0, new.month.dummy[x] <- 0, ifelse(sum(example.data[x, 2:(i - 1)]) == 0, new.month.dummy[x] <- 1, new.month.dummy <- 0))
  }
}
I'm sure this can be done with apply, but I'm not sure how.
The expected output is as follows:
> example.data
   Jan Feb Mar Apr
15   0   3   4   3
19   1   3   0   5
20   4   2   5   1
12   2   1   3   0
14   0   0   2   1
17   5   5   4   4
11   3   4   1   5
18   1   0   0   2
13   3   2   5   3
16   2   5   1   2
> gained.df.ex
   Jan Feb Mar Apr
15   0   1   0   0
19   0   0   0   0
20   0   0   0   0
12   0   0   0   0
14   0   0   1   0
17   0   0   0   0
11   0   0   0   0
18   0   0   0   0
13   0   0   0   0
16   0   0   0   0
Answer 0: (score: 2)
We can try
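A minimal sketch of one apply-based approach, which is not necessarily the code originally posted in this answer. It assumes billing amounts are never negative, so a running total of zero means the customer has not been billed yet. To make the result comparable with the expected gained.df.ex, it rebuilds example.data from the values printed in the question rather than from set.seed(24); the helper names billing, running, before and gained are illustrative only.

```r
# Example data copied from the table printed in the question.
example.data <- data.frame(
  ID  = c(15, 19, 20, 12, 14, 17, 11, 18, 13, 16),
  Jan = c(0, 1, 4, 2, 0, 5, 3, 1, 3, 2),
  Feb = c(3, 3, 2, 1, 0, 5, 4, 0, 2, 5),
  Mar = c(4, 0, 5, 3, 2, 4, 1, 0, 5, 1),
  Apr = c(3, 5, 1, 0, 1, 4, 5, 2, 3, 2)
)

# Month columns only, as a numeric matrix.
billing <- as.matrix(example.data[, -1])

# Running total per customer. apply() over rows returns one column
# per customer, so transpose back to customers-by-months.
running <- t(apply(billing, 1, cumsum))

# Amount billed strictly before each month.
# Assumption: amounts are non-negative, so 0 here means "never billed".
before <- running - billing

# Gained: billed this month and nothing billed in any earlier month.
gained <- (billing > 0 & before == 0) * 1L

# The first month has no history, so it is 0 by the question's convention.
gained[, 1] <- 0L

gained.df.ex <- data.frame(ID = example.data$ID, gained)
gained.df.ex
```

With the values above, only customer 15 (in Feb) and customer 14 (in Mar) are flagged, which matches the expected output in the question. Working on the whole matrix at once avoids the nested loop and the need to grow new.month.dummy element by element.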