I want to calculate the number of months since the last purchase.
My data frame looks like this:
df
id month purchases
1 1 3
1 2 0
1 3 0
1 4 1
2 1 1
2 2 0
2 3 3
2 4 1
(100 rows omitted)
I want to use a for loop to get a data frame like this:
id month purchases recency
1 1 3 NA
1 2 0 1
1 3 0 2
1 4 1 3
2 1 1 NA
2 2 0 1
2 3 3 2
2 4 1 1
(100 rows omitted)
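For anyone who wants to reproduce this, the eight rows shown above could be built roughly as follows. This is only a sketch: the 100 omitted rows are not reconstructed, and `expected` is a name made up here that simply holds the recency column copied from the table above.

```r
# Sample data: only the eight rows shown in the question
df <- data.frame(
  id        = c(1, 1, 1, 1, 2, 2, 2, 2),
  month     = c(1, 2, 3, 4, 1, 2, 3, 4),
  purchases = c(3, 0, 0, 1, 1, 0, 3, 1)
)

# Desired recency column, copied from the table above (hypothetical name)
expected <- c(NA, 1, 2, 3, NA, 1, 2, 1)
```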
Answer 0 (score: 1)
Getting the recency for the rows where purchases != 0 is the difficult part. One way to do it with dplyr could be:
library(dplyr)
df %>%
group_by(id, group = cumsum(purchases != 0)) %>%
mutate(recency = month - first(month)) %>%
ungroup() %>%
select(-group) %>%
group_by(id) %>%
mutate(recency = ifelse(recency == 0, lag(recency) + month - lag(month), recency))
# id month purchases recency
# <int> <int> <int> <int>
#1 1 1 3 NA
#2 1 2 0 1
#3 1 3 0 2
#4 1 4 1 3
#5 2 1 1 NA
#6 2 2 0 1
#7 2 3 3 2
#8 2 4 1 1
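To illustrate the approach, we first group_by id and cumsum(purchases != 0), so that every non-zero purchase starts a new group, and create the recency column for each group by subtracting first(month) from month, which gives: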
df %>%
group_by(id, group = cumsum(purchases != 0)) %>%
mutate(recency = month - first(month))
# id month purchases group recency
# <int> <int> <int> <int> <int>
#1 1 1 3 1 0
#2 1 2 0 1 1
#3 1 3 0 1 2
#4 1 4 1 2 0
#5 2 1 1 3 0
#6 2 2 0 3 1
#7 2 3 3 4 0
#8 2 4 1 5 0
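This is almost what we want, except for the rows where purchases != 0 within the same id: there the value has to be counted from the most recent non-zero purchase instead, which is done with another group_by on id and the lag()-based mutate.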
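To see the two steps without dplyr, here is a minimal base R sketch on the eight sample rows. It is an illustration under assumptions, not the code the answer runs: `lag1` is a hypothetical helper standing in for dplyr's lag(), and ave() plays the role of the grouped mutate().

```r
id        <- c(1, 1, 1, 1, 2, 2, 2, 2)
month     <- c(1, 2, 3, 4, 1, 2, 3, 4)
purchases <- c(3, 0, 0, 1, 1, 0, 3, 1)

# Step 1: every non-zero purchase starts a new run
group <- cumsum(purchases != 0)                              # 1 1 1 2 3 3 4 5
step1 <- ave(month, id, group, FUN = function(m) m - m[1])   # 0 1 2 0 0 1 0 0

# Hypothetical helper: shift a vector down by one, like dplyr::lag()
lag1 <- function(x) c(NA, head(x, -1))

# Step 2: within each id, a 0 marks a purchase row, so count on from the previous row
recency <- numeric(length(month))
for (i in unique(id)) {
  rows <- id == i
  r <- step1[rows]
  m <- month[rows]
  recency[rows] <- ifelse(r == 0, lag1(r) + m - lag1(m), r)
}
recency   # NA 1 2 3 NA 1 2 1
```

The first row of each id comes out as NA because there is no previous row to take a lag from.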
Answer 1 (score: 1)
I see you want an answer with a for loop. Here is one, assuming the rows are sorted by id and month with no months missing:
months_since_last_purchase <- function(df) {
    df$recency <- NA                      # create an empty column to store recency
    months_since <- 0                     # initialise our months-since counter to zero
    for (row in seq_len(nrow(df))) {      # loop through our rows
        if (row == 1 || df$id[row] != df$id[row - 1]) {
            months_since <- 0             # first row of an id: nothing to count from, recency stays NA
        } else {
            months_since <- months_since + 1   # one more month since the last reset
            df$recency[row] <- months_since    # set the recency to months since
        }
        if (df$purchases[row] != 0) {     # if we purchased something this month
            months_since <- 0             # reset the months-since counter to zero
        }
    }
    df                                    # return the modified data frame
}
To use the function we just created, call it on your df with something like:
new_df <- months_since_last_purchase(df)
For the eight sample rows in the question, new_df$recency comes out as NA, 1, 2, 3, NA, 1, 2, 1.
If I were going to reuse this function, I would save it somewhere, for example in a directory called scripts, and then load it with:
source("scripts/months_since_last_purchase.R")
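For loops are often frowned upon in R because vectorised operations are faster and more elegant, but I still find for loops convenient when speed isn't important.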
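For comparison, here is a rough sketch of what a vectorised base R version might look like, using ave() to work per id. The names `purchase_month` and `last_purchase` are made up for this example, and it assumes the rows are sorted by id and month. Note one difference in behaviour: rows before an id's first purchase get NA here, rather than a count from the first row.

```r
# Sample data: the eight rows shown in the question
df <- data.frame(
  id        = c(1, 1, 1, 1, 2, 2, 2, 2),
  month     = c(1, 2, 3, 4, 1, 2, 3, 4),
  purchases = c(3, 0, 0, 1, 1, 0, 3, 1)
)

# Month of each purchase, NA where nothing was bought
purchase_month <- ifelse(df$purchases != 0, as.numeric(df$month), NA_real_)

# Per id: the most recent purchase month strictly before each row
last_purchase <- ave(purchase_month, df$id, FUN = function(m) {
  prev <- c(NA, head(m, -1))                               # only look at earlier rows
  idx  <- cummax(ifelse(is.na(prev), 0, seq_along(prev)))  # position of the latest purchase so far
  c(NA, prev)[idx + 1]                                     # idx 0 maps to the leading NA
})

df$recency <- df$month - last_purchase
df$recency   # NA 1 2 3 NA 1 2 1
```

Because it subtracts actual month values rather than counting rows, this version should also tolerate gaps in the month column.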