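How do I use a for loop to calculate the number of months since the last purchase for each unique ID?

Time: 2019-08-11 03:20:57

Tags: r for-loop

I want to calculate the number of months since the last month in which a purchase was made. My data frame looks like this: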

df
id month purchases
1  1     3
1  2     0
1  3     0
1  4     1
2  1     1
2  2     0
2  3     3
2  4     1
(100 rows omitted)
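For anyone who wants to run the answers, the eight rows shown above can be rebuilt roughly like this. It is only a sketch: the omitted rows are not reconstructed, and the integer column types are an assumption taken from the printed output in the first answer.

```r
# Rebuild only the eight rows visible in the question.
df <- data.frame(
  id        = rep(1:2, each = 4),                  # two customers, four rows each
  month     = rep(1:4, times = 2),                 # months 1 to 4 for each id
  purchases = c(3L, 0L, 0L, 1L, 1L, 0L, 3L, 1L)    # purchases per month
)
print(df)
```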

I want to use a for loop to get a data frame like this:

id month purchases recency
1  1     3          NA
1  2     0          1
1  3     0          2
1  4     1          3
2  1     1          NA
2  2     0          1
2  3     3          2
2  4     1          1
(100 rows omitted)

2 Answers:

Answer 0: (score: 1)

The difficult part is getting recency for the rows where purchases != 0. One way to do it with dplyr could be:

library(dplyr)

df %>%
  group_by(id, group = cumsum(purchases != 0)) %>%
  mutate(recency = month - first(month)) %>%
  ungroup() %>%
  select(-group) %>%
  group_by(id) %>%
  mutate(recency = ifelse(recency == 0, lag(recency) + month - lag(month), recency))

#     id month purchases recency
#  <int> <int>     <int>   <int>
#1     1     1         3      NA
#2     1     2         0       1
#3     1     3         0       2
#4     1     4         1       3
#5     2     1         1      NA
#6     2     2         0       1
#7     2     3         3       2
#8     2     4         1       1

To explain this better: we first group_by id and the cumulative sum of purchases != 0, and create the recency column for each group by subtracting first(month) from month, which gives

df %>%
  group_by(id, group = cumsum(purchases != 0)) %>%
  mutate(recency = month - first(month))

#     id month purchases group recency
#  <int> <int>     <int> <int>   <int>
#1     1     1         3     1       0
#2     1     2         0     1       1
#3     1     3         0     1       2
#4     1     4         1     2       0
#5     2     1         1     3       0
#6     2     2         0     3       1
#7     2     3         3     4       0
#8     2     4         1     5       0

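This is almost what we want, except that within the same id, the rows where purchases != 0 need to be measured from the most recent non-zero purchase. That is done by the second group_by on id and the ifelse step.

Answer 1: (score: 1)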
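To see why the grouping key cumsum(purchases != 0) used above starts a new group at every purchase, it can help to evaluate it on a plain vector. This is a small sketch on the eight sample rows only; the variable names are just for illustration.

```r
purchases <- c(3, 0, 0, 1, 1, 0, 3, 1)   # the eight sample rows, in order

is_purchase <- purchases != 0            # TRUE on rows with a purchase
group <- cumsum(is_purchase)             # the counter goes up by one at each TRUE

print(group)
# [1] 1 1 1 2 3 3 4 5
```

Rows with no purchase inherit the counter value of the purchase row before them, which is why month - first(month) within each group counts months since that purchase.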

I see you want an answer with a for loop. Here is one:

months_since_last_purchase <- function(df) {

  # Note: a single counter runs over all rows; ids are not handled separately.
  df$recency <- NA           # create an empty column to store recency
  months_since <- 0          # initialise our months-since counter to zero

  for (row in seq_len(nrow(df))) {   # loop through our rows

    if (df$purchases[row] == 0) {    # if we did not purchase something this month

      months_since <- months_since + 1   # increment months_since
      df$recency[row] <- months_since    # set the recency to months_since

    } else {                         # else we did purchase something this month

      months_since <- months_since + 1   # increment months_since
      if (months_since == 1) {   #     and if the previous row was a purchase as well (or this is the first row)
        df$recency[row] <- NA    #         set the recency to NA
      } else {                   #     else we didn't purchase something last month
        df$recency[row] <- months_since    # set the recency to months_since
      }
      months_since <- 0          # reset months_since to zero

    }
  }
  df                           # return the modified data frame
}

To use the function we just created, call it on your df like this:

new_df <- months_since_last_purchase(df)

If I planned to reuse this function, I would save it somewhere, e.g. in a directory called scripts, and then load it with:

source("scripts/months_since_last_purchase.R")

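for loops are often frowned upon in R, because vectorized operations are faster and more elegant, but I still find for loops convenient when speed is not important.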
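One thing to watch with months_since_last_purchase: it keeps one counter for the whole data frame and returns NA when purchases fall in consecutive months, so on the sample rows the last row of id 2 comes out as NA rather than the 1 shown in the question. A minimal sketch of a for-loop variant that restarts for each id and measures from the previous purchase month might look like this. The name recency_by_id is hypothetical, and the sketch assumes the rows are sorted by id and then by month.

```r
# Hypothetical helper; assumes df is sorted by id, then month.
recency_by_id <- function(df) {
  df$recency <- NA_integer_        # column to fill in
  last_purchase <- NA_integer_     # month of the most recent earlier purchase for the current id

  for (row in seq_len(nrow(df))) {
    # a new id starts here: forget the previous id's purchase history
    if (row == 1 || df$id[row] != df$id[row - 1]) {
      last_purchase <- NA_integer_
    }

    # months since the previous purchase; stays NA until this id has purchased once
    df$recency[row] <- df$month[row] - last_purchase

    # remember this month if something was purchased
    if (df$purchases[row] != 0) {
      last_purchase <- df$month[row]
    }
  }
  df
}

# The eight sample rows from the question
sample_df <- data.frame(
  id        = rep(1:2, each = 4),
  month     = rep(1:4, times = 2),
  purchases = c(3L, 0L, 0L, 1L, 1L, 0L, 3L, 1L)
)
result <- recency_by_id(sample_df)
print(result$recency)
# [1] NA  1  2  3 NA  1  2  1
```

Because it subtracts month values instead of counting rows, this variant also copes with gaps in the month sequence, as long as the sort-order assumption holds.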