Calculating the retention rate for a column in R

Date: 2019-12-21 15:58:20

Tags: r dplyr churn

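I need your advice on finding the right command in R.

Basically, I want to calculate the retention rate for specific customers. customer_math is a snapshot of each customer's activity over time, covering an 8-year window.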

customer  customer_math
Apple          1
Tesco          10
Nespresso      1001
Dell           11
BMW            11111100

The final dataset should look like this:

customer  customer_math      retention_rate
Apple          1                1
Tesco          10               0.5
Nespresso      1001             0.5
Dell           11               1
BMW            11111100         0.75

Any ideas on how to solve this?

Thanks a lot for your help!

3 answers:

Answer 0 (score: 1)

You can remove all the 0s from the string, count the remaining characters with nchar, and divide that by the total nchar:

df$retention_rate <- with(df, nchar(gsub('0', '', customer_math, fixed = TRUE))/
                              nchar(customer_math))
df
#   customer customer_math retention_rate
#1     Apple             1           1.00
#2     Tesco            10           0.50
#3 Nespresso          1001           0.50
#4      Dell            11           1.00
#5       BMW      11111100           0.75

Data

df <- structure(list(customer = structure(c(1L, 5L, 4L, 3L, 2L), 
.Label = c("Apple", "BMW", "Dell", "Nespresso", "Tesco"), class = "factor"), 
customer_math = c(1L, 10L, 1001L, 11L, 11111100L)), class = "data.frame", 
row.names = c(NA, -5L))
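
One caveat, offered as an assumption beyond the original answer: if customer_math is stored as a double rather than an integer, R may render some values in scientific notation when coercing them to character (100000 can become "1e+05", for example), which would throw off the character counts. The sketch below applies the same remove-the-zeros idea but formats the numbers with scientific = FALSE first. The function name retention_rate_from_digits is hypothetical and chosen only for illustration.

```r
# Hypothetical helper (not part of the original answer): share of "1" digits
# in each activity snapshot, guarding against scientific notation.
retention_rate_from_digits <- function(x) {
  # scientific = FALSE keeps whole numbers in plain digit form;
  # trim = TRUE stops format() from padding shorter values with spaces.
  s <- format(x, scientific = FALSE, trim = TRUE)
  # Count the characters left after dropping every "0", then divide by
  # the total number of characters.
  nchar(gsub("0", "", s, fixed = TRUE)) / nchar(s)
}

# Same values as the question, stored as doubles
retention_rate_from_digits(c(1, 10, 1001, 11, 11111100))
# expected: 1, 0.5, 0.5, 1, 0.75
```

Note that any numeric storage still drops leading zeros, so a customer who was inactive in the earliest periods would be measured over fewer periods than the full window.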

Answer 1 (score: 0)

library(tidyverse)
tribble(
    ~customer, ~customer_math,
      "Apple",              1,
      "Tesco",             10,
  "Nespresso",           1001,
       "Dell",             11,
        "BMW",       11111100
  ) %>%
  mutate(active_count = str_count(customer_math, "1"),
         periods = str_length(customer_math),
         retention_rate = active_count / periods)

# A tibble: 5 x 5
#  customer  customer_math active_count periods retention_rate
#  <chr>             <dbl>        <int>   <int>          <dbl>
#1 Apple                 1            1       1           1   
#2 Tesco                10            1       2           0.5 
#3 Nespresso          1001            2       4           0.5 
#4 Dell                 11            2       2           1   
#5 BMW            11111100            6       8           0.75

Answer 2 (score: 0)

Another Base R solution that achieves the expected result:

# Coerce the customer_math vector to character so that each value
# can be split, then loop through each element:
df$retention_rate <- sapply(as.character(df$customer_math),
  function(x) {
    # Split the element into a vector comprised of its individual characters:
    elements_split <- unlist(strsplit(x, ""))

    # Divide the sum of the digits (the number of 1s) by the number of digits:
    rr <- sum(as.numeric(elements_split)) / length(elements_split)

    # Explicitly return the rate for this element:
    return(rr)
  }
)

Data:

df <- structure(
  list(
    customer = structure(
      c(1L, 5L, 4L, 3L, 2L),
      .Label = c("Apple", "BMW", "Dell", "Nespresso", "Tesco"),
      class = "factor"
    ),
    customer_math = c(1L, 10L, 1001L, 11L, 11111100L)
  ),
  class = "data.frame",
  row.names = c(NA,-5L)
)
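
To make the split-and-sum approach in this answer easier to follow, here is the same computation done by hand for a single value (Nespresso's 1001). It is only an illustrative walk-through; the variable names active and periods are not from the original answer.

```r
# Walk through one snapshot value step by step
x <- as.character(1001L)                     # "1001"
elements_split <- unlist(strsplit(x, ""))    # "1" "0" "0" "1"

active  <- sum(as.numeric(elements_split))   # 2 active periods
periods <- length(elements_split)            # 4 periods in total
rr <- active / periods                       # 0.5
rr
```

Because every digit is either 0 or 1, summing the digits is the same as counting the active periods.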