Question

I have a dataset, CustOrder, of customer purchases from 2008 to 2013 that contains the following information (this is only part of the data):

CustID  OrderYear  Amount
101102  2008       22429.00
101102  2009       11045.00
101435  2010       10740.77
101435  2011       73669.50
107236  2012       162123.50
101416  2010       8102.00
101416  2011       360.00
101416  2012       36576.00
101416  2013       1960.00
101467  2012       997.00
101604  2010       2971.53
101664  2009       91.94
101664  2011       130.93
.........

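Some customers buy in every consecutive year (e.g., 101416), while others buy only in certain years (e.g., 101664). I want to work out the customer acquisition rate: how many new customers are acquired each year, both as a count and as a rate (for customers whose purchases are not consecutive, only the first purchase counts). For example: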

Year NewCustomers TotalCustomerNumber NewCustomerRate
2008   5          5                     0%
2009   3          8                     37%
2010   4          12                    33%
2011   2          14                    14%
2012   3          17                    17%
2013   2          19                    10%
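Judging from the example table, the rate appears to be the year's new customers divided by the cumulative TotalCustomerNumber, truncated to a whole percent (17.6% is shown as 17%), with the first year reported as 0% by convention. A minimal R sketch under that assumption, with illustrative variable names, reproduces the rate column:

```r
# New-customer counts per year, copied from the example table above
years <- 2008:2013
newCustomers <- c(5, 3, 4, 2, 3, 2)

# Cumulative number of distinct customers acquired so far
totalCustomers <- cumsum(newCustomers)

# Share of the cumulative total acquired this year, truncated to a whole
# percent (the example appears to truncate, e.g. 3/17 = 17.6% -> 17%)
newCustomerRate <- floor(100 * newCustomers / totalCustomers)

# The example reports the first year as 0% rather than 100%
newCustomerRate[1] <- 0

print(data.frame(Year = years, NewCustomers = newCustomers,
                 TotalCustomerNumber = totalCustomers,
                 NewCustomerRate = paste0(newCustomerRate, "%")))
```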

Does anyone have any ideas or hints on how to do this?

Any help is appreciated!

Answer 1

It took me some time to put together a solution, but this approach should work. See the detailed comments:

# Setting a seed for reproducibility.
set.seed(10)

# Setting what years we want allowed.
validYears <- 2008:2015

# Generating a "fake" dataset for testing purposes.
custDF <- data.frame(CustID = abs(as.integer(rnorm(250, 50, 50))), OrderYear = 0, Amount = abs(rnorm(250, 100, 1000)))
custDF$OrderYear <- sapply(custDF$OrderYear, function(x) sample(validYears, 1)) # Assigning a random year to each purchase.

# Initializing a new data frame to store the output values.
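# stringsAsFactors = FALSE keeps NewCustomerRate a character column (before R 4.0 it would be a factor and the assignments below would produce NA).
newDF <- data.frame(Year = validYears, NewCustomers = 0, RunningNewCustomerTotal = 0, NewCustomerRate = "", stringsAsFactors = FALSE)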
custTotal <- 0 # Initializing a variable to be used in the loop.
firstIt <- 1 # Denotes the first iteration.

for (year in validYears) { # For each year in the data set (years I defined arbitrarily when generating the fake data)

  # Getting the unique IDs of the current year and the unique IDs of all past years.
  currentIDs <- unique(custDF[custDF$OrderYear == year, "CustID"])
  pastIDs <- unique(custDF[custDF$OrderYear < year, "CustID"])

  if (firstIt == 1) { pastIDs <- c(-1) } # First year: there are no earlier purchases, so use -1 as a placeholder ID that matches no customer.

  newIDs <- currentIDs[!(currentIDs %in% pastIDs)] # Getting all IDs that have not been previously used.
  numNewIDs <- length(newIDs) # Getting the number of new IDs.
  custTotal <- custTotal + numNewIDs # Getting the running total.

  # Adding the new data into the data frame.
  newDF[newDF$Year == year, "NewCustomers"] <- numNewIDs
  newDF[newDF$Year == year, "RunningNewCustomerTotal"] <- custTotal

  # Getting the rate.
  if (firstIt == 1) { 

    NewCustRate <- 0
    firstIt <- 2

  } else { NewCustRate <- (1 - (newDF[newDF$Year == (year - 1), "RunningNewCustomerTotal"] / custTotal)) * 100 } # Equivalent to numNewIDs / custTotal * 100.

  # Storing the rate as text: round() keeps two decimals and format() converts the number to a character string.
  newDF[newDF$Year == year, "NewCustomerRate"] <- paste0(format(round(NewCustRate, 2)), "%")

}

Output:

> newDF
  Year NewCustomers RunningNewCustomerTotal NewCustomerRate
1 2008           32                      32              0%
2 2009           22                      54             41%
3 2010           19                      73             26%
4 2011           14                      87             16%
5 2012            7                      94            7.4%
6 2013            3                      97            3.1%
7 2014            9                     106            8.5%
8 2015            5                     111            4.5%
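As an alternative to the loop (this is not part of the original answer), each customer's first purchase year can be found with aggregate() and min, and the new customers per year counted with table(). The sketch below assumes the data has the CustID and OrderYear columns from the question and, to stay reproducible, uses only the sample rows listed there, so its numbers are illustrative only:

```r
# Sample rows from the question (only part of the real CustOrder data)
CustOrder <- data.frame(
  CustID    = c(101102, 101102, 101435, 101435, 107236, 101416, 101416,
                101416, 101416, 101467, 101604, 101664, 101664),
  OrderYear = c(2008, 2009, 2010, 2011, 2012, 2010, 2011,
                2012, 2013, 2012, 2010, 2009, 2011),
  Amount    = c(22429.00, 11045.00, 10740.77, 73669.50, 162123.50, 8102.00,
                360.00, 36576.00, 1960.00, 997.00, 2971.53, 91.94, 130.93)
)

# Year of each customer's first purchase: a customer counts as new only once
firstYear <- aggregate(OrderYear ~ CustID, data = CustOrder, FUN = min)

# Count first purchases per year; explicit factor levels keep empty years as 0
years <- 2008:2013
newCustomers <- as.vector(table(factor(firstYear$OrderYear, levels = years)))

# Running total of acquired customers and the share acquired each year
totalCustomers <- cumsum(newCustomers)
newCustomerRate <- round(100 * newCustomers / totalCustomers, 1)
newCustomerRate[1] <- 0  # first year reported as 0%, as in the question

result <- data.frame(Year = years, NewCustomers = newCustomers,
                     RunningNewCustomerTotal = totalCustomers,
                     NewCustomerRate = paste0(newCustomerRate, "%"))
print(result)
```

Wrapping the years in factor() with explicit levels keeps years with no new customers (2011 and 2013 in this sample) as zero counts instead of dropping them from the table.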

Hope this helps!

How to calculate the customer acquisition rate by finding the overlap with previous years?
