Calculating response rate in R

Time: 2015-01-12 01:15:02

Tags: r

I have a data table in R, for example:

[image: sample data table]

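The id column is a unique customer id. What I want to do is calculate a response rate column by the segment factor.

How do I perform something like count(unique paymentid)/count(unique id), while accounting for the NAs in paymentid?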

I would like my result table to look like:

[image: desired result table]

Thanks a lot in advance!

3 Answers:

Answer 0: (score: 4)

There may be a more elegant way to do this, but here is one option (using a slightly outdated version of data.table):

library(data.table)
library(scales)
##
setDT(Df)
##
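# Per segment: count rows sent and distinct non-NA payment ids,
# then add the response rate as a formatted percentage
Df2 <- Df[
  , list(
    NumberSent = .N,
    NumberResponded = length(unique(na.omit(paymentid)))),
  by = segment][, ResponseRate := percent(NumberResponded / NumberSent)]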
##
R> Df2
   segment NumberSent NumberResponded ResponseRate
1:       Y          2               1        50.0%
2:       R          2               2       100.0%
3:       B          3               2        66.7%

scales is only needed for the percent function.


Data:

Df <- data.frame(
  id=1:7,
  segment=rep(c("Y","R","B"),c(2,2,3)),
  paymentamount=c(10,NA,20,15,12,13,NA),
  paymentid=c(11,NA,12,13,14,15,NA))
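
For comparison, the same counts can be reproduced without any packages. This is only a minimal base R sketch, assuming the Df sample data from this answer; the names sent, responded and res, and the sprintf formatting, are illustrative choices rather than part of the original answer.

```r
# Sample data copied from the answer above
Df <- data.frame(
  id = 1:7,
  segment = rep(c("Y", "R", "B"), c(2, 2, 3)),
  paymentamount = c(10, NA, 20, 15, 12, 13, NA),
  paymentid = c(11, NA, 12, 13, 14, 15, NA))

# Denominator: distinct customer ids per segment
sent <- tapply(Df$id, Df$segment, function(x) length(unique(x)))

# Numerator: distinct payment ids per segment, ignoring NA
responded <- tapply(Df$paymentid, Df$segment,
                    function(x) length(unique(x[!is.na(x)])))

# Assemble the summary table (segments come out in alphabetical order)
res <- data.frame(
  segment = names(sent),
  NumberSent = as.vector(sent),
  NumberResponded = as.vector(responded[names(sent)]),
  row.names = NULL)

# Format the rate as a percentage string with one decimal place
res$ResponseRate <- sprintf("%.1f%%", 100 * res$NumberResponded / res$NumberSent)
res
```

Unlike the data.table version, the rows here are sorted by segment name (B, R, Y) because tapply converts the grouping column to a factor.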

Answer 1: (score: 2)

There are a few ways to do this. Here is an approach using by, followed by one using dplyr:

d <- data.frame(segment=c('Y', 'Y', 'R', 'R', 'B', 'B', 'B'),
                paymentamount=c(10, NA, 20, 15, 12, 13, NA))

do.call(rbind, 
        by(d$paymentamount, d$segment, function(x) {
          sent <- length(x)
          responded <- sum(!is.na(x))
          cbind.data.frame(sent, responded, rate=round(responded/sent*100))
        }))

#   sent responded rate
# B    3         2   67
# R    2         2  100
# Y    2         1   50

dplyr:

library(dplyr)
d %>% group_by(segment) %>%
  summarise(sent=length(paymentamount), 
            responded=sum(!is.na(paymentamount)),
            rate=round(responded/sent*100, 2))

# Source: local data frame [3 x 4]
# 
#   segment sent responded   rate
# 1       B    3         2  66.67
# 2       R    2         2 100.00
# 3       Y    2         1  50.00
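
Both snippets above count non-missing paymentamount values, whereas the question asks for count(unique paymentid)/count(unique id). If payment ids could repeat within a segment, a variant along the following lines might be closer to that. It is a sketch only, assuming the Df sample data from Answer 0 (since d has no id or paymentid column) and a dplyr version whose n_distinct() accepts na.rm.

```r
library(dplyr)

# Sample data borrowed from Answer 0 (assumption: same columns as the question)
Df <- data.frame(
  id = 1:7,
  segment = rep(c("Y", "R", "B"), c(2, 2, 3)),
  paymentamount = c(10, NA, 20, 15, 12, 13, NA),
  paymentid = c(11, NA, 12, 13, 14, 15, NA))

res <- Df %>%
  group_by(segment) %>%
  summarise(sent = n_distinct(id),                           # distinct customers
            responded = n_distinct(paymentid, na.rm = TRUE), # distinct non-NA payment ids
            rate = round(responded / sent * 100, 2))
res
```

On this sample data the counts match the answers above, because every id and every non-NA paymentid is already unique.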

Answer 2: (score: 0)

Here I used dplyr:

d <- data.frame(segment=c('Y', 'Y', 'R', 'R', 'B', 'B', 'B'), paymentamount=c(10, NA, 20, 15, 12, 13, NA))

library(dplyr)

x <- d %>%
  group_by(segment) %>%
  summarize(NumberSent = n(), NumberResponded = sum(!is.na(paymentamount)), 
    ResponseRate = paste(round(100*(NumberResponded/NumberSent),0),"%", sep="") ) %>%
  arrange(desc(segment))