Sample data:
id <- c(1,1,1,2,2,3,4,4,5,5,6,6)
comm <- c("Called","Called","Emailed","Called", "Called","Emailed","Called", "Emailed","Called", "Emailed", "Other", "Other")
called <- c("Called", "Called", "Not Called", "Called", "Called", "Not Called", "Called","Not Called", "Called", "Not Called", "Not Called", "Not Called")
emailed <- c("Not Emailed","Not Emailed","Emailed", "Not Emailed", "Not Emailed", "Emailed","Not Emailed", "Emailed", "Not Emailed","Emailed", "Not Emailed", "Not Emailed")
returned <- c("Returned", "Returned", "Returned", "Not Returned", "Not Returned","Returned", "Not Returned", "Not Returned", "Not Returned", "Not Returned", "Not Returned", "Not Returned")
library(data.table)
data <- data.table(id, comm, called, emailed, returned)
I have tried the following (among other approaches I have since discarded):
table(data$called, data$emailed)
             Emailed Not Emailed
  Called           0           6
  Not Called       4           2
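The problem is that this counts the number of calls and emails, one per row. What I want instead is the number of unique accounts (id) that were called, emailed, or both, not the number of calls or emails made.
Answer 0 (score: 2)
For the first table, you can do the following: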
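# Collapse to one row per id: was this id ever called / ever emailed?
pvt <- data[, .(
    called = if (any(comm == "Called")) "Called" else "Not Called",
    emailed = if (any(comm == "Emailed")) "Emailed" else "Not Emailed"),
  by = .(id)]
# Count distinct ids in each called x emailed cell
dcast(pvt, called ~ emailed, uniqueN, value.var = "id")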
Output:
       called Emailed Not Emailed
1:     Called       3           1
2: Not Called       1           1
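As a cross-check, here is a minimal base R sketch of the same idea (reduce to one flag per id, then cross-tabulate). It is not the code from the answer above; the names `was_called`, `was_emailed`, and `counts` are made up for illustration, and it assumes the sample data from the question.

```r
# Same sample data as in the question (only the columns needed here).
id   <- c(1,1,1,2,2,3,4,4,5,5,6,6)
comm <- c("Called","Called","Emailed","Called","Called","Emailed",
          "Called","Emailed","Called","Emailed","Other","Other")

# One logical flag per id: TRUE if any row for that id has the given value.
# tapply() returns the ids in sorted order, so both vectors line up.
was_called  <- as.vector(tapply(comm == "Called",  id, any))
was_emailed <- as.vector(tapply(comm == "Emailed", id, any))

# Cross-tabulate the per-id flags; each id is counted exactly once.
counts <- table(
  called  = ifelse(was_called,  "Called",  "Not Called"),
  emailed = ifelse(was_emailed, "Emailed", "Not Emailed")
)
print(counts)
```

Because the flags are computed per id before tabulating, the four cells add up to the number of unique ids (6 here) rather than the number of rows (12).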
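For the second table, it is not clear how you want to handle the case where someone was both called and emailed but returned only the call. If you are fine with counting a return from either a call or an email, then something like this should work for the second table: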
# One row per id; an id counts as "Returned" if any of its rows was returned
pvt <- data[, .(
    called = if (any(comm == "Called")) "Called" else "Not Called",
    emailed = if (any(comm == "Emailed")) "Emailed" else "Not Emailed",
    returned = if (any(returned == "Returned")) "Returned" else "Not Returned"),
  by = .(id)]
# Share of ids in each cell that returned
dcast(pvt, called ~ emailed, function(x) sum(x == "Returned") / length(x),
  value.var = "returned")
Output:
       called   Emailed Not Emailed
1:     Called 0.3333333           0
2: Not Called 1.0000000           0
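The aggregation `sum(x == "Returned") / length(x)` is just the mean of a logical vector. A minimal base R sketch of the same return-rate table, assuming the question's sample data and the same "a return on any row counts" rule (the names `did_return` and `rate` are only illustrative):

```r
# Same sample data as in the question (only the columns needed here).
id   <- c(1,1,1,2,2,3,4,4,5,5,6,6)
comm <- c("Called","Called","Emailed","Called","Called","Emailed",
          "Called","Emailed","Called","Emailed","Other","Other")
returned <- c("Returned","Returned","Returned","Not Returned","Not Returned",
              "Returned","Not Returned","Not Returned","Not Returned",
              "Not Returned","Not Returned","Not Returned")

# Per-id flags (ids come back in sorted order from tapply()).
was_called  <- as.vector(tapply(comm == "Called",      id, any))
was_emailed <- as.vector(tapply(comm == "Emailed",     id, any))
did_return  <- as.vector(tapply(returned == "Returned", id, any))

# mean() of a logical vector is the proportion of TRUE values,
# so this gives the share of ids in each cell that returned.
rate <- tapply(
  did_return,
  list(called  = ifelse(was_called,  "Called",  "Not Called"),
       emailed = ifelse(was_emailed, "Emailed", "Not Emailed")),
  mean
)
print(rate)
```

If a return should only count when it matches the contact type, the `did_return` flag would need a different definition; that rule is not specified in the question.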
Answer 1 (score: 0)
You can count the number of unique IDs for each combination of called and emailed. I think this is what you want:
library(tidyr)
library(dplyr)
data %>%
  group_by(called, emailed) %>%
  summarise(n_id = n_distinct(id)) %>%
  spread(key = emailed, value = n_id)
# A tibble: 2 x 3
# Groups:   called [2]
  called     Emailed `Not Emailed`
  <chr>        <int>         <int>
1 Called          NA             4
2 Not Called       4             1
You can also use data.table:
data[, .(n_id = uniqueN(id)), by = .(called, emailed)] %>%
  spread(key = emailed, value = n_id)
       called Emailed Not Emailed
1:     Called      NA           4
2: Not Called       4           1
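One thing worth checking with this approach: called and emailed here are row-level columns, so an id that has both a call row and an email row is counted in two different cells. A small base R sketch of that check, assuming the question's sample data (the name `n_id` mirrors the column above; the rest is illustrative):

```r
# Same sample data as in the question (only the columns needed here).
id <- c(1,1,1,2,2,3,4,4,5,5,6,6)
called <- c("Called","Called","Not Called","Called","Called","Not Called",
            "Called","Not Called","Called","Not Called","Not Called","Not Called")
emailed <- c("Not Emailed","Not Emailed","Emailed","Not Emailed","Not Emailed",
             "Emailed","Not Emailed","Emailed","Not Emailed","Emailed",
             "Not Emailed","Not Emailed")

# Distinct ids per row-level (called, emailed) combination;
# combinations that never occur come back as NA.
n_id <- tapply(id, list(called = called, emailed = emailed),
               function(x) length(unique(x)))
print(n_id)

sum(n_id, na.rm = TRUE)  # 9: ids 1, 4 and 5 appear in two cells
length(unique(id))       # 6 unique ids overall
```

So the cells do not partition the accounts, and the Called/Emailed cell stays NA because no single row is both a call and an email. If each account should land in exactly one cell, the flags need to be collapsed per id first.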