Contingency table of unique IDs

Time: 2019-02-07 01:12:56

Tags: r data.table

Sample data:

library(data.table)
id <- c(1,1,1,2,2,3,4,4,5,5,6,6)
comm <- c("Called","Called","Emailed","Called", "Called","Emailed","Called", "Emailed","Called", "Emailed", "Other", "Other")
called <- c("Called", "Called", "Not Called", "Called", "Called", "Not Called", "Called","Not Called", "Called", "Not Called", "Not Called", "Not Called")
emailed <- c("Not Emailed","Not Emailed","Emailed", "Not Emailed", "Not Emailed", "Emailed","Not Emailed", "Emailed", "Not Emailed","Emailed", "Not Emailed", "Not Emailed")
returned <- c("Returned", "Returned", "Returned", "Not Returned", "Not Returned","Returned", "Not Returned", "Not Returned", "Not Returned", "Not Returned", "Not Returned", "Not Returned")
data <- data.table(id, comm, called, emailed, returned)

I want to create the following two tables: [image not included]

So for the sample data, the tables should look like this: [image not included]

I have tried the following (among other variations):

table(data$called, data$emailed)
             Emailed Not Emailed
  Called           0           6
  Not Called       4           2

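The problem is that this counts the number of calls/emails. I want it to count the number of unique accounts that were called, emailed, or both, rather than the number of calls/emails made.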
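To make the difference between counting rows and counting accounts concrete, here is a minimal base-R sketch (no data.table needed). It assumes the comm column is the source of truth for whether an account was called or emailed, and the names per_row, ever_called, ever_emailed and per_account are illustrative only. It first collapses the rows to one flag per id and only then tabulates, so each account is counted once:

```r
# Sample data from the question (only the columns this sketch needs)
id   <- c(1, 1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 6)
comm <- c("Called", "Called", "Emailed", "Called", "Called", "Emailed",
          "Called", "Emailed", "Called", "Emailed", "Other", "Other")

# Row-level count: every row (call, email or other contact) contributes one
per_row <- table(comm)

# Account-level flags: TRUE if any row for that id matches
ever_called  <- as.vector(tapply(comm == "Called",  id, any))
ever_emailed <- as.vector(tapply(comm == "Emailed", id, any))

# Cross-tabulate one entry per account
per_account <- table(
  called  = factor(ifelse(ever_called,  "Called",  "Not Called"),
                   levels = c("Called", "Not Called")),
  emailed = factor(ifelse(ever_emailed, "Emailed", "Not Emailed"),
                   levels = c("Emailed", "Not Emailed"))
)
print(per_row)
print(per_account)
```

The four cells of per_account add up to 6, one per account, whereas the row-level counts add up to 12. This is the same collapse-then-count idea that Answer 0 implements with data.table.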

Edit to add: I realized I made a mistake in what the second table should be. This is what it should look like: [image not included]

2 answers:

Answer 0 (score: 2)

For the first table, you can do the following:

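# One row per id: was the account ever called / ever emailed?
pvt <- data[, .(
        called=if(any(comm=="Called")) "Called" else "Not Called", 
        emailed=if(any(comm=="Emailed")) "Emailed" else "Not Emailed"), 
    by=.(id)]
# Count unique ids in each called x emailed cell
dcast(pvt, called ~ emailed, uniqueN, value.var="id")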

Output:

       called Emailed Not Emailed
1:     Called       3           1
2: Not Called       1           1

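For the second table, it is not clear how you want to handle accounts that were both called and emailed but returned only the call. If a return via either a call or an email counts, something like this should work for the second table: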

# Same per-id collapse, plus whether the account returned via either channel
pvt <- data[, .(
        called=if(any(comm=="Called")) "Called" else "Not Called", 
        emailed=if(any(comm=="Emailed")) "Emailed" else "Not Emailed", 
        returned=if(any(returned=="Returned")) "Returned" else "Not Returned"),
    by=.(id)]
# Share of accounts in each cell that returned
dcast(pvt, called ~ emailed, function(x) sum(x=="Returned") / length(x), 
    value.var="returned")

Output:

       called   Emailed Not Emailed
1:     Called 0.3333333           0
2: Not Called 1.0000000           0
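The aggregation function passed to dcast above is simply the share of accounts in each cell marked "Returned". A minimal base-R sketch of the same per-account logic, assuming (as the answer does) that a return via either channel counts, might look like this; the helper ever() and the names acct and rate are illustrative, not part of either package:

```r
# Sample data from the question (columns needed for the return rate)
id <- c(1, 1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 6)
comm <- c("Called", "Called", "Emailed", "Called", "Called", "Emailed",
          "Called", "Emailed", "Called", "Emailed", "Other", "Other")
returned <- c("Returned", "Returned", "Returned", "Not Returned", "Not Returned",
              "Returned", "Not Returned", "Not Returned", "Not Returned",
              "Not Returned", "Not Returned", "Not Returned")

# Hypothetical helper: TRUE for an id if any of its rows satisfies cond
ever <- function(cond, id) as.vector(tapply(cond, id, any))

# One row per account; a return via either channel counts
acct <- data.frame(
  called   = ifelse(ever(comm == "Called", id),  "Called",  "Not Called"),
  emailed  = ifelse(ever(comm == "Emailed", id), "Emailed", "Not Emailed"),
  returned = ever(returned == "Returned", id),
  stringsAsFactors = FALSE
)

# mean() of a logical vector is the share of TRUE values,
# i.e. the same quantity as sum(x == "Returned") / length(x)
rate <- tapply(acct$returned, acct[c("called", "emailed")], mean)
print(rate)
```

The four cells (1/3, 0, 1 and 0) match the dcast output above. Writing the aggregator as mean(x == "Returned") is an equivalent, slightly shorter form.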

Answer 1 (score: 0)

You can count the number of unique IDs for each combination of called and emailed. I think this is what you want:

library(tidyr)
library(dplyr)

data %>%
  group_by(called, emailed) %>%
  summarise(n_id = n_distinct(id)) %>%
  spread(key = emailed, value = n_id)

# A tibble: 2 x 3
# Groups:   called [2]
  called     Emailed `Not Emailed`
  <chr>        <int>         <int>
1 Called          NA             4
2 Not Called       4             1

Edit:

You can also use data.table:

data[, .(n_id = uniqueN(id)), by = .(called, emailed)] %>%
  spread(key = emailed, value = n_id)

       called Emailed Not Emailed
1:     Called      NA           4
2: Not Called       4           1
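Both versions above group individual rows by their called and emailed values, so an account whose rows fall into different combinations is counted once in each of them. A minimal base-R sketch of the same row-level count (the names n_id and combos_per_id are illustrative) makes this visible:

```r
# Sample data from the question (columns used by this answer)
id <- c(1, 1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 6)
called <- c("Called", "Called", "Not Called", "Called", "Called", "Not Called",
            "Called", "Not Called", "Called", "Not Called", "Not Called", "Not Called")
emailed <- c("Not Emailed", "Not Emailed", "Emailed", "Not Emailed", "Not Emailed",
             "Emailed", "Not Emailed", "Emailed", "Not Emailed", "Emailed",
             "Not Emailed", "Not Emailed")

# Distinct ids per (called, emailed) combination of rows; empty cells are NA
n_id <- tapply(id, list(called = called, emailed = emailed),
               function(x) length(unique(x)))
print(n_id)

# Number of different row-level combinations each account appears in
combos_per_id <- tapply(paste(called, emailed), id, function(x) length(unique(x)))
print(combos_per_id)
```

The non-empty cells add up to 9 even though there are only 6 accounts, because ids 1, 4 and 5 each have rows under both Called/Not Emailed and Not Called/Emailed. The per-account table in Answer 0 (3, 1, 1, 1) instead counts each account exactly once.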