Question

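I have a claims file with 2 columns: "Customer ID" and "Declaration date".

I would like to see (and count) whether a customer was involved in multiple claims within a time window X (say, one year).

My data looks like this: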

Customer_Id     Declaration_date   
001             12/10/2017
001             12/10/2017
002             24/10/2017
003             25/10/2017
004             25/10/2017
001             05/12/2017
006             07/12/2017

As R code, that is:

D <- data.frame(Customer_Id = c(001, 001, 002, 003, 004, 001, 006),
            Declaration_date = as.Date(c("12/10/2017", "12/10/2017", "24/10/2017", "25/10/2017", "25/10/2017", "05/12/2017", "07/12/2017"), format = "%d/%m/%Y"))

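Here we can see that customer "001" has two claims on 12/10, but also one claim on 05/12. So what I want is a third column that counts, per customer, the number of distinct claims by date since 1 January 2016. The output should look like this: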

Customer_Id     Declaration_date     Number of claims 
001             12/10/2017           2
001             12/10/2017           2
002             24/10/2017           1
003             25/10/2017           1
004             25/10/2017           1
001             05/12/2017           2
006             07/12/2017           1

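Note that a customer ID appearing several times on the same date should not add to the "Number of claims". In my example, customer 001 has "2" claims because he has one (or more) claims on 12/10, but also one on 05/12.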

Any help would be much appreciated.

Many thanks,

Answer 1

We can use ave from base R to create the column by taking the length of the unique elements of 'Declaration_date' within each 'Customer_Id':

with(D, ave(as.numeric(Declaration_date), Customer_Id, FUN = function(x) length(unique(x))))

Or with dplyr:

library(dplyr)
D %>%
  group_by(Customer_Id) %>%
  mutate(Number_of_claims = n_distinct(Declaration_date))

Or using data.table:

library(data.table)
setDT(D)[,  Number_of_claims := uniqueN(Declaration_date), Customer_Id]
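
The three variants above count distinct dates over all rows. The question also mentions counting only claims since 1 January 2016, so here is a minimal base R sketch of one way to add that restriction. The `cutoff` variable and the `in_window` helper are illustrative names, not part of the original answer, and a fixed cutoff date is assumed rather than a rolling one-year window.

```r
# Sample data from the question
D <- data.frame(
  Customer_Id = c(001, 001, 002, 003, 004, 001, 006),
  Declaration_date = as.Date(
    c("12/10/2017", "12/10/2017", "24/10/2017", "25/10/2017",
      "25/10/2017", "05/12/2017", "07/12/2017"),
    format = "%d/%m/%Y"
  )
)

# Assumed cutoff: only claims declared on or after this date are counted
cutoff <- as.Date("2016-01-01")

# Flag rows inside the window; dates outside it are replaced by NA
in_window <- D$Declaration_date >= cutoff
dates_in_window <- ifelse(in_window, as.numeric(D$Declaration_date), NA_real_)

# Per customer, count the distinct non-missing dates
D$Number_of_claims <- ave(
  dates_in_window,
  D$Customer_Id,
  FUN = function(x) length(unique(x[!is.na(x)]))
)

print(D)
```

With the sample data every claim falls after the cutoff, so the counts match the output requested in the question. A customer whose claims all fall before the cutoff would get 0.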
