I have the data frame shown below:
Date ID
2018-04-01 K-1
2018-04-01 K-1
2018-04-02 K-2
2018-04-02 K-2
2018-04-03 K-2
2018-04-04 K-3
2018-05-01 K-5
2018-05-01 K-5
2018-05-02 K-6
2018-05-02 K-7
From the data frame above, I want to build the two summary tables shown below, grouped by date:
New_DF1
Date        Unique_Count  Duplicate_Count
2018-04-01  1             1
2018-04-02  1             1
2018-04-03  1             0
2018-04-04  1             0
2018-05-01  1             0
2018-05-02  2             0
New_DF2
Month   Unique_Count  Duplicate_Count
May-18  4             2
Apr-18  3             0
I tried:
DF %>%
  group_by(Date) %>%
  summarise(count = n_distinct(ID))
But it doesn't work.
Answer 0 (score: 0)
How about:
DF %>%
  group_by(Date, ID) %>%
  # n() is the number of rows for each Date/ID pair
  summarise(Unique_Count = n_distinct(ID),
            Duplicate_Count = n())
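Note that grouping by both Date and ID gives one row per Date/ID pair, so Unique_Count is always 1 in that result. Below is a minimal sketch of one way to roll those per-pair counts up to one row per date. The names per_pair, per_date and n_rows are made up for illustration, and DF is rebuilt from the question's data with character columns, which is an assumption since the question doesn't show the column types.

```r
library(dplyr)

# The question's data, rebuilt by hand (column types are an assumption)
DF <- data.frame(
  Date = c("2018-04-01", "2018-04-01", "2018-04-02", "2018-04-02",
           "2018-04-03", "2018-04-04", "2018-05-01", "2018-05-01",
           "2018-05-02", "2018-05-02"),
  ID = c("K-1", "K-1", "K-2", "K-2", "K-2", "K-3",
         "K-5", "K-5", "K-6", "K-7"),
  stringsAsFactors = FALSE
)

# Step 1: one row per Date/ID pair, with the number of rows in each pair
per_pair <- DF %>%
  group_by(Date, ID) %>%
  summarise(n_rows = n()) %>%
  ungroup()

# Step 2: one row per date; a pair seen more than once counts as a duplicate
per_date <- per_pair %>%
  group_by(Date) %>%
  summarise(Unique_Count = n(),
            Duplicate_Count = sum(n_rows > 1))

per_date
```

With this data, per_date should have six rows, one per date, with 2018-05-01 showing one duplicated ID.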
Answer 1 (score: 0)
Using dplyr:
library(dplyr)
New_DF1 <- DF %>%
  group_by(Date) %>%
  summarise(Unique_Count = n_distinct(ID),
            Duplicate_Count = sum(table(ID) > 1))
New_DF1
# # A tibble: 6 x 3
#   Date       Unique_Count Duplicate_Count
#   <fctr>            <int>           <int>
# 1 2018-04-01            1               1
# 2 2018-04-02            1               1
# 3 2018-04-03            1               0
# 4 2018-04-04            1               0
# 5 2018-05-01            1               1
# 6 2018-05-02            2               0
New_DF2 <- New_DF1 %>%
  group_by(month = format.Date(Date, "%b-%y")) %>%
  summarize_at(2:3, sum)
New_DF2
# # A tibble: 2 x 3
#   month  Unique_Count Duplicate_Count
#   <chr>         <int>           <int>
# 1 Apr-18            4               2
# 2 May-18            3               1
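To see what `sum(table(ID) > 1)` is counting, here is a small base R sketch on a made-up vector (ids is hypothetical and not taken from the question's data). `table()` tallies how often each value occurs, and summing the logical `> 1` counts the values that occur more than once.

```r
# Hypothetical ID vector, for illustration only
ids <- c("K-2", "K-2", "K-6", "K-7", "K-7", "K-7")

counts <- table(ids)               # occurrences per ID: K-2 = 2, K-6 = 1, K-7 = 3
n_unique <- length(unique(ids))    # number of distinct IDs
n_dup_ids <- sum(counts > 1)       # number of IDs that occur more than once
n_extra_rows <- sum(duplicated(ids))  # number of repeated rows beyond the first

n_unique      # 3
n_dup_ids     # 2
n_extra_rows  # 3
```

Note that `sum(duplicated(ids))` answers a different question (how many extra rows there are), so which one you want depends on how "Duplicate_Count" is defined.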
Using base R:
New_DF1 <- aggregate(ID ~ Date, DF,
                     function(x) c(Unique_Count = length(unique(x)),
                                   Duplicate_Count = sum(table(x) > 1)))
# aggregate() returns the two counts as a matrix column; spread it into columns
New_DF1 <- cbind(New_DF1[1], New_DF1[[2]])
New_DF1
#         Date Unique_Count Duplicate_Count
# 1 2018-04-01            1               1
# 2 2018-04-02            1               1
# 3 2018-04-03            1               0
# 4 2018-04-04            1               0
# 5 2018-05-01            1               1
# 6 2018-05-02            2               0
New_DF2 <- New_DF1
New_DF2$month <- format.Date(New_DF2$Date, "%b-%y")
New_DF2 <- aggregate(cbind(Unique_Count, Duplicate_Count) ~ month, New_DF2, sum)
New_DF2
#    month Unique_Count Duplicate_Count
# 1 Apr-18            4               2
# 2 May-18            3               1
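One thing to be aware of: the monthly table above sums the daily counts, so an ID that shows up on two different days in the same month (K-2 in April) is counted once per day. If you would rather count distinct IDs per month directly from the raw rows, a sketch along these lines might do. Here monthly is a made-up name, DF is rebuilt from the question's data with Date as a real Date column (an assumption), and "%Y-%m" is used instead of "%b-%y" only to keep the month labels independent of locale.

```r
# The question's data, rebuilt by hand (Date stored as class Date: an assumption)
DF <- data.frame(
  Date = as.Date(c("2018-04-01", "2018-04-01", "2018-04-02", "2018-04-02",
                   "2018-04-03", "2018-04-04", "2018-05-01", "2018-05-01",
                   "2018-05-02", "2018-05-02")),
  ID = c("K-1", "K-1", "K-2", "K-2", "K-2", "K-3",
         "K-5", "K-5", "K-6", "K-7"),
  stringsAsFactors = FALSE
)

# Month label taken straight from each row's date
DF$month <- format(DF$Date, "%Y-%m")

# Count distinct and repeated IDs per month from the raw rows
monthly <- aggregate(ID ~ month, DF,
                     function(x) c(Unique_Count = length(unique(x)),
                                   Duplicate_Count = sum(table(x) > 1)))
monthly <- cbind(monthly[1], monthly[[2]])
monthly
#     month Unique_Count Duplicate_Count
# 1 2018-04            3               2
# 2 2018-05            3               1
```

April comes out as 3 unique IDs here rather than 4, because K-2 is only counted once for the whole month.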