R:可以添加在dcast中归档的摘要吗? 用户数与按月订购的数量。并在每年的摘要之间添加。
检查所附的屏幕截图(excel的)以获取预期结果。
User Order_date
a 02-01-2017
b 02-02-2017
a 02-08-2017
c 02-05-2017
a 02-08-2017
s 02-06-2017
c 02-03-2017
s 02-04-2017
b 02-06-2017
c 02-11-2017
a 02-11-2017
s 02-11-2017
c 02-01-2018
s 02-01-2018
b 02-02-2018
b 02-10-2018
尝试生成报告,但不显示在excel中显示的摘要值, 请检查代码和所附的屏幕截图。
library(data.table)
library(lubridate)
df$start_year_month <- format(df$Month_Due, "%Y-%m")
#dcast(setDT(df), user ~ factor(start_year_month, levels = 1:12), sum, drop = FALSE)
datatable(dcast(df, user ~ start_year_month), filter = 'top',fun.aggregate = ???? )
答案 0 :(得分:0)
您可以首先通过执行以下操作来每月计算不同用户的数量:
library(tidyr)
library(dplyr)
library(lubridate)
df <- df %>% mutate(Order_date = dmy(Order_date)) # Format the date from the reproducible example
DF <- df %>% arrange(Order_date) %>% mutate(months = format(Order_date, "%b_%Y")) %>%
mutate(months = factor(months, unique(months))) %>%
group_by(months, User) %>% count()
# A tibble: 15 x 3
# Groups: months, User [15]
months User n
<fct> <chr> <int>
1 Jan_2017 a 1
2 Feb_2017 b 1
3 Mar_2017 c 1
4 Apr_2017 s 1
5 May_2017 c 1
6 Jun_2017 b 1
7 Jun_2017 s 1
8 Aug_2017 a 2
9 Nov_2017 a 1
10 Nov_2017 c 1
11 Nov_2017 s 1
12 Jan_2018 c 1
13 Jan_2018 s 1
14 Feb_2018 b 1
15 Oct_2018 b 1
然后,您可以使用每年的计数创建第二个数据框:
DF_Year <- df %>% arrange(Order_date) %>% mutate(months = paste(format(Order_date, "%Y"),"_Total",sep = "")) %>%
mutate(months = factor(months, unique(months))) %>%
group_by(months, User) %>% count()
# A tibble: 7 x 3
# Groups: months, User [7]
months User n
<fct> <chr> <int>
1 2017 a 4
2 2017 b 2
3 2017 c 3
4 2017 s 3
5 2018 b 2
6 2018 c 1
7 2018 s 1
您可以绑定两个数据框:
DF_ALL <- bind_rows(DF, DF_Year)
最后,您可以将数据框转换为更大的格式,并根据年份对列进行排序:
DF_Final <- DF_ALL %>% pivot_wider(names_from = months, values_from = n) %>%
select(contains("2017"),contains("2018"))
# A tibble: 4 x 14
# Groups: User [4]
User Jan_2017 Feb_2017 Mar_2017 Apr_2017 May_2017 Jun_2017 Aug_2017 Nov_2017 `2017_Total` Jan_2018 Feb_2018 Oct_2018
<chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 a 1 NA NA NA NA NA 2 1 4 NA NA NA
2 b NA 1 NA NA NA 1 NA NA 2 NA 1 1
3 c NA NA 1 NA 1 NA NA 1 3 1 NA NA
4 s NA NA NA 1 NA 1 NA 1 3 1 NA NA
# … with 1 more variable: `2018_Total` <int>
它回答了您的问题吗?
可复制的示例
structure(list(User = c("a", "b", "a", "c", "a", "s", "c", "s",
"b", "c", "a", "s", "c", "s", "b", "b"), Order_date = structure(c(17168,
17199, 17380, 17288, 17380, 17319, 17227, 17258, 17319, 17472,
17472, 17472, 17533, 17533, 17564, 17806), class = "Date")), class = "data.frame", row.names = c(NA,
-16L))