R: Is it possible to add summary columns in dcast?

Time: 2020-02-28 10:33:50

Tags: r dplyr tidyr reshape2

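R: Is it possible to add a summary field in dcast? I want the number of orders per user by month, with a summary column added after each year.

Check the attached screenshot (from Excel) for the expected result.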

User            Order_date          

a               02-01-2017          
b               02-02-2017          
a               02-08-2017          
c               02-05-2017          
a               02-08-2017          
s               02-06-2017          
c               02-03-2017          
s               02-04-2017          
b               02-06-2017          
c               02-11-2017          
a               02-11-2017          
s               02-11-2017          
c               02-01-2018          
s               02-01-2018          
b               02-02-2018          
b               02-10-2018          

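I tried to generate the report, but it does not show the summary values that appear in Excel. Please check the code and the attached screenshot.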

library(data.table)
library(lubridate)

df$start_year_month <- format(df$Month_Due, "%Y-%m")
#dcast(setDT(df), user ~ factor(start_year_month, levels = 1:12), sum, drop = FALSE)

datatable(dcast(df, user ~ start_year_month), filter = 'top',fun.aggregate =  ???? )

Click here for Screenshot

1 Answer:

Answer 0: (score: 0)

You can first count the orders per user for each month:

library(tidyr)
library(dplyr)
library(lubridate)
# Parse the day-month-year strings from the question into Date.
# Skip this line if Order_date is already a Date, as in the dput at the end.
df <- df %>% mutate(Order_date = dmy(Order_date))

DF <- df %>% arrange(Order_date) %>% mutate(months = format(Order_date, "%b_%Y")) %>%
  mutate(months = factor(months, unique(months))) %>%
  group_by(months, User) %>% count() 

# A tibble: 15 x 3
# Groups:   months, User [15]
   months   User      n
   <fct>    <chr> <int>
 1 Jan_2017 a         1
 2 Feb_2017 b         1
 3 Mar_2017 c         1
 4 Apr_2017 s         1
 5 May_2017 c         1
 6 Jun_2017 b         1
 7 Jun_2017 s         1
 8 Aug_2017 a         2
 9 Nov_2017 a         1
10 Nov_2017 c         1
11 Nov_2017 s         1
12 Jan_2018 c         1
13 Jan_2018 s         1
14 Feb_2018 b         1
15 Oct_2018 b         1
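
The `factor(months, unique(months))` step is what keeps the month labels in chronological order instead of alphabetical order. A minimal sketch of that effect, assuming a small made-up `orders` data frame (hypothetical, not the asker's data) and using base R's `month.abb` so the labels do not depend on the session locale the way `format(x, "%b")` does:

```r
library(dplyr)

# Hypothetical toy orders, deliberately not in date order
orders <- data.frame(
  User = c("a", "b", "a", "c"),
  Order_date = as.Date(c("2017-08-02", "2017-01-02", "2018-02-02", "2017-04-02"))
)

labelled <- orders %>%
  arrange(Order_date) %>%
  mutate(
    # Build labels such as "Jan_2017" from English month abbreviations
    months = paste0(month.abb[as.integer(format(Order_date, "%m"))],
                    "_", format(Order_date, "%Y")),
    # Levels follow first appearance, which is chronological after arrange()
    months = factor(months, levels = unique(months))
  )

levels(labelled$months)              # order used later for the wide columns
sort(as.character(labelled$months))  # plain alphabetical order, for comparison
```

Because the rows are sorted by date before the factor is built, the level order matches the calendar, and that is the order the wide columns follow later on.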

Then, you can create a second data frame with the counts per year:

DF_Year <- df %>% arrange(Order_date) %>% mutate(months = paste(format(Order_date, "%Y"),"_Total",sep = "")) %>%
  mutate(months = factor(months, unique(months))) %>%
  group_by(months, User) %>% count() 

# A tibble: 7 x 3
# Groups:   months, User [7]
  months     User      n
  <fct>      <chr> <int>
1 2017_Total a         4
2 2017_Total b         2
3 2017_Total c         3
4 2017_Total s         3
5 2018_Total b         2
6 2018_Total c         1
7 2018_Total s         1
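
As a possible shortcut, `dplyr::count()` accepts a computed grouping expression, so the yearly totals can be sketched in a single call. This is only an alternative, not the answer's method; the name `DF_Year_alt` is made up for illustration, and `months` ends up as a character column here rather than a factor:

```r
library(dplyr)

# Same 16 orders as the reproducible example at the end of the answer
df <- data.frame(
  User = c("a", "b", "a", "c", "a", "s", "c", "s",
           "b", "c", "a", "s", "c", "s", "b", "b"),
  Order_date = as.Date(c(17168, 17199, 17380, 17288, 17380, 17319, 17227, 17258,
                         17319, 17472, 17472, 17472, 17533, 17533, 17564, 17806),
                       origin = "1970-01-01")
)

# count() groups by the computed year label and by User, then tallies the rows
DF_Year_alt <- df %>%
  count(months = paste0(format(Order_date, "%Y"), "_Total"), User)

print(DF_Year_alt)
```

The result is an ungrouped data frame with one row per year and user, holding the same seven counts as the grouped version above.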

You can bind the two data frames together:

DF_ALL <- bind_rows(DF, DF_Year)

Finally, you can reshape the data frame into a wider format and order the columns by year:

# Keep User explicitly, then all 2017 columns followed by all 2018 columns
DF_Final <- DF_ALL %>% pivot_wider(names_from = months, values_from = n) %>%
  select(User, contains("2017"), contains("2018"))

# A tibble: 4 x 14
# Groups:   User [4]
  User  Jan_2017 Feb_2017 Mar_2017 Apr_2017 May_2017 Jun_2017 Aug_2017 Nov_2017 `2017_Total` Jan_2018 Feb_2018 Oct_2018
  <chr>    <int>    <int>    <int>    <int>    <int>    <int>    <int>    <int>        <int>    <int>    <int>    <int>
1 a            1       NA       NA       NA       NA       NA        2        1            4       NA       NA       NA
2 b           NA        1       NA       NA       NA        1       NA       NA            2       NA        1        1
3 c           NA       NA        1       NA        1       NA       NA        1            3        1       NA       NA
4 s           NA       NA       NA        1       NA        1       NA        1            3        1       NA       NA
# … with 1 more variable: `2018_Total` <int>
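
Since the question asks about `dcast` specifically, here is a rough sketch of the same idea with `data.table`, offered as an assumption about what would suit the asker rather than as part of the answer above. It uses `"%Y-%m"` labels (a different naming scheme from the `"%b_%Y"` labels above) so that plain text sorting puts `2017-Total` right after the 2017 months, and it passes `fill = 0L` so empty cells show 0 instead of `NA`. The column name `period` is made up for illustration:

```r
library(data.table)

# Same 16 orders as the reproducible example at the end of the answer
dt <- data.table(
  User = c("a", "b", "a", "c", "a", "s", "c", "s",
           "b", "c", "a", "s", "c", "s", "b", "b"),
  Order_date = as.Date(c(17168, 17199, 17380, 17288, 17380, 17319, 17227, 17258,
                         17319, 17472, 17472, 17472, 17533, 17533, 17564, 17806),
                       origin = "1970-01-01")
)

# Monthly counts per user; "2017-01"-style keys sort chronologically as text
monthly <- dt[, .N, by = .(User, period = format(Order_date, "%Y-%m"))]

# Yearly totals per user; "2017-Total" sorts after "2017-12", before "2018-01"
yearly <- dt[, .N, by = .(User, period = paste0(format(Order_date, "%Y"), "-Total"))]

# Stack both long tables and cast to wide; fill = 0L replaces missing cells
wide <- dcast(rbind(monthly, yearly), User ~ period, value.var = "N", fill = 0L)

print(wide)
```

Because the totals are computed before casting, no `fun.aggregate` is needed: each user and period pair appears only once in the stacked table.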

Does this answer your question?


Reproducible example

df <- structure(list(User = c("a", "b", "a", "c", "a", "s", "c", "s", 
"b", "c", "a", "s", "c", "s", "b", "b"), Order_date = structure(c(17168, 
17199, 17380, 17288, 17380, 17319, 17227, 17258, 17319, 17472, 
17472, 17472, 17533, 17533, 17564, 17806), class = "Date")), class = "data.frame", row.names = c(NA, 
-16L))