How do I create a data frame that aggregates daily ticket numbers into monthly counts?

Time: 2019-06-20 15:30:59

Tags: r

I have a list of ticket numbers, one per date. The date column can be converted to a date, but the ticket number column is text.

Created       Ticket
01-Jan-19   a1
02-Jan-19   a2
03-Jan-19   a3
04-Jan-19   a4
05-Jan-19   a5
06-Jan-19   a6
07-Jan-19   a7
08-Jan-19   a8
09-Jan-19   a9
10-Jan-19   a10
11-Jan-19   a11
12-Jan-19   a12
13-Jan-19   a13
14-Jan-19   a14
15-Jan-19   a15
16-Jan-19   a16
17-Jan-19   a17
18-Jan-19   a18
19-Jan-19   a19
01-Feb-19   a20
02-Feb-19   a21
03-Feb-19   a22
04-Feb-19   a23

I tried using floor_date in R, but because the ticket number column is character, I cannot sum it.

data <- read.csv(file = 'D:\\DS Data\\SampleTickets.csv', stringsAsFactors = FALSE,header = TRUE)

str(data)
library(readr)
library(lubridate)
library(dplyr)

data <- data %>%
  mutate(Created = dmy(Created))

data %>% group_by(month=floor_date(Created, "month")) %>%
  summarize(amount=sum(Ticket))

I would like a data frame as output:

CreatedMonth     CountOfTickets
1/1/2019             18
1/2/2019              4

2 answers:

Answer 0 (score: 1)

You are almost there: just use n() instead of sum(Ticket) to count the rows in each group:

library(dplyr)
library(lubridate)
data %>%
  mutate(Created = dmy(Created)) %>%
  group_by(month = floor_date(Created, "month")) %>%
  summarize(amount = n())
# A tibble: 2 x 2
  month      amount
  <date>      <int>
1 2019-01-01     19
2 2019-02-01      4
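
To see why the original attempt fails, here is a minimal base R sketch. The tickets vector is a hypothetical stand-in for the Ticket column, and length() stands in for what n() reports inside a group; sum() needs numeric input, whereas counting rows does not care about the column type.

```r
# Hypothetical stand-in for the character Ticket column
tickets <- c("a1", "a2", "a3")

# sum() on a character vector raises an error; capture its message instead of stopping
result <- tryCatch(sum(tickets), error = function(e) conditionMessage(e))

# Counting the elements works for any type, which is what n() does per group
n_rows <- length(tickets)

print(result)
print(n_rows)
```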

There is also a shortcut using count():

data %>% 
  count(CreatedMonth = dmy(Created) %>% floor_date("month"))
# A tibble: 2 x 2
  CreatedMonth     n
  <date>       <int>
1 2019-01-01      19
2 2019-02-01       4
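
If the count column should carry the name from the question (CountOfTickets), the name argument of count() can set it. This is only a sketch: it assumes dplyr 0.8.0 or later, it rebuilds the sample data with a hypothetical data.frame() call instead of reading the CSV, and it assumes an English locale so that dmy() recognises "Jan" and "Feb".

```r
library(dplyr)
library(lubridate)

# Hypothetical reconstruction of the sample data: 19 January and 4 February tickets
data <- data.frame(
  Created = c(sprintf("%02d-Jan-19", 1:19), sprintf("%02d-Feb-19", 1:4)),
  Ticket  = paste0("a", 1:23),
  stringsAsFactors = FALSE
)

# Parse the dates, round each down to the first of its month, count rows per month
monthly <- data %>%
  count(CreatedMonth = floor_date(dmy(Created), "month"),
        name = "CountOfTickets")

print(monthly)
```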

For completeness, here is a data.table version as well:

library(lubridate)
library(data.table)
setDT(data)[, .N, by = .(CreatedMonth = floor_date(dmy(Created), "month"))]
   CreatedMonth  N
1:   2019-01-01 19
2:   2019-02-01  4

Data

data <- readr::read_table("Created       Ticket
01-Jan-19   a1
02-Jan-19   a2
03-Jan-19   a3
04-Jan-19   a4
05-Jan-19   a5
06-Jan-19   a6
07-Jan-19   a7
08-Jan-19   a8
09-Jan-19   a9
10-Jan-19   a10
11-Jan-19   a11
12-Jan-19   a12
13-Jan-19   a13
14-Jan-19   a14
15-Jan-19   a15
16-Jan-19   a16
17-Jan-19   a17
18-Jan-19   a18
19-Jan-19   a19
01-Feb-19   a20
02-Feb-19   a21
03-Feb-19   a22
04-Feb-19   a23")

Answer 1 (score: 0)

Using dplyr, we first convert the Created column to an actual date, then group the rows by month and count the tickets in each group.

library(dplyr)
data %>%
  mutate(Created = as.Date(Created, "%d-%b-%y")) %>%
  arrange(Created) %>%
  mutate(Yearmon = format(Created, "%B-%Y"), 
         Yearmon = factor(Yearmon, levels = unique(Yearmon))) %>%
  group_by(Yearmon) %>%
  summarise(count = n())


# Yearmon       count
#  <fct>         <int>
#1 January-2019     19
#2 February-2019     4
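
The same group-then-count idea can also be sketched in base R without extra packages. This is an illustrative variant, not part of either answer: the data frame is a hypothetical reconstruction of the sample data, and the "%b" format code assumes an English locale for the month abbreviations.

```r
# Hypothetical reconstruction of the sample data: 19 January and 4 February tickets
data <- data.frame(
  Created = c(sprintf("%02d-Jan-19", 1:19), sprintf("%02d-Feb-19", 1:4)),
  Ticket  = paste0("a", 1:23),
  stringsAsFactors = FALSE
)

# Parse the text dates, then map each date to the first day of its month
created     <- as.Date(data$Created, format = "%d-%b-%y")
month_start <- as.Date(format(created, "%Y-%m-01"))

# table() counts how many rows fall in each month
counts <- table(month_start)

# Turn the table back into a plain data frame
monthly <- data.frame(
  CreatedMonth   = as.Date(names(counts)),
  CountOfTickets = as.vector(counts)
)

print(monthly)
```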