I am pulling a list of ticket numbers for each date. The date column converts to a date, but the ticket number column is text.
Created Ticket
01-Jan-19 a1
02-Jan-19 a2
03-Jan-19 a3
04-Jan-19 a4
05-Jan-19 a5
06-Jan-19 a6
07-Jan-19 a7
08-Jan-19 a8
09-Jan-19 a9
10-Jan-19 a10
11-Jan-19 a11
12-Jan-19 a12
13-Jan-19 a13
14-Jan-19 a14
15-Jan-19 a15
16-Jan-19 a16
17-Jan-19 a17
18-Jan-19 a18
19-Jan-19 a19
01-Feb-19 a20
02-Feb-19 a21
03-Feb-19 a22
04-Feb-19 a23
I tried to use floor_date in R, but because the Ticket column is character, I can't use it.
data <- read.csv(file = 'D:\\DS Data\\SampleTickets.csv', stringsAsFactors = FALSE,header = TRUE)
str(data)
library(readr)
library(lubridate)
library(dplyr)
data <- data %>%
  mutate(Created = dmy(Created))
data %>% group_by(month = floor_date(Created, "month")) %>%
  summarize(amount = sum(Ticket))
I would like a data frame as output:
CreatedMonth CountOfTickets
1/1/2019 18
1/2/2019 4
Answer 0 (score: 1)
You are almost there: just use n() instead of sum(Ticket) to count the rows:
library(dplyr)
library(lubridate)
data %>%
  mutate(Created = dmy(Created)) %>%
  group_by(month = floor_date(Created, "month")) %>%
  summarize(amount = n())
# A tibble: 2 x 2
  month      amount
  <date>      <int>
1 2019-01-01     19
2 2019-02-01      4
However, there is a shortcut using count():
data %>%
  count(CreatedMonth = dmy(Created) %>% floor_date("month"))
# A tibble: 2 x 2
  CreatedMonth     n
  <date>       <int>
1 2019-01-01      19
2 2019-02-01       4
For the sake of completeness, here is a data.table version as well:
library(lubridate)
library(data.table)
setDT(data)[, .N, by = .(CreatedMonth = floor_date(dmy(Created), "month"))]
   CreatedMonth  N
1:   2019-01-01 19
2:   2019-02-01  4
Data
data <- readr::read_table("Created Ticket
01-Jan-19 a1
02-Jan-19 a2
03-Jan-19 a3
04-Jan-19 a4
05-Jan-19 a5
06-Jan-19 a6
07-Jan-19 a7
08-Jan-19 a8
09-Jan-19 a9
10-Jan-19 a10
11-Jan-19 a11
12-Jan-19 a12
13-Jan-19 a13
14-Jan-19 a14
15-Jan-19 a15
16-Jan-19 a16
17-Jan-19 a17
18-Jan-19 a18
19-Jan-19 a19
01-Feb-19 a20
02-Feb-19 a21
03-Feb-19 a22
04-Feb-19 a23")
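To illustrate why sum(Ticket) fails while n() works, here is a minimal sketch. The four-row `tickets` data frame is a hypothetical stand-in for the real file, the output column names are borrowed from the question's desired result, and an English locale is assumed so that dmy() can parse "Jan" and "Feb".

```r
library(dplyr)
library(lubridate)

# Hypothetical four-row sample in the same layout as the question's data
tickets <- data.frame(
  Created = c("01-Jan-19", "02-Jan-19", "01-Feb-19", "02-Feb-19"),
  Ticket  = c("a1", "a2", "a20", "a21"),
  stringsAsFactors = FALSE
)

# sum() is not defined for character input, so it raises an error on text
sum_result <- tryCatch(sum(tickets$Ticket), error = function(e) "error")

# n() counts the rows in each group and never looks at the Ticket values
monthly <- tickets %>%
  mutate(Created = dmy(Created)) %>%
  group_by(CreatedMonth = floor_date(Created, "month")) %>%
  summarize(CountOfTickets = n())

print(sum_result)
print(monthly)
```

Because n() only counts rows, the type of the Ticket column does not matter for this summary.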
Answer 1 (score: 0)
Using dplyr, we first convert the Created column to an actual date, then group by month and count the tickets in each group.
library(dplyr)
df %>%
  mutate(Created = as.Date(Created, "%d-%b-%y")) %>%
  arrange(Created) %>%
  mutate(Yearmon = format(Created, "%B-%Y"),
         Yearmon = factor(Yearmon, levels = unique(Yearmon))) %>%
  group_by(Yearmon) %>%
  summarise(count = n())
# Yearmon count
# <fct> <int>
#1 January-2019 19
#2 February-2019 4
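The factor() step above is what keeps the months in calendar order. As a rough sketch of the difference, assuming an English locale and a hypothetical three-row `df`, the same labels grouped as plain character strings come out alphabetically instead:

```r
library(dplyr)

# Hypothetical small sample; month names assume an English locale
df <- data.frame(
  Created = c("01-Jan-19", "02-Jan-19", "01-Feb-19"),
  stringsAsFactors = FALSE
)

labelled <- df %>%
  mutate(Created = as.Date(Created, "%d-%b-%y"),
         Yearmon = format(Created, "%B-%Y"))

# Plain character labels: groups are sorted alphabetically (February first)
alphabetical <- labelled %>% count(Yearmon)

# Factor with levels in order of appearance after sorting by date:
# groups follow the level order, so January comes before February
chronological <- labelled %>%
  arrange(Created) %>%
  mutate(Yearmon = factor(Yearmon, levels = unique(Yearmon))) %>%
  count(Yearmon)

print(alphabetical)
print(chronological)
```

If the month label is only needed for display, grouping on a real date (as in the floor_date approach) avoids the ordering issue altogether.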