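I have taken a subset of the Consumer Complaint Database. However, I am having trouble converting it into a time series, particularly because the same issue is reported more than once within the same time frame (the rows are not unique). My end goal is a line plot that compares how often each issue occurs over time, organized by month.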
Below are the first 5 rows of the data.frame subset, which has more than 750,000 entries in total:
Answer 0 (score: 1)
Something like this?
library(lubridate)
library(dplyr)
library(ggplot2)

# Sample data standing in for the real subset: 100 random date/issue pairs
df <- data.frame(
  stringsAsFactors = FALSE,
  Date = sample(c("08/25/14", "04/20/17", "02/14/14", "08/30/13", "10/03/2014",
                  "1/07/2013"), 100, replace = TRUE),
  Issue = sample(c("Making/receiving", "Other", "Billing", "Managing", "Billing",
                   "Billing"), 100, replace = TRUE)
)

# Parse the dates, map each one to the first day of its month,
# then count the rows per month and issue
df <- df %>%
  mutate(
    Date = mdy(Date),
    Year = year(Date),
    Month = month(Date),
    Period = make_date(Year, Month, 1)
  ) %>%
  group_by(Period, Issue) %>%
  summarise(
    incidents = n()
  ) %>%
  ungroup()

# One line per issue, with months on the x axis
ggplot() +
  geom_line(data = df, mapping = aes(x = Period, y = incidents, colour = Issue))
Created on 2019-11-19 by the reprex package (v0.3.0)
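The concern about repeated, non-unique rows goes away once the rows are counted per month: duplicates simply raise the count. A more compact sketch of the same monthly bucketing is shown here, using `floor_date()` from lubridate and `count()` from dplyr in place of the `year()`/`month()`/`make_date()` and `group_by()`/`summarise()` steps. The `complaints` data frame and its five rows are made up for illustration, and the column names `Date` and `Issue` are assumed to match the real subset.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data (not from the question); note the two
# "Billing" rows that fall in the same month
complaints <- data.frame(
  Date  = c("08/25/14", "08/02/14", "08/30/13", "10/03/14", "10/17/14"),
  Issue = c("Billing", "Billing", "Other", "Billing", "Other"),
  stringsAsFactors = FALSE
)

monthly <- complaints %>%
  # Parse month/day/year strings and round each date down to the 1st of its month
  mutate(Period = floor_date(mdy(Date), unit = "month")) %>%
  # One row per month and issue; repeated reports are tallied in `incidents`
  count(Period, Issue, name = "incidents")

print(monthly)
```

The resulting `monthly` data frame has the same `Period`, `Issue` and `incidents` columns as the answer's `df`, so it can be passed to the same `ggplot()` call. Months in which an issue never occurs are absent rather than zero, so the plotted line runs straight across them.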