Question

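I want to recreate a plot from the Text Mining with R web textbook, but with my own data. It basically finds the top terms per year and graphs them (Figure 5.4: http://tidytextmining.com/dtm.html). My data is a bit cleaner than what they started with, but I am new to R. My data has a "Date" column in 2016-01-01 format (it is of class Date). I only have data from 2016, so I want to do the same thing at a finer granularity (i.e. by month or by day).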

library(dplyr)
library(tidyr)
library(ggplot2)

# inaug_td is the tidied document-term matrix built earlier in the book chapter
year_term_counts <- inaug_td %>%
  extract(document, "year", "(\\d+)", convert = TRUE) %>%
  complete(year, term, fill = list(count = 0)) %>%
  group_by(year) %>%
  mutate(year_total = sum(count))

year_term_counts %>%
  filter(term %in% c("god", "america", "foreign", "union", "constitution",
                     "freedom")) %>%
  ggplot(aes(year, count / year_total)) +
  geom_point() +
  geom_smooth() +
  facet_wrap(~ term, scales = "free_y") +
  scale_y_continuous(labels = scales::percent_format()) +
  ylab("% frequency of word in inaugural address")

My idea is to pick specific words from my text and see how their usage changes over the months.

Thanks!

Answer 1

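If you want to look at smaller units of time based on a date column you already have, I'd recommend looking at the floor_date() or round_date() functions from lubridate. The particular chapter of our book you linked to deals with taking a document-term matrix and then tidying it. Have you already gotten your data into a tidy text format? If so, you could do something like this: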

library(dplyr)
library(lubridate)
library(ggplot2)

# tidy_text: one row per word, with a Date column of class Date
date_counts <- tidy_text %>%
    mutate(date = floor_date(Date, unit = "7 days")) %>% # use whatever time unit you want here, e.g. "month" or "day"
    count(date, word) %>%
    group_by(date) %>%
    mutate(date_total = sum(n)) # total words in each time bin

date_counts %>%
    filter(word %in% c("PUT YOUR LIST OF WORDS HERE")) %>%
    ggplot(aes(date, n / date_total)) +
    geom_point() +
    geom_smooth() +
    facet_wrap(~ word, scales = "free_y")
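
To see what the binning step does before running it on real data, here is a minimal sketch on a made-up data frame. The toy rows are purely hypothetical; only the column names `Date` and `word` are assumed to match the tidy text format described above. It bins by month, since that is the granularity the question asks about.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: one row per word occurrence, with a Date column
tidy_text <- tibble(
  Date = as.Date(c("2016-01-03", "2016-01-20", "2016-01-20",
                   "2016-02-11", "2016-02-11", "2016-02-28")),
  word = c("god", "union", "god", "god", "freedom", "god")
)

# floor_date() always rounds down to the start of the unit;
# round_date() goes to whichever unit boundary is nearest
floor_date(as.Date("2016-01-20"), unit = "month")  # 2016-01-01
round_date(as.Date("2016-01-20"), unit = "month")  # 2016-02-01

# Count each word per month, then add the total word count of that month
month_counts <- tidy_text %>%
  mutate(month = floor_date(Date, unit = "month")) %>%
  count(month, word) %>%
  group_by(month) %>%
  mutate(month_total = sum(n)) %>%
  ungroup()

month_counts
```

For counting words per period, floor_date() is probably the safer choice of the two, because round_date() would move dates in the second half of a month into the following month's bin.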
