Suppose I have data like this:
df = data.frame(person = c('jim', 'john', 'pam', 'jim'),
                date = c('2018-01-01', '2018-02-01', '2018-03-01', '2018-04-01'),
                text = c('the lonely engineer',
                         'tax season is upon us, engineers, do your taxes!',
                         'i am so lonely',
                         'rage coding is the best'))
I want to find the trending terms by date. How should I go about it?
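library(quanteda)
library(magrittr)  # provides %>%

xCorp = corpus(df, text_field = 'text')

# Punctuation, numbers and symbols are dropped at tokenization time;
# recent quanteda versions expect these options in tokens(), not dfm().
x = tokens(xCorp,
           remove_numbers = TRUE,
           remove_punct = TRUE,
           remove_symbols = TRUE) %>%
  tokens_remove(
    c(stopwords('english'),
      'western digital',  # multi-word pattern; may need phrase() to match
      'wd',
      'nil'),
    padding = TRUE
  ) %>%
  dfm()

# Collapse the document-feature matrix to one row per date.
x2 = dfm_group(x, groups = docvars(x, 'date'))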
This gets me part of the way there, but I am not sure it is the best approach.
Answer 0 (score: 0)
Using the tidyverse, I was able to do the following:
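library(dplyr)
library(tidytext)  # provides unnest_tokens()

# One row per (date, word) pair with its count, most frequent first.
df = df %>%
  group_by(date) %>%
  unnest_tokens(word, text) %>%
  count(word, sort = TRUE)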
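Raw counts alone do not say which words are distinctive for a given date. One possible next step, offered as a rough sketch rather than the definitive method, is to treat each date as a "document" and rank words by tf-idf. The names `word_counts` and `trending`, the use of tidytext's built-in `stop_words` table, and the cutoff of three words per date are all my own assumptions for illustration; they are not part of the question.

```r
library(dplyr)
library(tidytext)

# Same sample data as in the question, rebuilt so the sketch is self-contained.
df <- data.frame(
  person = c('jim', 'john', 'pam', 'jim'),
  date = c('2018-01-01', '2018-02-01', '2018-03-01', '2018-04-01'),
  text = c('the lonely engineer',
           'tax season is upon us, engineers, do your taxes!',
           'i am so lonely',
           'rage coding is the best'),
  stringsAsFactors = FALSE
)

# Tokenize, drop common English stop words, and count words per date.
word_counts <- df %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(date, word, name = "n")

# Score each word with tf-idf (each date acts as one document),
# then keep at most three top-scoring words per date (assumed cutoff).
trending <- word_counts %>%
  bind_tf_idf(word, date, n) %>%
  group_by(date) %>%
  slice_max(tf_idf, n = 3, with_ties = FALSE) %>%
  ungroup()

print(trending)
```

With only four short texts the scores are not very meaningful, but on a larger corpus the same pipeline surfaces words that are frequent on one date and rare on the others. For real trend detection you would probably also want to parse `date` with `as.Date()` and group by week or month.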