I have this data frame:
> str(final)
'data.frame': 112 obs. of 3 variables:
$ FAO_CountryName: chr Algeria Egypt Libya Morocco ...
$ FAO_CountryURL : chr "http://www.fao.org/giews/countrybrief/country.jsp?code=DZA" "http://www.fao.org/giews/countrybrief/country.jsp?code=EGY" "http://www.fao.org/giews/countrybrief/country.jsp?code=LBY" "http://www.fao.org/giews/countrybrief/country.jsp?code=MAR" ...
$ Text : chr "\r\n Reference Date: 24-November-2016\r\n \r\n \r\n FOOD SECURITY SNAPSHOT\r\n \r\n "| __truncated__ "\r\n Reference Date: 28-November-2016\r\n \r\n \r\n FOOD SECURITY SNAPSHOT\r\n \r\n "| __truncated__ "\r\n Reference Date: 15-November-2016\r\n \r\n \r\n FOOD SECURITY SNAPSHOT\r\n \r\n "| __truncated__ "\r\n Reference Date: 21-September-2016\r\n \r\n \r\n FOOD SECURITY SNAPSHOT\r\n \r\n "| __truncated__ ...
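I would like to process the Text variable so that I can count how many times each word occurs, row by row. In other words, I would like to end up with a data frame like this: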
> head(final, n=2)
FAO_CountryName FAO_CountryURL Text WordCount
Algeria http://www.fao.org… Algeria is nice… Algeria 1
is 1
...
Egypt http://www.fao.org… Egypt is nice too… Egypt 1
is 5
...
So far, I have tried this:
library(dplyr)
library(tidytext)

## Count the words in the text column.
## Note: count(word) pools all rows together, so the originating row is lost here.
keywords <- text_df %>%
  unnest_tokens(word, text) %>%
  count(word, sort = TRUE) %>%
  ungroup()
## Total frequency of each word across the whole dataset
total_words <- keywords %>%
  group_by(word) %>%
  summarize(total = sum(n))
However, this only gives me word counts for the dataset as a whole, not ROW BY ROW. Any suggestions?
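
For reference, here is a minimal sketch of what a row-by-row count might look like. It assumes each row of `final` is one country, so `FAO_CountryName` can serve as the row identifier. The toy data frame below is made up purely for illustration and is not the real FAO text. The idea is to keep the identifying columns in the `count()` call, so words are tallied per country rather than over the whole column.

```r
library(dplyr)
library(tidytext)

# Toy stand-in for `final` (texts invented for illustration only)
final <- data.frame(
  FAO_CountryName = c("Algeria", "Egypt"),
  FAO_CountryURL  = c("http://www.fao.org/giews/countrybrief/country.jsp?code=DZA",
                      "http://www.fao.org/giews/countrybrief/country.jsp?code=EGY"),
  Text            = c("Algeria is nice", "Egypt is nice too, is it not"),
  stringsAsFactors = FALSE
)

word_counts <- final %>%
  # One row per (country, word); unnest_tokens lowercases tokens by default
  unnest_tokens(word, Text) %>%
  # Counting by country AND word keeps the per-row breakdown
  count(FAO_CountryName, FAO_CountryURL, word, sort = TRUE)

print(word_counts)
```

The result is in long format (one line per country and word, with the count in `n`) rather than the nested layout shown in the desired output. If two rows could share the same country name, adding an explicit row id first (for example with `mutate(row_id = row_number())`) and counting by that would be the safer choice.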