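I'm trying to do some basic text analysis. After installing the 'tidytext' package, I tried to unnest the data frame, but I still get an error. I think I'm missing a package, but I'm not sure how to figure out which one. Any advice is appreciated.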
library(dplyr)
library(tidytext)
# Import data
text <- read.csv("TextSample.csv", stringsAsFactors=FALSE)
n <- nrow(text)
text_df <- tibble(line = 1:n, text = text)
text_df %>%
unnest_tokens(word, text)
Error in is_corpus_df(corpus) : ncol(corpus) >= 2 is not TRUE
Output of dput(text_df):
structure(list(line = 1:6, text = structure(list(text = c("furloughs", "Students do not have their books or needed materials ", "Working MORE for less pay", "None", "Caring for an immuno-compromised spouse", "being a mom, school teacher, researcher and professor" )), class = "data.frame", row.names = c(NA, -6L))), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
Answer:
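Your column text is actually a data frame nested inside the data frame text_df, so you are applying unnest_tokens() to a data frame. It only works when the input column is an atomic vector (character, integer, double, logical, etc.).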
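One way to confirm this diagnosis is to check the class of each top-level column. The sketch below rebuilds the sample from the dput() output in the question and uses only base R. It assumes, as the dput suggests, that tibble(line = 1:n, text = text) stored the whole imported data frame as a single data-frame column.

```r
# Rebuild the sample exactly as given by dput(text_df) in the question.
# text_df$text is itself a one-column data frame, not a character vector.
text_df <- structure(list(line = 1:6, text = structure(list(text = c("furloughs",
  "Students do not have their books or needed materials ", "Working MORE for less pay",
  "None", "Caring for an immuno-compromised spouse",
  "being a mom, school teacher, researcher and professor")),
  class = "data.frame", row.names = c(NA, -6L))),
  row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

# First class of each top-level column: the text column reports "data.frame"
col_classes <- sapply(text_df, function(col) class(col)[1])
print(col_classes)

# The character vector that unnest_tokens() needs sits one level deeper
print(is.character(text_df$text$text))
```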
To fix this, you can do the following:
library(dplyr)
library(tidytext)
text_df <- text_df %>%
  # Convert every column, including the nested data frame column text, to character
  mutate_all(as.character) %>%
  unnest_tokens(word, text)
Which gives you:
# A tibble: 186 x 2
line word
<chr> <chr>
1 1 c
2 1 furloughs
3 1 students
4 1 do
5 1 not
6 1 have
7 1 their
8 1 books
9 1 or
10 1 needed
# ... with 176 more rows
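Note that mutate_all(as.character) also turns line into character (see the <chr> under line). In addition, as.character() on the nested data frame appears to collapse all six responses into one deparsed string of the form c("furloughs", ...), which is then recycled to every row. That would explain the stray "c" token and why line 1 is followed by words from other responses. A minimal alternative sketch, assuming the CSV column is named text as the dput shows, is to replace the nested column with its inner character vector before tokenizing:

```r
library(dplyr)
library(tidytext)

# Same sample as the question's dput(text_df)
text_df <- structure(list(line = 1:6, text = structure(list(text = c("furloughs",
  "Students do not have their books or needed materials ", "Working MORE for less pay",
  "None", "Caring for an immuno-compromised spouse",
  "being a mom, school teacher, researcher and professor")),
  class = "data.frame", row.names = c(NA, -6L))),
  row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

tokens <- text_df %>%
  # Inside mutate(), text is the nested data frame; text$text is its character column
  mutate(text = text$text) %>%
  # One row per word; line stays an integer and keeps pointing at the original response
  unnest_tokens(word, text)

print(tokens)
```

Equivalently, the nesting can be avoided when the tibble is first built, e.g. tibble(line = seq_len(n), text = text$text), again assuming the imported column is called text.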