tidytext error (Error in is_corpus_df(corpus) : ncol(corpus) >= 2 is not TRUE)

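Time: 2020-05-12 19:10:55

Tags: r tidytext

I am trying to do some basic text analysis. After installing the 'tidytext' package, I tried to unnest the tokens in my data frame, but I keep getting an error. I think I am missing a package, but I am not sure how to work out which one. Any suggestions are appreciated.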

library(dplyr)
library(tidytext)

# Import data
text <- read.csv("TextSample.csv", stringsAsFactors = FALSE)

n <- nrow(text)

text_df <- tibble(line = 1:n, text = text)

text_df %>%
  unnest_tokens(word, text)

> Error in is_corpus_df(corpus) : ncol(corpus) >= 2 is not TRUE

dput:

structure(list(line = 1:6, text = structure(list(text = c("furloughs", "Students do not have their books or needed materials ", "Working MORE for less pay", "None", "Caring for an immuno-compromised spouse", "being a mom, school teacher, researcher and professor" )), class = "data.frame", row.names = c(NA, -6L))), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

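1 answer:

Answer 0 (score: 1)

Your text column is actually a data frame nested inside the data frame text_df, so you are applying unnest_tokens() to a data frame. It only works when applied to an atomic vector (character, integer, double, logical, etc.).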
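One way to confirm this diagnosis is to check each column's type before tokenizing. The following is a minimal sketch, not part of the original answer: the three-row `text` data frame is a hypothetical stand-in for the result of read.csv(), and `col_is_atomic` is an illustrative name.

```r
library(tibble)

# Hypothetical stand-in for read.csv("TextSample.csv"): a data frame
# with a single character column named `text`, as in the dput output.
text <- data.frame(
  text = c("furloughs", "Working MORE for less pay", "None"),
  stringsAsFactors = FALSE
)

# Same construction as in the question: the whole data frame is passed
# as the `text` column, so it ends up nested inside the tibble.
text_df <- tibble(line = seq_len(nrow(text)), text = text)

# Report, for every column, whether it is a plain atomic vector.
col_is_atomic <- vapply(text_df, is.atomic, logical(1))
print(col_is_atomic)

# Inspect the nested column directly.
print(is.data.frame(text_df$text))
print(ncol(text_df$text))
```

Any column that is not atomic here is a candidate for the kind of failure shown in the question.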

To fix this, you can do the following:

library(dplyr)
library(tidytext)

text_df <- text_df %>% 
  mutate_all(as.character) %>% 
  unnest_tokens(word, text)

Which gives you:

# A tibble: 186 x 2
   line  word     
   <chr> <chr>    
 1 1     c        
 2 1     furloughs
 3 1     students 
 4 1     do       
 5 1     not      
 6 1     have     
 7 1     their    
 8 1     books    
 9 1     or       
10 1     needed   
# ... with 176 more rows
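
Note that the output above begins with a token `c` on line 1 and has 186 rows for six short responses. That pattern suggests as.character() turned the whole nested data frame into a single deparsed string of the form c("furloughs", ...) and repeated it on every line. A minimal sketch of an alternative, assuming the CSV holds a single column named `text` as the dput output indicates, is to pass the inner character vector when building the tibble. The three-row data frame below is a hypothetical stand-in for the imported file, and `tokens` is an illustrative name.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for read.csv("TextSample.csv"): one character
# column named `text` (an assumption based on the dput output).
text <- data.frame(
  text = c("furloughs", "Working MORE for less pay", "None"),
  stringsAsFactors = FALSE
)

# Use the inner character vector (text$text) rather than the whole
# data frame, so the `text` column of the tibble is atomic.
text_df <- tibble(line = seq_len(nrow(text)), text = text$text)

# Tokenize: one row per word, keeping the original line number.
tokens <- text_df %>%
  unnest_tokens(word, text)

print(tokens)
```

With this construction `line` stays an integer column and each word remains attached to the response it came from.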