Question

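I am trying to do some basic text analysis. After installing the 'tidytext' package, I tried to unnest my data frame, but I keep getting an error. I think I am missing a package, but I am not sure how to figure out which one. Any suggestions are appreciated.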


library(dplyr)
library(tidytext)


# Import data
text <- read.csv("TextSample.csv", stringsAsFactors = FALSE)

n <- nrow(text)

text_df <- tibble(line = 1:n, text = text)

text_df %>%
  unnest_tokens(word, text)

> Error in is_corpus_df(corpus) : ncol(corpus) >= 2 is not TRUE

dput:

structure(list(line = 1:6, text = structure(list(text = c("furloughs", "Students do not have their books or needed materials ", "Working MORE for less pay", "None", "Caring for an immuno-compromised spouse", "being a mom, school teacher, researcher and professor" )), class = "data.frame", row.names = c(NA, -6L))), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

Answer 1

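Your column text is actually a data frame nested inside the data frame text_df, so you are applying unnest_tokens() to a data frame. It only works when applied to an atomic vector (character, integer, double, logical, etc.).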
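One way to confirm this diagnosis is to check the class of the column. The sketch below rebuilds a shortened version of the object from the question's dput output instead of reading TextSample.csv, so the three sample strings are only a stand-in for the real file:

```r
library(tibble)

# Stand-in for the result of read.csv(): a data frame with one column, `text`.
text <- data.frame(
  text = c("furloughs", "Working MORE for less pay", "None"),
  stringsAsFactors = FALSE
)

# Same construction as in the question: the whole data frame becomes a column.
text_df <- tibble(line = seq_len(nrow(text)), text = text)

# The column is a data frame, not a character vector.
column_is_df <- is.data.frame(text_df$text)

# The actual strings sit one level down.
inner_is_chr <- is.character(text_df$text$text)

print(column_is_df)
print(inner_is_chr)
```

If the first check prints TRUE, the column needs to be reduced to a plain vector before it is tokenized.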

To fix this, you can do the following:

library(dplyr)
library(tidytext)

text_df <- text_df %>% 
  mutate_all(as.character) %>% 
  unnest_tokens(word, text)

Which gives you:

# A tibble: 186 x 2
   line  word     
   <chr> <chr>    
 1 1     c        
 2 1     furloughs
 3 1     students 
 4 1     do       
 5 1     not      
 6 1     have     
 7 1     their    
 8 1     books    
 9 1     or       
10 1     needed   
# ... with 176 more rows
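Note that the output above begins with a stray token c and lists words from several responses under line 1. This suggests that as.character() turned the nested data frame into a single deparsed string rather than one string per row. A possible alternative is to pull the character vector out of the nested data frame when building text_df. This is a sketch, not the original answer's method. It rebuilds the input from the question's dput output, and the names inner and tidy_words are introduced here for illustration only:

```r
library(dplyr)
library(tibble)
library(tidytext)

# Stand-in for read.csv("TextSample.csv"), copied from the question's dput output.
inner <- data.frame(
  text = c(
    "furloughs",
    "Students do not have their books or needed materials ",
    "Working MORE for less pay",
    "None",
    "Caring for an immuno-compromised spouse",
    "being a mom, school teacher, researcher and professor"
  ),
  stringsAsFactors = FALSE
)

# Use the character column itself (inner$text), not the whole data frame.
text_df <- tibble(line = seq_len(nrow(inner)), text = inner$text)

# Tokenize: one row per word, keeping the line each word came from.
tidy_words <- text_df %>%
  unnest_tokens(word, text)

print(tidy_words)
```

With this construction each word stays attached to the line it came from, and line remains an integer column because nothing is coerced with as.character().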
