unnest_tokens fails to handle vectors in R with the tidytext package

Date: 2017-12-20 16:14:52

Tags: r text-analysis tidytext

I want to use the tidytext package to create a column of n-grams, using the following code:

library(tidytext)

unnest_tokens(tbl = president_tweets,
              output =  bigrams,
              input = text,
              token = "ngrams", 
              n = 2) 

But when I run this, I get the following error message:

error: unnest_tokens expects all columns of input to be atomic vectors (not lists)

My text column consists of a lot of tweets, with rows that look like the following, and is of class character.

president_tweets$text <- c("The United States Senate just passed the biggest in history Tax Cut and Reform Bill. Terrible Individual Mandate (ObamaCare)Repealed. Goes to the House tomorrow morning for final vote. If approved, there will be a News Conference at The White House at approximately 1:00 P.M.", 
    "Congratulations to Paul Ryan, Kevin McCarthy, Kevin Brady, Steve Scalise, Cathy McMorris Rodgers and all great House Republicans who voted in favor of cutting your taxes!", 
    "A  story in the @washingtonpost that I was close to rescinding the nomination of Justice Gorsuch prior to confirmation is FAKE NEWS. I never even wavered and am very proud of him and the job he is doing as a Justice of the U.S. Supreme Court. The unnamed sources dont exist!", 
    "Stocks and the economy have a long way to go after the Tax Cut Bill is totally understood and appreciated in scope and size. Immediate expensing will have a big impact. Biggest Tax Cuts and Reform EVER passed. Enjoy, and create many beautiful JOBS!", 
    "DOW RISES 5000 POINTS ON THE YEAR FOR THE FIRST TIME EVER - MAKE AMERICA GREAT AGAIN!", 
    "70 Record Closes for the Dow so far this year! We have NEVER had 70 Dow Records in a one year period. Wow!"
    )

---------Update:----------

It looks like the sentimetr or exploratory packages were causing the conflict. I reloaded my packages without these, and now it works again!
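
If the conflict came from one package masking a function of another (an assumption; the update does not say what exactly went wrong), a sketch like the following can show which attached packages define a given function name. The helper name where_defined is hypothetical and introduced only for illustration; find() and conflicts() are standard R functions.

```r
# Hypothetical helper: list every attached package or environment that
# defines a function with the given name, in search-path order.
where_defined <- function(name) {
  find(name, mode = "function")
}

# The first entry is the definition R uses for an unqualified call.
# With tidytext attached, "unnest_tokens" could be checked the same way.
where_defined("filter")

# Names of objects that exist in more than one place on the search path.
conflicts(detail = FALSE)
```

Calling the function with an explicit namespace, as in tidytext::unnest_tokens(...), avoids masking regardless of the order in which packages were loaded.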

1 Answer:

Answer 0 (score: 0)

Hmm, I am not able to reproduce your problem.

library(tidytext)
library(dplyr)

president_tweets <- data_frame(text = c("The United States Senate just passed the biggest in history Tax Cut and Reform Bill. Terrible Individual Mandate (ObamaCare)Repealed. Goes to the House tomorrow morning for final vote. If approved, there will be a News Conference at The White House at approximately 1:00 P.M.", 
                                        "Congratulations to Paul Ryan, Kevin McCarthy, Kevin Brady, Steve Scalise, Cathy McMorris Rodgers and all great House Republicans who voted in favor of cutting your taxes!", 
                                        "A  story in the @washingtonpost that I was close to rescinding the nomination of Justice Gorsuch prior to confirmation is FAKE NEWS. I never even wavered and am very proud of him and the job he is doing as a Justice of the U.S. Supreme Court. The unnamed sources dont exist!", 
                                        "Stocks and the economy have a long way to go after the Tax Cut Bill is totally understood and appreciated in scope and size. Immediate expensing will have a big impact. Biggest Tax Cuts and Reform EVER passed. Enjoy, and create many beautiful JOBS!", 
                                        "DOW RISES 5000 POINTS ON THE YEAR FOR THE FIRST TIME EVER - MAKE AMERICA GREAT AGAIN!", 
                                        "70 Record Closes for the Dow so far this year! We have NEVER had 70 Dow Records in a one year period. Wow!"))


unnest_tokens(tbl = president_tweets,
              output =  bigrams,
              input = text,
              token = "ngrams", 
              n = 2) 
#> # A tibble: 205 x 1
#>    bigrams      
#>    <chr>        
#>  1 the united   
#>  2 united states
#>  3 states senate
#>  4 senate just  
#>  5 just passed  
#>  6 passed the   
#>  7 the biggest  
#>  8 biggest in   
#>  9 in history   
#> 10 history tax  
#> # ... with 195 more rows

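The current CRAN version of tidytext does indeed not allow list-columns, but we have changed the column handling so that the development version on GitHub now supports list-columns. Are you sure you don't have any of these in your data frame/tibble? What are the data types of all of your columns? Are any of them of type list?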
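
One way to answer those questions is sketched below in base R. The data frame tweets_demo and its hashtags list-column are made up for illustration and are not taken from the question's data; the same checks would be run on president_tweets.

```r
# Hypothetical data frame with one character column and one list-column.
tweets_demo <- data.frame(
  text = c("Biggest Tax Cuts and Reform EVER passed.",
           "MAKE AMERICA GREAT AGAIN!"),
  stringsAsFactors = FALSE
)
tweets_demo$hashtags <- list(c("tax", "reform"), character(0))

# First class of every column; a list-column reports "list".
col_classes <- vapply(tweets_demo, function(col) class(col)[1], character(1))
print(col_classes)

# Logical flag per column: TRUE where the column is a list.
is_list_col <- vapply(tweets_demo, is.list, logical(1))
print(names(tweets_demo)[is_list_col])

# Keep only the atomic columns before passing the data to unnest_tokens().
tweets_atomic <- tweets_demo[, !is_list_col, drop = FALSE]
str(tweets_atomic)
```

Dropping the list-columns is only one option; with a version of tidytext that supports list-columns, this step may not be needed.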