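Word term matrix

Time: 2018-01-19 16:05:04

Tags: r twitter tm tidy term-document-matrix

I would like to create a word matrix from some tweets. Each word that appears in the tweets should become a new variable, filled with 1 only in the rows whose tweet text contains that word.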

x <- data.frame("Tweet" = c("hi all","I need help"), "N" = 1, "Reaction" = c("Happy", "Sad"), stringsAsFactors = FALSE)

I would like to paste the desired output, but I don't know how to do that, sorry.


1 Answer:

Answer 0: (score: 0)

You can do it like this:

library(tm)

x <- data.frame("Tweet" = c("hi all","I need help"), "N" = 1, "Reaction" = c("Happy", "Sad"), stringsAsFactors = FALSE)

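# build a corpus with one document per tweet
corp <- VCorpus(VectorSource(x$Tweet))
# adjust wordLengths: the default is c(3, Inf), which would drop
# short words such as "hi" and "I"
dtm <- DocumentTermMatrix(corp, control = list(wordLengths = c(1, Inf)))
# one column per (lowercased) term, placed between the original columns
data.frame(Tweet = x$Tweet, as.matrix(dtm), Reaction = x$Reaction)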

        Tweet all help hi i need Reaction
1      hi all   1    0  1 0    0    Happy
2 I need help   0    1  0 1    1      Sad
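
Note that the matrix above holds term counts, so a word repeated within one tweet would show a value greater than 1. The question asks for a plain 1/0 indicator, so here is a minimal base R sketch of that variant. It is not the tm approach from the answer. The third tweet and the names `tokens`, `vocab`, `word_matrix`, and `result` are made up for illustration, and the tokenizer is a simple lowercase-and-split-on-whitespace assumption that ignores punctuation.

```r
# Sample data; the third tweet is hypothetical and repeats a word on purpose
x <- data.frame(Tweet = c("hi all", "I need help", "help help me"),
                Reaction = c("Happy", "Sad", "Sad"),
                stringsAsFactors = FALSE)

# Lowercase each tweet and split it on whitespace: one token vector per tweet
tokens <- strsplit(tolower(x$Tweet), "\\s+")

# Vocabulary: every distinct word across all tweets
vocab <- sort(unique(unlist(tokens)))

# For each tweet, 1 if the vocabulary word occurs at all, 0 otherwise.
# vapply returns one column per tweet, so transpose to get one row per tweet.
word_matrix <- t(vapply(tokens,
                        function(tok) as.integer(vocab %in% tok),
                        integer(length(vocab))))
colnames(word_matrix) <- vocab

# Put the indicator columns between the original columns
result <- data.frame(Tweet = x$Tweet, word_matrix, Reaction = x$Reaction,
                     stringsAsFactors = FALSE)
print(result)
```

If you would rather stay with tm, adding `weighting = weightBin` to the `control` list of `DocumentTermMatrix` should give the same kind of binary matrix, though that variant is not tested here.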