Question

I am trying to create a dendrogram in R from an Excel sheet, for use in text mining. The sheet has one large column, and each cell holds a string of text. I want the smallest branch of the dendrogram to represent an individual cell, but when I run my script I instead get a dendrogram of every word in the entire Excel file. How do I fix this?

library(tm)
library(stringi)
library(proxy)
Data <- read.csv(file.choose(),header=TRUE)
docs <- Corpus(VectorSource(Data))

docs[[1]]

docs1 <- tm_map(docs, PlainTextDocument)
docs2 <- tm_map(docs1, stripWhitespace)
docs3 <- tm_map(docs2, removeWords, stopwords("english"))
docs4 <- tm_map(docs3, removePunctuation)
docs5 <- tm_map(docs4, content_transformer(tolower))

docs5[[1]]

TermMatrix <- TermDocumentMatrix(docs5)
docsdissim <- dist(as.matrix(TermMatrix), method = "euclidean")
docsdissim2 <- as.matrix(docsdissim)
docsdissim2

h <- hclust(docsdissim, method = "ward.D2")
plot(h)
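
The behaviour described above probably has two causes, though this has not been checked against the real file. First, `VectorSource(Data)` receives the whole data frame, so the corpus is built per column rather than per cell. Second, `TermDocumentMatrix` puts terms in rows, so `dist()` measures distances between words, not between documents. Here is a minimal sketch of one way to cluster per cell instead. The small `Data` frame and its `text` column are hypothetical stand-ins for the spreadsheet, and the `PlainTextDocument` step is left out on the assumption that it is not needed here.

```r
library(tm)

# Hypothetical stand-in for the spreadsheet: one text column, one row per cell.
Data <- data.frame(
  text = c("The cat sat on the mat",
           "A cat and a dog played outside",
           "Stock prices fell sharply today",
           "Markets and stock prices recovered"),
  stringsAsFactors = FALSE
)

# Pass the column itself (a character vector), so each cell becomes one document.
docs <- VCorpus(VectorSource(Data$text))
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, removeWords, stopwords("english"))
docs <- tm_map(docs, stripWhitespace)

# DocumentTermMatrix has documents in rows, so distances are between cells.
dtm <- DocumentTermMatrix(docs)
m <- as.matrix(dtm)
rownames(m) <- paste0("cell", seq_len(nrow(m)))  # illustrative leaf labels

# stats::dist is named explicitly because library(proxy) masks dist().
d <- stats::dist(m, method = "euclidean")
h <- hclust(d, method = "ward.D2")
plot(h)  # one leaf per cell
```

With the real data, `Data$text` would be replaced by whichever column holds the text, for example `Data[[1]]` for the first column. Transposing the original matrix with `t(as.matrix(TermMatrix))` should have the same effect as switching to `DocumentTermMatrix`.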

Dendrogram for Text Mining in R

0 answers: