I am trying to create a dendrogram in R from an Excel sheet for use in text mining. The sheet has one large column, and each cell contains a string of text. I want each leaf (the smallest branch) of the dendrogram to represent an individual cell, but when I run my script I instead get a dendrogram of every word in the entire Excel file. How do I fix this?
library(tm)
library(stringi)
library(proxy)
Data <- read.csv(file.choose(),header=TRUE)
docs <- Corpus(VectorSource(Data))
docs[[1]]
docs1 <- tm_map(docs, PlainTextDocument)
docs2 <- tm_map(docs1, stripWhitespace)
docs3 <- tm_map(docs2, removeWords, stopwords("english"))
docs4 <- tm_map(docs3, removePunctuation)
docs5 <- tm_map(docs4, content_transformer(tolower))
docs5[[1]]
TermMatrix <- TermDocumentMatrix(docs5)
docsdissim <- dist(as.matrix(TermMatrix), method = "euclidean")
docsdissim2 <- as.matrix(docsdissim)
docsdissim2
h <- hclust(docsdissim, method = "ward.D2")
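
Two things in the script above are consistent with the word-level dendrogram described. `VectorSource(Data)` receives the whole data frame, so each column (not each cell) becomes a document, and `dist()` compares the rows of its input, which in a `TermDocumentMatrix` are terms. The following is a minimal sketch of one possible arrangement, not a definitive fix. It assumes the text lives in a column named `text` (a hypothetical name), uses a small made-up data frame in place of the real file, and leaves out the `PlainTextDocument` step:

```r
library(tm)

# Hypothetical stand-in for the spreadsheet: one text column, one row per cell.
# With the real file this would be the result of read.csv(...).
Data <- data.frame(
  text = c("The cat sat on the mat.",
           "A cat and a dog played on the mat.",
           "Stock markets fell sharply today.",
           "Markets and stock prices fell again."),
  stringsAsFactors = FALSE
)

# Pass the column itself (a character vector) so each cell becomes one document.
docs <- VCorpus(VectorSource(Data$text))

# Lowercase first so that capitalised stopwords such as "The" are also removed.
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, removeWords, stopwords("english"))
docs <- tm_map(docs, stripWhitespace)

# In a DocumentTermMatrix the rows are documents, so dist() compares cells.
dtm <- DocumentTermMatrix(docs)
m <- as.matrix(dtm)
rownames(m) <- paste0("cell", seq_len(nrow(m)))  # labels shown on the leaves

d <- dist(m, method = "euclidean")
h <- hclust(d, method = "ward.D2")
plot(h)
```

Keeping the `TermDocumentMatrix` and transposing it with `t()` before calling `dist()` should give the same document-level comparison.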