Spherical k-means clustering in R

Time: 2020-04-27 08:57:21

Tags: r

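I have been trying spherical k-means clustering, but some of the resulting terms consist of multiple words. The file I used is at the link below: https://drive.google.com/file/d/1LpvbFe9EvcDMou60qrSk0ViAzQfNn1nB/view?usp=sharing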

I have included the warning messages below the code. Here is the code I tried:

options(stringsAsFactors = FALSE)
set.seed(1234)
library(skmeans)
library(tm)
library(clue)
library(cluster)
library(fpc)
library(wordcloud)
# preprocessing step
clean.corpus <- function(corpus) {
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeNumbers)
  corpus <- tm_map(corpus, removeWords,
                   c(stopwords('en'), 'customer', 'service', 'customers', 'calls'))
  return(corpus)
}
# import data
wk.exp <- read.csv(file = 'D:/KULIAH/SKRIPSI/Coba-R/text_mining-master/1yr_plus_final4.csv', header = TRUE)
wk.source <- VCorpus(VectorSource(wk.exp$text))
wk.corpus <- clean.corpus(wk.source)

# spherical k-means clustering
# rebuild the DTM with TF-IDF weighting
wk.dtm <- DocumentTermMatrix(wk.corpus,
                             control = list(weighting = weightTfIdf))
# perform spherical k-means clustering (m > 1 gives a soft partition)
soft.part <- skmeans(wk.dtm, 3, m = 1.2,
                     control = list(nruns = 5, verbose = TRUE))
# table of prototype scores (terms in rows, clusters in columns)
s.clus.proto <- t(cl_prototypes(soft.part))
# review the top 5 prototype terms in each cluster
sort(s.clus.proto[, 1], decreasing = TRUE)[1:5]
sort(s.clus.proto[, 2], decreasing = TRUE)[1:5]
sort(s.clus.proto[, 3], decreasing = TRUE)[1:5]
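
For readers unfamiliar with what the clustering step is doing, here is a minimal base-R sketch of the idea behind spherical k-means: documents and prototypes are scaled to unit length and compared by cosine similarity. The matrix `x`, the prototypes `protos`, and the helper `unit_rows` are made-up illustrations, not part of skmeans. The sketch shows a single hard assignment step only, whereas the skmeans call with m = 1.2 produces a soft partition and also iterates to update the prototypes.

```r
# Toy document-term matrix: 4 documents x 3 terms (made-up weights).
x <- matrix(c(2, 0, 1,
              4, 0, 2,
              0, 3, 0,
              0, 1, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4),
                            c("phone", "email", "team")))

# Hypothetical helper: scale every row to unit Euclidean length.
unit_rows <- function(m) m / sqrt(rowSums(m^2))

x_unit <- unit_rows(x)

# Two hypothetical prototypes (cluster directions), also unit length.
protos <- unit_rows(rbind(c1 = c(1, 0, 0.5),
                          c2 = c(0, 1, 0.2)))
colnames(protos) <- colnames(x)

# Cosine similarity between each document and each prototype.
cos_sim <- x_unit %*% t(protos)

# Hard assignment: each document goes to its most similar prototype.
assignment <- apply(cos_sim, 1, which.max)
print(round(cos_sim, 3))
print(assignment)
```

Because only direction matters, doc1 and doc2 (which differ only in length) get identical similarities, which is the reason this variant is commonly used for TF-IDF weighted text.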

Warning messages:
1: In comparison.cloud(s.clus.proto, max.words = 100, title.size = 1,  :
  multitasking  accomplishments satisfying could not be fit on page. It will not be plotted.
2: In comparison.cloud(s.clus.proto, max.words = 100, title.size = 1,  :
  back  accomplishments became could not be fit on page. It will not be plotted.
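
One way to check whether the prototype matrix really contains multi-word terms is sketched below in base R. It assumes that the text in front of "could not be fit on page" in each warning is a single row name of s.clus.proto; the matrix `proto` here is a small made-up stand-in, and its values are not taken from the real data. This only locates and drops such rows before plotting; it does not explain why the tokenizer produced them.

```r
# Hypothetical stand-in for s.clus.proto (terms in rows, clusters in columns).
proto <- matrix(c(0.40, 0.10,
                  0.05, 0.02,
                  0.10, 0.50,
                  0.03, 0.04),
                nrow = 4, byrow = TRUE,
                dimnames = list(c("phone",
                                  "multitasking  accomplishments satisfying",
                                  "email",
                                  "back  accomplishments became"),
                                c("1", "2")))

# A row name containing whitespace is not a single token.
multi <- grepl("[[:space:]]", rownames(proto))
print(rownames(proto)[multi])

# One option: keep only single-word terms before building the cloud.
proto_single <- proto[!multi, , drop = FALSE]
print(rownames(proto_single))
```

If this reports terms on the real matrix, inspecting the raw text of the documents they come from is a reasonable next step.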

0 answers:

No answers yet