R: Extracting term frequencies of keywords when using graphNEL

Time: 2016-05-19 05:27:03

Tags: r text-mining text-extraction keyword-search phrases

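I am running the following code to extract key phrases from a raw data file. The extraction itself works, but I cannot get the frequency or count of each extracted keyword, which would help me understand how the keywords rank by occurrence, since I am using graphNEL. Is there any way to get key phrase counts? TIA.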

library(graph)  # Bioconductor package that provides the graphNEL class

# Builds a weighted word graph from the global `words` vector.
# IsSelectedWord() and GetWordLinks() are defined elsewhere (not shown here).
ConstructTextGraph <- function(n)
{
  word_graph <- new("graphNEL")
  i <- 1
  while (i < length(words)) {
    if (IsSelectedWord(words[i])) {
      links <- GetWordLinks(i, n)
      if (links[1] != "") {
        cat(i, " ", words[i], " - ", paste(links, collapse = " "), "\n")
        # Add the current word as a node if it is not in the graph yet.
        if (!(words[i] %in% nodes(word_graph))) {
          word_graph <- addNode(words[i], word_graph)
        }

        for (j in seq_along(links)) {
          if (!(links[j] %in% nodes(word_graph))) {
            # New linked word: add the node and a first edge with weight 1.
            word_graph <- addNode(links[j], word_graph)
            word_graph <- addEdge(words[i], links[j], word_graph, 1)
          } else if (words[i] %in% edges(word_graph, links[j])[[1]]) {
            # Edge already exists: increment its weight by 1.
            prev_edge_weight <- as.numeric(edgeData(word_graph, words[i], links[j], "weight"))
            edgeData(word_graph, words[i], links[j], "weight") <- prev_edge_weight + 1
          } else {
            # Both nodes exist but are not yet connected.
            word_graph <- addEdge(words[i], links[j], word_graph, 1)
          }
        }
      }
    }
    i <- i + 1
  }
  word_graph
}
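
For reference, here is a minimal sketch of the kind of count I am after, written in base R only. The edge weights in the graph above record how often two words are linked, not how often a single word occurs, so this sketch counts directly on the token vector instead. The sample `words` vector, the `keywords` vector, and the helper `count_phrase()` are hypothetical stand-ins for illustration; with the real data, `keywords` could presumably be taken from `nodes(word_graph)`.

```r
# Hypothetical token vector standing in for the global `words` used above.
words <- c("text", "graph", "model", "uses", "text", "graph",
           "rank", "text", "graph")

# Hypothetical keyword set; with the real graph this might be nodes(word_graph).
keywords <- c("text", "graph", "model")

# Count occurrences of each keyword. The factor levels ensure that a keyword
# with zero occurrences still appears in the result.
term_counts <- table(factor(words[words %in% keywords], levels = keywords))

# Sort descending so the most frequent keywords come first.
keyword_freq <- sort(term_counts, decreasing = TRUE)
print(keyword_freq)

# Count a multi-word phrase by sliding a window over the tokens.
# Assumes the phrase is a single string with words separated by one space.
count_phrase <- function(phrase, tokens) {
  parts <- strsplit(phrase, " ", fixed = TRUE)[[1]]
  k <- length(parts)
  if (k == 0 || k > length(tokens)) return(0L)
  starts <- seq_len(length(tokens) - k + 1)
  sum(vapply(starts,
             function(s) all(tokens[s:(s + k - 1)] == parts),
             logical(1)))
}

print(count_phrase("text graph", words))
```

The per-word counts could also be attached to the graph as node attributes, but keeping them in a separate named table seemed simpler for ranking.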

Please let me know if any more information is needed.

0 answers:

No answers