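How to build an alphabetical tree from a list of words in R?

Asked: 2014-11-21 11:35:59

Tags: r dictionary tree

My question is simple. I have a lot of words, e.g. abbey, abbot, abbr, abide.

I would like to build a tree as follows: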

Level 0                             A
                                    | 
Level 1                             B
                                  /   \
Level 2                         B       I
                              / | \     |
Level 3                     E   O   R   D   
                            |   |       |
Level 4                     Y   T       E

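Is there a simple way to parse the word list and create such a structure in R?

Thanks a lot for your help!

Regards, Chris

2 Answers:

Answer 0: (score: 3)

Here is an igraph-based solution that labels each node of the graph with a partial word (a prefix), so that terminal nodes are named with complete words: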

library(igraph)
library(stringr)

initgraph = function(){
    # create a graph with one empty-named node and no edges
    g=graph.empty(n=1)
    V(g)$name=""
    g
}


wordtree <- function(g=initgraph(),wordlist){
    for(word in wordlist){
        # turns "word" into c("w","wo","wor","word")
        subwords = str_sub(word, 1, 1:nchar(word))
        # make a graph long enough to hold all those sub-words plus start node
        subg = graph.lattice(length(subwords)+1,directed=TRUE)
        # set vertex nodes to start node plus sub-words
        V(subg)$name=c("",subwords)
        # merge *by name* into the existing graph
        g = graph.union(g, subg)
    }
    g
}

With those functions loaded:

g = wordtree(initgraph(), c("abbey","abbot","abbr","abide"))
plot(g)

[Figure: word tree]

You can add words to an existing tree by passing the tree as the first argument:

> g = wordtree(g,c("now","accept","answer","please"))
> plot(g)

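The tree is always rooted at the node named "", and every terminal node (one with no outgoing edges) is named with a complete word. igraph has functions to pull those out if you need them. You haven't really said what you want to do with the tree once you've built it... or once we've built it for you :)

Note that there is a nice layout for plotting trees that looks like your ASCII example: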
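To make the "terminal nodes" idea concrete without depending on igraph, here is a minimal base-R sketch of the same prefix structure as a plain edge list. `prefix_edges` is a hypothetical helper written for this illustration, not part of igraph or of the code above, and it assumes the words are non-empty strings.

```r
# Hypothetical helper: list the parent -> child prefix pairs for each word,
# mirroring the "", "w", "wo", ... chain that wordtree() builds per word.
prefix_edges <- function(wordlist) {
  wordlist <- wordlist[nzchar(wordlist)]  # assumption: skip empty strings
  edges <- do.call(rbind, lapply(wordlist, function(word) {
    # "word" -> c("w", "wo", "wor", "word")
    prefixes <- substring(word, 1, seq_len(nchar(word)))
    data.frame(from = c("", head(prefixes, -1)),
               to   = prefixes,
               stringsAsFactors = FALSE)
  }))
  unique(edges)  # shared prefixes collapse into one edge
}

edges <- prefix_edges(c("abbey", "abbot", "abbr", "abide"))

# Terminal nodes are prefixes that never appear as a parent.
leaves <- setdiff(edges$to, edges$from)
print(leaves)
```

On the igraph object itself, the analogous test is an out-degree of zero, e.g. `V(g)$name[degree(g, mode = "out") == 0]`. Keep in mind that a word that is also a prefix of a longer word (say "abb" alongside "abbey") ends up as an internal node, so it will not show up among the terminal nodes.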

plot(g,layout=layout.reingold.tilford)

[Figure: tree layout]

Answer 1: (score: 1)

Here is a solution that recursively builds a nested list, with the characters as names:

x <- c("abb", "abbey", "abbot", "abbr", "abide")

char.tree <- function(words, end = NULL) {
   first <- substr(words, 1, 1)
   rest  <- substr(words, 2, nchar(words))
   zi    <- nchar(words) == 0L 
   c(list(end)[any(zi)],
     lapply(split(rest[!zi], first[!zi]), char.tree, end = end))
}

str(char.tree(x))
# List of 1
#  $ a:List of 1
#   ..$ b:List of 2
#   .. ..$ b:List of 4
#   .. .. ..$  : NULL
#   .. .. ..$ e:List of 1
#   .. .. .. ..$ y:List of 1
#   .. .. .. .. ..$ : NULL
#   .. .. ..$ o:List of 1
#   .. .. .. ..$ t:List of 1
#   .. .. .. .. ..$ : NULL
#   .. .. ..$ r:List of 1
#   .. .. .. ..$ : NULL
#   .. ..$ i:List of 1
#   .. .. ..$ d:List of 1
#   .. .. .. ..$ e:List of 1
#   .. .. .. .. ..$ : NULL
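
In this structure an unnamed `NULL` element marks the end of a word, which is why "abb" appears as a word even though longer words continue below it. As a rough sketch of how the nested list might be queried, the block below repeats `char.tree` so it runs on its own and adds two hypothetical helpers, `has_word` and `tree_words`, that are not part of the answer. Both assume the default `end = NULL` marker.

```r
x <- c("abb", "abbey", "abbot", "abbr", "abide")

# Same function as above, repeated so this block is self-contained.
char.tree <- function(words, end = NULL) {
  first <- substr(words, 1, 1)
  rest  <- substr(words, 2, nchar(words))
  zi    <- nchar(words) == 0L
  c(list(end)[any(zi)],
    lapply(split(rest[!zi], first[!zi]), char.tree, end = end))
}

# Hypothetical helper: walk the tree one character at a time, then check
# for the NULL end-of-word marker at the node that was reached.
has_word <- function(tree, word) {
  node <- tree
  for (ch in strsplit(word, "")[[1]]) {
    if (!ch %in% names(node)) return(FALSE)
    node <- node[[ch]]
  }
  any(vapply(node, is.null, logical(1)))
}

# Hypothetical helper: depth-first traversal that collects every prefix
# at which an end-of-word marker is found.
tree_words <- function(node, prefix = "") {
  out  <- character(0)
  keys <- names(node)
  for (i in seq_along(node)) {
    if (is.null(node[[i]])) {
      out <- c(out, prefix)  # end marker: prefix is a complete word
    } else {
      out <- c(out, tree_words(node[[i]], paste0(prefix, keys[i])))
    }
  }
  out
}

tree <- char.tree(x)
print(has_word(tree, "abb"))   # a stored word that is also a prefix
print(has_word(tree, "ab"))    # a prefix only, not a stored word
print(tree_words(tree))
```

If a non-NULL value were passed as `end`, the `is.null` checks in both helpers would need to be replaced by a comparison against that value.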