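My question is simple. I have a lot of words, for example abbey, abbot, abbr, abide.
I would like to build a tree as follows:

Level 0           A
                  |
Level 1           B
                 / \
Level 2        B     I
             / | \   |
Level 3     E  O  R  D
            |  |     |
Level 4     Y  T     E

Is there an easy way to parse the word list and create such a structure in R?
Thank you very much for your help!
Regards, Chris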
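Answer 0 (score: 3)
Here is an igraph-based solution that labels each node of the graph with a partial word, so that the terminal nodes are named with the complete words: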
library(igraph)
library(stringr)

# Note: newer igraph releases call these functions make_empty_graph(),
# make_lattice() and union(); the older names are used here.
initgraph = function(){
  # create a graph with one empty-named node and no edges
  g = graph.empty(n=1)
  V(g)$name = ""
  g
}

wordtree <- function(g=initgraph(), wordlist){
  for(word in wordlist){
    # turns "word" into c("w","wo","wor","word")
    subwords = str_sub(word, 1, 1:nchar(word))
    # make a path graph long enough to hold all those sub-words plus the start node
    subg = graph.lattice(length(subwords)+1, directed=TRUE)
    # name the vertices: start node first, then the sub-words
    V(subg)$name = c("", subwords)
    # merge *by name* into the existing graph
    g = graph.union(g, subg)
  }
  g
}
Once that is loaded, this builds the tree and plots it:
g = wordtree(initgraph(), c("abbey","abbot","abbr","abide"))
plot(g)
You can add words to an existing tree by passing it as the first argument:
> g = wordtree(g,c("now","accept","answer","please"))
> plot(g)
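The tree is always rooted at the node named "", and every terminal node (a node with no outgoing edges) is named with a complete word. igraph has functions that can pull those out when you need them. You haven't really said what you want to do with the tree once you have it... or once we've built it for you :)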
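As a rough sketch of pulling the terminal nodes out, the snippet below selects the vertices whose out-degree is zero. To keep it self-contained it builds the same kind of prefix graph from an edge list instead of calling wordtree(); the helper `prefixes` and the variables `steps`, `nodes` and `el` are illustrative names, not part of the answer above.

```r
library(igraph)

words <- c("abbey", "abbot", "abbr", "abide")

# All prefixes of a word, starting with the empty root name: "", "a", "ab", ...
prefixes <- function(w) substring(w, 1, 0:nchar(w))

# One (parent prefix, child prefix) row per step along each word.
steps <- do.call(rbind, lapply(words, function(w) {
  p <- prefixes(w)
  cbind(from = head(p, -1), to = p[-1])
}))
steps <- unique(steps)  # prefixes shared between words appear only once

# Build the graph from integer vertex ids, then attach the prefixes as names.
nodes <- unique(as.vector(steps))
el <- cbind(match(steps[, "from"], nodes), match(steps[, "to"], nodes))
g <- graph_from_edgelist(el, directed = TRUE)
V(g)$name <- nodes

# Terminal nodes have no outgoing edges; their names are the complete words.
leaves <- V(g)$name[degree(g, mode = "out") == 0]
print(leaves)
```

Note that a word which is also a prefix of a longer word (for example "abb" alongside "abbey") would not be a terminal node, so this selection alone would miss it.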
Note that there is a nice layout for plotting trees that looks like your ASCII example:
plot(g,layout=layout.reingold.tilford)
Answer 1 (score: 1)
Here is a solution that recursively builds a nested list, with the characters as names:
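x <- c("abb", "abbey", "abbot", "abbr", "abide")

char.tree <- function(words, end = NULL) {
  first <- substr(words, 1, 1)             # leading character of each word
  rest  <- substr(words, 2, nchar(words))  # what remains after it
  zi    <- nchar(words) == 0L              # words that have been fully consumed
  # an unnamed `end` element marks that a word stops here;
  # the remaining words are grouped by first character and recursed into
  c(list(end)[any(zi)],
    lapply(split(rest[!zi], first[!zi]), char.tree, end = end))
}

str(char.tree(x))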
# List of 1
# $ a:List of 1
# ..$ b:List of 2
# .. ..$ b:List of 4
# .. .. ..$ : NULL
# .. .. ..$ e:List of 1
# .. .. .. ..$ y:List of 1
# .. .. .. .. ..$ : NULL
# .. .. ..$ o:List of 1
# .. .. .. ..$ t:List of 1
# .. .. .. .. ..$ : NULL
# .. .. ..$ r:List of 1
# .. .. .. ..$ : NULL
# .. ..$ i:List of 1
# .. .. ..$ d:List of 1
# .. .. .. ..$ e:List of 1
# .. .. .. .. ..$ : NULL
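
To show how the nested list can be read back, here is a minimal sketch that walks the structure and collects the words it encodes. `tree.words` is a hypothetical helper, not part of the answer; it assumes, as in the output above, that an unnamed element marks the end of a word. char.tree() is repeated so the snippet runs on its own.

```r
# char.tree() as defined in the answer above
char.tree <- function(words, end = NULL) {
  first <- substr(words, 1, 1)
  rest  <- substr(words, 2, nchar(words))
  zi    <- nchar(words) == 0L
  c(list(end)[any(zi)],
    lapply(split(rest[!zi], first[!zi]), char.tree, end = end))
}

# Hypothetical helper: walk the nested list and rebuild the stored words.
tree.words <- function(tree, prefix = "") {
  nms <- names(tree)
  if (is.null(nms)) nms <- rep("", length(tree))  # list with no names at all
  out <- character(0)
  for (i in seq_along(tree)) {
    if (nms[i] == "") {
      # unnamed element: a word ends at the current prefix
      out <- c(out, prefix)
    } else {
      # named element: descend one character deeper
      out <- c(out, tree.words(tree[[i]], paste0(prefix, nms[i])))
    }
  }
  out
}

x <- c("abb", "abbey", "abbot", "abbr", "abide")
recovered <- tree.words(char.tree(x))
print(recovered)
```

Because the end-of-word marker is stored explicitly, a word such as "abb" that is also a prefix of longer words is recovered along with the others.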