R: tree with overlapping strings

Time: 2014-10-06 09:07:44

Tags: r string tree

My data looks like this:

verkoop          V621  
verkoopcode      V62123  
verkoopcodenaam  V6212355  
verkoopdatum     V621335  
verkoopdatumchar V62133526  
verkooppr        V6216  
verkoopprijs     V62162  
verkoopsafdeling V621213452  
verkoopsartikel  V62126324  

Now I want to create a tree in R, like this:

 V621   --> V62123  --> V6212355
        --> V621335 --> V62133526
        --> V6216 --> V62162
        --> V621213452
        --> V62126324

Or something similar, so that the tree takes the overlapping substrings into account.

1 answer:

Answer 0 (score: 2)

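You can create such a tree with the minimum.spanning.tree function from the igraph package: build a graph in which each word is linked to the words it contains, weighted by how many characters are left over, and take the minimum spanning tree of that graph.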

# load data
df <- read.table(text='verkoop          V621  
verkoopcode      V62123  
verkoopcodenaam  V6212355  
verkoopdatum     V621335  
verkoopdatumchar V62133526  
verkooppr        V6216  
verkoopprijs     V62162  
verkoopsafdeling V621213452  
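verkoopsartikel  V62126324', stringsAsFactors=FALSE)
# use the igraph package
library(igraph)
# build a weighted adjacency matrix: adj[i, j] is the number of characters
# left in word i after removing word j from it, or 0 if word j does not
# occur in word i (fixed=TRUE treats the words as literal strings, not regexes)
adj <- nchar(sapply(df$V1, gsub, x=df$V1, replacement='', fixed=TRUE))
adj[!sapply(df$V1, grepl, x=df$V1, fixed=TRUE)] <- 0
# name the adjacency matrix with the codes so they become the vertex names
colnames(adj) <- df$V2
# original graph (called graph_from_adjacency_matrix in newer igraph versions)
gr <- graph.adjacency(adj, mode='directed', weighted=TRUE)
# minimum spanning tree (called mst in newer igraph versions)
mst <- minimum.spanning.tree(gr)
# e.g. for graphical representation
plot(mst, vertex.size=40)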
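If, as in the sample data, "overlapping" always means that one word is a prefix of another, the parent of each code can also be found without a graph library: take the longest other word that is a proper prefix of it. The base-R sketch below does exactly that. It is an alternative to the MST approach, not part of the original answer, and the helper name `longest_prefix_parent` is hypothetical.

```r
# Alternative sketch (not the answer's MST method).
# Assumption: "overlap" means a shared prefix, as in the sample data.
words <- c("verkoop", "verkoopcode", "verkoopcodenaam", "verkoopdatum",
           "verkoopdatumchar", "verkooppr", "verkoopprijs",
           "verkoopsafdeling", "verkoopsartikel")
codes <- c("V621", "V62123", "V6212355", "V621335", "V62133526",
           "V6216", "V62162", "V621213452", "V62126324")

# Hypothetical helper: for each word, return the index of the longest other
# word that is a proper prefix of it, or NA if there is none (a root).
longest_prefix_parent <- function(words) {
  vapply(seq_along(words), function(i) {
    is_prefix <- startsWith(words[i], words) & nchar(words) < nchar(words[i])
    if (!any(is_prefix)) return(NA_integer_)
    cand <- which(is_prefix)
    cand[which.max(nchar(words[cand]))]
  }, integer(1))
}

# parent code of every code, named by the child code (NA for the root)
parent_idx <- longest_prefix_parent(words)
parent <- setNames(codes[parent_idx], codes)

# two-column edge list: parent -> child
has_parent <- !is.na(parent)
edges <- cbind(from = parent[has_parent], to = names(parent)[has_parent])
print(edges)
```

For the sample data this reproduces the tree sketched in the question: V621 at the root, V62123 under it, V6212355 under V62123, and so on. The `edges` matrix can be passed to `igraph::graph_from_edgelist()` if you want to plot it. Unlike the `grepl` matching in the answer, this only matches prefixes, so a word that appears in the middle of another word is not treated as its parent.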