My data looks like this:
verkoop V621
verkoopcode V62123
verkoopcodenaam V6212355
verkoopdatum V621335
verkoopdatumchar V62133526
verkooppr V6216
verkoopprijs V62162
verkoopsafdeling V621213452
verkoopsartikel V62126324
Now I want to create a tree in R that looks like this:
V621 --> V62123 --> V6212355
     --> V621335 --> V62133526
     --> V6216 --> V62162
     --> V621213452
     --> V62126324
or something similar, so that the overlapping substrings in the names are taken into account.
Answer 0 (score: 2)
You can use the minimum.spanning.tree function from the igraph package to create such a tree.
# load data
df <- read.table(text='verkoop V621
verkoopcode V62123
verkoopcodenaam V6212355
verkoopdatum V621335
verkoopdatumchar V62133526
verkooppr V6216
verkoopprijs V62162
verkoopsafdeling V621213452
verkoopsartikel V62126324', stringsAsFactors = FALSE)
# load the igraph package (library() stops with an error if it is not installed)
library(igraph)
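# create adjacency matrix: adj[i, j] is the number of characters left in
# name i after removing name j, or 0 where name j does not occur in name i
adj <- nchar(sapply(df$V1, gsub, x=df$V1, replacement=''))
adj[!sapply(df$V1, grepl, x=df$V1)] <- 0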
# label the columns of the adjacency matrix with the codes
colnames(adj) <- df$V2
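# original graph: one weighted edge for every "name j is contained in name i" pair
# (igraph >= 1.0 also provides this as graph_from_adjacency_matrix())
gr <- graph.adjacency(adj, mode='directed', weighted=TRUE)
# minimum spanning tree, which keeps the lowest-weight edges
# (igraph >= 1.0 also provides this as mst())
mst <- minimum.spanning.tree(gr)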
# e.g. for graphical representation
plot(mst, vertex.size=40)
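As a cross-check that does not need igraph, the same idea can be sketched in base R: link each name to the longest other name it contains. This is only an illustration, not the method used above. The helper find_parent() is a hypothetical name introduced here, and the sketch assumes that containment between names is nested, as it is in this data set.

```r
# Names and codes from the question, typed in by hand so the example is self-contained
nm <- c("verkoop", "verkoopcode", "verkoopcodenaam", "verkoopdatum",
        "verkoopdatumchar", "verkooppr", "verkoopprijs",
        "verkoopsafdeling", "verkoopsartikel")
code <- c("V621", "V62123", "V6212355", "V621335", "V62133526",
          "V6216", "V62162", "V621213452", "V62126324")

# find_parent() is a hypothetical helper (not part of igraph or base R).
# It returns the index of the longest other name that occurs inside nm[i],
# or NA when no shorter name is contained in it (i.e. nm[i] is the root).
find_parent <- function(i) {
  # literal (non-regex) substring test of every name against nm[i]
  contained <- vapply(nm, function(p) grepl(p, nm[i], fixed = TRUE),
                      logical(1), USE.NAMES = FALSE)
  # keep only strictly shorter names, which excludes nm[i] itself
  hits <- which(contained & nchar(nm) < nchar(nm[i]))
  if (length(hits) == 0) return(NA_integer_)
  hits[which.max(nchar(nm[hits]))]
}

parent <- vapply(seq_along(nm), find_parent, integer(1))

# Edge list expressed in codes, dropping the root (which has no parent)
edges <- data.frame(from = code[parent], to = code, stringsAsFactors = FALSE)
edges <- edges[!is.na(parent), ]
print(edges)
```

For this data the edge list reproduces the tree drawn in the question, with V621 as the root. Picking the longest contained name corresponds to picking the edge with the fewest leftover characters, which is the weight used in the adjacency matrix above.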