我试图以有效的方式在树(不一定是二叉树)中实现计数叶子。
我有一个像这样的边缘输入文件(有两个分支,因此有两个叶子):
# nodeID treeID parentID level M
8915218131 11994463802 9019194630 84 1152
8915218132 11994463802 9019194631 84 1018
9019194630 11994463802 9122013305 85 1152
9122013305 11994463802 9224080882 86 1152
9122013306 11994463802 9224080883 86 1009
9224080882 11994463802 9325374898 87 1153
9325374898 11994463802 9426419957 88 1153
9426419957 11994463802 9526231677 89 1153
9526231677 11994463802 9625289594 90 1153
9625289594 11994463802 9723827035 91 1154
9723827035 11994463802 9822081910 92 1154
9822081910 11994463802 9919855221 93 1154
9919855221 11994463802 10017373825 94 1154
10017373825 11994463802 10114414643 95 1154
10114414643 11994463802 10211212821 96 1154
10114414644 11994463802 10211212822 96 1000
10211212821 11994463802 10307531761 97 1154
10307531761 11994463802 10403632357 98 1154
10403632357 11994463802 10499245188 99 1154
10499245188 11994463802 10594618828 100 1154
10594618828 11994463802 10689537668 101 1154
10689537668 11994463802 10784230758 102 1154
10784230758 11994463802 10878482733 103 1154
10878482733 11994463802 10972591178 104 1154
10972591178 11994463802 11066270407 105 1154
11066270407 11994463802 11159769310 106 1154
11159769310 11994463802 11252883435 107 1154
11252883435 11994463802 11345851308 108 1154
11345851308 11994463802 11438596137 109 1153
11438596137 11994463802 11531494001 110 1153
11531494001 11994463802 11624604030 111 1153
11624604030 11994463802 11718806817 112 1153
11718806817 11994463802 11811844824 113 1152
11811844824 11994463802 11904133551 114 1141
11904133551 11994463802 11994463802 115 1139
11994463802 11994463802 -1 116 1140
如您所见,最后一行是根(总是116级)。在这种情况下有两片叶子。
我需要过滤M > 1100
的树(这里所有人都可以)并计算数百万棵这样的树的叶数。这使像NetworkX这样的图形库有点问题,因为构建树需要太长时间。
我目前计算叶子的方法是只计算没有任何边缘指向它们的节点。以下是我目前的代码。是否有更好的工具或技术?
我目前的Python代码:
import sys
from operator import itemgetter
Mmin = 1100
def handle(tree, treeid, Mfinal):
# sort by levelid, if input not already presorted
#tree.sort(itemgetter(3))
nleafs = 0
nonleaf = set()
for id, rootid, parentid, layerid, M in tree:
if M >= Mmin:
nonleaf.add(parentid)
if id not in nonleaf:
nleafs += 1
print(treeid, Mfinal, nleafs)
sys.stdout.flush()
Mfinal = None
lasttreeid = None
tree = []
for l in sys.stdin:
parts = l.strip().split()
id, treeid, parentid, levelid, M = parts
if lastrootid != treeid:
if Mfinal is not None and Mfinal > Mmin and snapnum == "116":
handle(tree, lasttreeid, Mfinal)
tree = []
lasttreeid = treeid
Mfinal = None
if parentid == '-1':
Mfinal = int(M)
tree.append((id, treeid, parentid, int(levelid), int(M)))
答案 0 :(得分:0)
收集所有节点并从父节点中减去它们就足够了。使用集合和设置差异操作最容易实现它。
def leaves(filename):
lasttreeid = None
nodes = set()
parents = set()
for l in open(filename):
nodeid, treeid, parentid, levelid, M = l.strip().split()
if lasttreeid is None:
lasttreeid = treeid
if lasttreeid != treeid:
print 'Tree %s, number of leaves %d' % (lasttreeid, len(nodes - parents))
lasttreeid = treeid
nodes = set()
parents = set()
nodes.add(nodeid)
parents.add(parentid)
print 'Tree %s, number of leaves %d' % (lasttreeid, len(nodes - parents))