Linking target and source values

Time: 2017-03-10 10:59:28

Tags: python

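I am trying to iterate over a list in Excel (see below) that contains source ID and target ID values. When the list is drawn as a graph, it forms a tree-like structure in which every spot is connected to another spot. There is one split event, after which a single spot has two target spots. I want to find a way to collect the connected source and target IDs from this list.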

Does anyone know how to approach this, or can anyone give me some hints about possible solutions?

SPOT_SOURCE_ID  SPOT_TARGET_ID
127466  127460
127460  127450
127450  127474
127450  127442
127474  127481
127442  127432
127481  127487
127432  127426
127426  127420
127487  127498
127420  127410
127498  127510
127510  127516
127410  127402
127516  127530
127530  127542
127402  127390
127542  127554
127390  127383
127554  127560

1 Answer

Answer 0 (score: 1)

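Your list of pairs can be treated as the edges of a rooted tree. The standard way to find all paths in such a tree is a pre-order depth-first search, and we can adapt it to recover the tracks you want. The code below contains a recursive generator, tree_paths, that yields the complete paths, and a recursive generator, tree_tracks, that yields the tracks. tree_tracks yields an integer depth along with each track list; depth is used to print the tracks in a structured, indented way.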

data = '''\
127466  127460
127460  127450
127450  127474
127450  127442
127474  127481
127442  127432
127481  127487
127432  127426
127426  127420
127487  127498
127420  127410
127498  127510
127510  127516
127410  127402
127516  127530
127530  127542
127402  127390
127542  127554
127390  127383
127554  127560
'''.splitlines()

# Convert the lines of `data` to a list of (parent, child) tuples
edges = [tuple(int(u) for u in row.split()) for row in data]

# Each node in a tree can only have one parent. `v` is the parent of `k`
parents = {k: v for v, k in edges}

# Each value in `children` is a list containing the children of the key
children = {}
for u, v in edges:
    children.setdefault(u, []).append(v)

# Recursively generate every path in the tree starting at `node` 
# by performing a depth-first search
def tree_paths(node, head):
    newhead = head + [node]
    if node not in children:
        yield newhead
        return
    descendants = children[node]
    for n in descendants:
        yield from tree_paths(n, newhead)

# Recursively generate every track in the tree starting at `node` 
# by performing a depth-first search
def tree_tracks(node, head, depth=0):
    newhead = head + [node]
    if node not in children:
        yield newhead, depth
        return
    descendants = children[node]
    if len(descendants) > 1:
        yield newhead, depth 
        newhead = []
        depth += 1
    for n in descendants:
        yield from tree_tracks(n, newhead, depth)

# Find the root node.
# Start at any node. If the edges are sorted, `edges[0][0]` will be the root.
k = edges[0][0]
# Loop until we find a node without a parent. 
# That node must be the root of the tree 
while k in parents:
    k = parents[k]
root = k

print('Paths')
for seq in tree_paths(root, []):
    print(seq)

print('\nTracks')
for seq, depth in tree_tracks(root, []):
    print('{}{}'.format(' ' * 4 * depth, seq))

Output

Paths
[127466, 127460, 127450, 127474, 127481, 127487, 127498, 127510, 127516, 127530, 127542, 127554, 127560]
[127466, 127460, 127450, 127442, 127432, 127426, 127420, 127410, 127402, 127390, 127383]

Tracks
[127466, 127460, 127450]
    [127474, 127481, 127487, 127498, 127510, 127516, 127530, 127542, 127554, 127560]
    [127442, 127432, 127426, 127420, 127410, 127402, 127390, 127383]
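The recursive generators above add one stack frame per spot, so a single track with thousands of spots could run into Python's default recursion limit (usually 1000). A minimal sketch of the same pre-order depth-first search with an explicit stack is shown below. The name `tree_paths_iter` is hypothetical, and the sketch assumes the question's data has already been parsed into `(source, target)` tuples.

```python
# The question's (source, target) pairs, already parsed into tuples.
edges = [
    (127466, 127460), (127460, 127450), (127450, 127474), (127450, 127442),
    (127474, 127481), (127442, 127432), (127481, 127487), (127432, 127426),
    (127426, 127420), (127487, 127498), (127420, 127410), (127498, 127510),
    (127510, 127516), (127410, 127402), (127516, 127530), (127530, 127542),
    (127402, 127390), (127542, 127554), (127390, 127383), (127554, 127560),
]

# Map each spot to its list of target spots, keeping the input order.
children = {}
for u, v in edges:
    children.setdefault(u, []).append(v)

def tree_paths_iter(root):
    """Yield every root-to-leaf path using an explicit stack (no recursion)."""
    stack = [(root, [root])]
    while stack:
        node, path = stack.pop()
        kids = children.get(node)
        if not kids:
            # A spot with no targets ends a path.
            yield path
            continue
        # Push children in reverse so the first child is popped first,
        # which reproduces the order of the recursive tree_paths.
        for child in reversed(kids):
            stack.append((child, path + [child]))

for seq in tree_paths_iter(127466):
    print(seq)
```

Each stack entry carries its own copy of the path, so sibling branches never share or overwrite each other's lists.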

If you already know the root node, you can of course skip building the parents dict and the loop that searches for the root.
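If the root is not known in advance, one alternative to walking up the parents dict (a sketch, assuming the pairs form a single tree) is a set difference: the root is the only spot that appears as a source but never as a target, and the spots that appear only as targets are the ends of the tracks. The helper name `find_roots_and_leaves` is hypothetical.

```python
def find_roots_and_leaves(edges):
    """Return (roots, leaves) for a list of (source, target) pairs."""
    sources = {u for u, _ in edges}
    targets = {v for _, v in edges}
    # A root never appears as a target; a leaf (end of a track) never appears as a source.
    return sources - targets, targets - sources

edges = [
    (127466, 127460), (127460, 127450), (127450, 127474), (127450, 127442),
    (127474, 127481), (127442, 127432), (127481, 127487), (127432, 127426),
    (127426, 127420), (127487, 127498), (127420, 127410), (127498, 127510),
    (127510, 127516), (127410, 127402), (127516, 127530), (127530, 127542),
    (127402, 127390), (127542, 127554), (127390, 127383), (127554, 127560),
]
roots, leaves = find_roots_and_leaves(edges)
print(roots)           # {127466}
print(sorted(leaves))  # [127383, 127560]
```

For a single tree, `roots` has exactly one element, so `root = next(iter(roots))` gives the starting spot for tree_paths or tree_tracks.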

If you don't need the indented output of the tracks, you can use this simpler version of tree_tracks, which neither uses nor yields depth.

def tree_tracks(node, head):
    newhead = head + [node]
    if node not in children:
        yield newhead
        return
    descendants = children[node]
    if len(descendants) > 1:
        yield newhead 
        newhead = []
    for n in descendants:
        yield from tree_tracks(n, newhead)
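Both versions of tree_tracks start a new track exactly where a spot has more than one target, i.e. at the split events mentioned in the question. If you only need to know where those splits happen, a minimal sketch (assuming the same `children` mapping built in the answer; `split_events` is a hypothetical name) is to filter that mapping directly.

```python
edges = [
    (127466, 127460), (127460, 127450), (127450, 127474), (127450, 127442),
    (127474, 127481), (127442, 127432), (127481, 127487), (127432, 127426),
    (127426, 127420), (127487, 127498), (127420, 127410), (127498, 127510),
    (127510, 127516), (127410, 127402), (127516, 127530), (127530, 127542),
    (127402, 127390), (127542, 127554), (127390, 127383), (127554, 127560),
]

# Same construction as in the answer: spot -> list of target spots.
children = {}
for u, v in edges:
    children.setdefault(u, []).append(v)

# A split event is any spot with more than one target spot.
split_events = {node: kids for node, kids in children.items() if len(kids) > 1}
print(split_events)  # {127450: [127474, 127442]}
```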

This code can handle more complex trees. Here is a sample run that adds some extra branches to the tree data.

data = '''\
127466  127460
127460  127450
127450  127474
127450  127442
127474  127481
127442  127432
127481  127487
127432  127426
127426  127420
127487  127498
127420  127410
127498  127510
127510  127516
127410  127402
127516  127530
127530  127542
127402  127390
127542  127554
127390  127383
127554  127560
127510 1
1 2
2 3
3 4
127516 11
11 12
12 13
'''.splitlines()

Output

Paths
[127466, 127460, 127450, 127474, 127481, 127487, 127498, 127510, 127516, 127530, 127542, 127554, 127560]
[127466, 127460, 127450, 127474, 127481, 127487, 127498, 127510, 127516, 11, 12, 13]
[127466, 127460, 127450, 127474, 127481, 127487, 127498, 127510, 1, 2, 3, 4]
[127466, 127460, 127450, 127442, 127432, 127426, 127420, 127410, 127402, 127390, 127383]

Tracks
[127466, 127460, 127450]
    [127474, 127481, 127487, 127498, 127510]
        [127516]
            [127530, 127542, 127554, 127560]
            [11, 12, 13]
        [1, 2, 3, 4]
    [127442, 127432, 127426, 127420, 127410, 127402, 127390, 127383]
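All of this assumes the pairs really do form a single rooted tree: every spot has at most one parent, there is exactly one root, and every spot is reachable from it. If the sheet might contain stray rows (a duplicated pair, a second track with its own root, or a loop, on which the root-finding `while` loop would never terminate), a pre-check like the sketch below may help. `check_tree` is a hypothetical helper, not part of the answer's code.

```python
from collections import Counter

def check_tree(edges):
    """Return the root if (source, target) pairs form one rooted tree, else raise ValueError."""
    # 1. Every spot may have at most one incoming edge (one parent).
    parent_count = Counter(v for _, v in edges)
    multi = sorted(n for n, c in parent_count.items() if c > 1)
    if multi:
        raise ValueError(f"spots with more than one parent: {multi}")

    # 2. Exactly one spot may have no parent: the root.
    nodes = {u for u, _ in edges} | set(parent_count)
    roots = sorted(nodes - set(parent_count))
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {roots}")
    root = roots[0]

    # 3. Every spot must be reachable from the root; after checks 1 and 2,
    #    any unreachable spot must sit on a detached loop.
    children = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
    seen = {root}
    stack = [root]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    unreachable = sorted(nodes - seen)
    if unreachable:
        raise ValueError(f"spots not reachable from the root: {unreachable}")
    return root

print(check_tree([(127466, 127460), (127460, 127450), (127450, 127474), (127450, 127442)]))  # 127466
```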

FWIW, here is the Graphviz DOT file I used to generate the tree diagram of your data below.

strict digraph test{
    127466 -> 127460;
    127460 -> 127450;
    127450 -> 127474;
    127450 -> 127442;
    127474 -> 127481;
    127442 -> 127432;
    127481 -> 127487;
    127432 -> 127426;
    127426 -> 127420;
    127487 -> 127498;
    127420 -> 127410;
    127498 -> 127510;
    127510 -> 127516;
    127410 -> 127402;
    127516 -> 127530;
    127530 -> 127542;
    127402 -> 127390;
    127542 -> 127554;
    127390 -> 127383;
    127554 -> 127560;
}

[Figure: tree diagram of OP's data]
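Writing the DOT file by hand gets tedious for larger sheets. A minimal sketch that generates equivalent DOT text from the `(source, target)` pairs is below; `to_dot` and its default graph name `test` are assumptions chosen to match the file above. The saved text can then be rendered with the Graphviz command line, e.g. `dot -Tpng tree.dot -o tree.png`.

```python
def to_dot(edges, name="test"):
    """Return Graphviz DOT source for a strict digraph of (source, target) pairs."""
    lines = [f"strict digraph {name} {{"]
    # One "u -> v;" statement per pair; plain integers are valid DOT node IDs.
    lines.extend(f"    {u} -> {v};" for u, v in edges)
    lines.append("}")
    return "\n".join(lines)

edges = [(127466, 127460), (127460, 127450), (127450, 127474), (127450, 127442)]
print(to_dot(edges))
```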