Question

I have a df1 in which two columns represent links:

point_1 point_2
  'A'     'B'
  'B'     'C'
  'C'     'D'
  'D'     'E'
  'D'     'F'
  'M'     'N'
  'N'     'O'
...

I also have another df2 with point_A and end_point (the point at which the tree should be cut off), and some point_A values can have multiple end_points.

    point_A  end_point
      'A'        'E'
      'A'        'F'
      'M'        'O'
...

I'm really not sure how to implement this. I used a few functions and the approach below. First I converted df1 to a list:

temp = []
for index, data in df1.iterrows():
    temp.append(data.tolist())

# str() keeps the labels as text; on Python 3, s.encode('ascii') would produce bytes
final_list = [[str(s) for s in row] for row in temp]

Now the function:

def all_paths(table, root):
    children = {}
    for node, child in table:
        if child: 
            children[node] = children.setdefault(node, []) + [child]

    def recurse(path):
        yield path
        if path[-1] in children:
            for child in children[path[-1]]:
                for foo in recurse(path + [child]):
                    yield foo

    return recurse([root])

path_list = []
for el in d:
    for i in el:
        for path in all_paths(final_list, i):
            path_list.append(path)

This is what I get in path_list:

[['A'],
 ['A', 'B'],
 ['A', 'B', 'C'],
 ['A', 'B', 'C', 'D'],
 ['A', 'B', 'C', 'D', 'E'],
 ['A', 'B', 'C', 'D', 'F'],
 ['M'],
 ['M', 'N'],
 ['M', 'N', 'O']
...
]

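As you can see, I get many list items that I don't need; I only need the ones that run all the way from point_A to end_point. So my idea now is to compare the first element of each item in the list with point_A and the last element with end_point. If both match, the path is a correct one.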

Desired output:

[
 ['A', 'B', 'C', 'D', 'E'],
 ['A', 'B', 'C', 'D', 'F'],
 ['M', 'N', 'O']
...
]

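I would prefer the result as a DataFrame, but a list is fine too.

I'm sure there is a simpler and more correct solution, though. Any help is welcome.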

Answer 1

point_A  end_point
  'A'        'E'
  'A'        'F'
  'M'        'O'

The idea is that you take this table and create:

points = {'A':set(['E','F']), 'M':set(['O'])}
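If the table is larger than three rows, the mapping can be built in a loop rather than typed by hand. A minimal sketch follows; the `pairs` list is a stand-in for the rows of df2 (with pandas they could presumably be taken from `df2.itertuples(index=False)`, assuming df2 has exactly those two columns in that order).

```python
from collections import defaultdict

# Stand-in for the rows of df2: (point_A, end_point) pairs.
pairs = [('A', 'E'), ('A', 'F'), ('M', 'O')]

# Collect every end point under its start point.
points = defaultdict(set)
for start_point, end_point in pairs:
    points[start_point].add(end_point)

points = dict(points)  # plain dict, same shape as the literal above
```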

Then you loop over it:

path_list = []
for start_point, end_points in points.items():
    for path in all_paths(final_list, start_point, end_points):
        path_list.append(path)

Then, inside the all_paths function (which now also takes end_points as a third parameter):

replace

yield path

with:

if path[-1] in end_points:
    yield path

With this test, any path that does not end at one of the given end points is not reported.
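Put together, the modified function might look like the sketch below. It uses the sample links from the question and assumes the links contain no cycles (the recursion would not terminate otherwise); `yield from` also assumes Python 3.

```python
def all_paths(table, root, end_points):
    # Map each node to the list of its children.
    children = {}
    for node, child in table:
        if child:
            children.setdefault(node, []).append(child)

    def recurse(path):
        # Report the path only if it ends at one of the requested end points.
        if path[-1] in end_points:
            yield path
        for child in children.get(path[-1], []):
            yield from recurse(path + [child])

    return recurse([root])


# Sample data from the question.
final_list = [['A', 'B'], ['B', 'C'], ['C', 'D'], ['D', 'E'],
              ['D', 'F'], ['M', 'N'], ['N', 'O']]
points = {'A': {'E', 'F'}, 'M': {'O'}}

path_list = []
for start_point, end_points in points.items():
    path_list.extend(all_paths(final_list, start_point, end_points))

print(path_list)
# [['A', 'B', 'C', 'D', 'E'], ['A', 'B', 'C', 'D', 'F'], ['M', 'N', 'O']]
```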

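You can speed this up by finding the connected components of the links, and then enumerating paths only for those pairs whose start and end points are in the same component.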
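One possible way to do that pre-filtering is a small union-find over the links, sketched below. The helper names `find` and `component_labels` are made up for this illustration, and the pair ('A', 'O') is a hypothetical extra row added only to show a pair being rejected.

```python
def find(parent, x):
    # Follow parent links to the representative, compressing the path on the way.
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def component_labels(links):
    # Merge the two endpoints of every link, ignoring direction.
    parent = {}
    for a, b in links:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
    # Label each node with the representative of its component.
    return {node: find(parent, node) for node in list(parent)}


links = [['A', 'B'], ['B', 'C'], ['C', 'D'], ['D', 'E'],
         ['D', 'F'], ['M', 'N'], ['N', 'O']]
labels = component_labels(links)

# ('A', 'O') is a hypothetical pair that spans two components.
candidate_pairs = [('A', 'E'), ('A', 'F'), ('M', 'O'), ('A', 'O')]
reachable = [(s, e) for s, e in candidate_pairs
             if s in labels and labels.get(s) == labels.get(e)]
```

Being in the same component is only a necessary condition, because the links are followed in one direction; all_paths still does the real check for the pairs that survive this filter.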
