Python - Finding descendants and ancestors from an unsorted hierarchy file

Time: 2017-06-12 16:24:55

Tags: python python-3.x

I have an unsorted parent-child hierarchy file (tab-delimited) in the following format:

City1   Area1
City1   Area2
Continent1  Country1
Continent2  Country2
Continent3  Country3
Continent4  Country4
Continents  Continent1
Continents  Continent2
Continents  Continent3
Continents  Continent4
Country1    State1
Country2    State2
Country3    State3
Earth   Continents
State1  City1
State1  City1.1
State2  City2

My goal is to find all the "descendants" and "ancestors" of any member.

Here is what I have written so far:

import sys, re

with open("input.txt", "r") as my_in:
    collections={}
    for line in my_in:
        parent, child=line.rstrip('\r\n').split('\t')
        collections.setdefault(parent, []).append(child)

print (collections)
'''
{'Continent4': ['Country4'], 'Continent2': ['Country2'], 
'Continents': ['Continent1', 'Continent2', 'Continent3', 'Continent4'], 
'Continent1': ['Country1'], 'Country2': ['State2'], 
'Country3': ['State3'], 'State1': ['City1', 'City1.1'], 
'Country1': ['State1'], 'State2': ['City2'], 
'Earth': ['Continents'], 'City1': ['Area1', 'Area2'], 'Continent3': ['Country3']}
'''

def find_descendants(parent, collections):
    descendants = []
    for descendant in collections[parent]:
        if descendant in collections:
            descendants = descendants + find_descendants(descendant, collections)
        else:
            descendants.append(descendant)
    return descendants

# Get descendants of "Continent1":
lis=find_descendants("Continent1", collections)
print (lis) # It shows ['Area1', 'Area2', 'City1.1']
# Actually it should show ['Country1', 'State1', 'City1', 'Area1', 'Area2',   'City1.1']

def find_ancestors(child, collections):
    # pseudo code
    # link child to its parent and parent to its parent until no more parents are found
    pass

# lis=find_ancestors("City1.1", collections)
# should show ['Earth', 'Continents', 'Continent1', 'Country1', 'State1']

The function find_descendants does not work as expected. As for find_ancestors, I know the pseudocode but cannot express it in Python.

Please help.

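2 Answers:

Answer 0 (score: 1)

As I said in the comments, you forgot to append each descendant before looking deeper into the collection, so only the leaves were kept. This works: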

def find_descendants(parent, collections):
    descendants = []
    for descendant in collections[parent]:
        descendants.append(descendant)
        if descendant in collections:
            descendants = descendants + find_descendants(descendant, collections)
    return descendants

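For the ancestors, just build another collection, say ancestors_collection, that stores the reverse relation (child to parent). The function that finds ancestors is then exactly the same as find_descendants, which you can rename accordingly.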

Edit:

Here is complete working code; I use "relative" to refer to either an ancestor or a descendant:

with open("input.txt", "r") as my_in:
    descendants={}
    ancestors={}
    for line in my_in:
        parent, child=line.rstrip('\r\n').split('\t')
        descendants.setdefault(parent, []).append(child)
        ancestors.setdefault(child, []).append(parent)

def get_relatives(element, collection):
    relatives = []
    for relative in collection[element]:
        relatives.append(relative)
        if relative in collection:
            relatives = relatives + get_relatives(relative, collection)
    return relatives

# Get descendants of "Continent1":
lis=get_relatives("Continent1", descendants)
print (lis)
# shows ['Country1', 'State1', 'City1', 'Area1', 'Area2', 'City1.1']

# Get ancestors of "City1.1" (nearest ancestor first):
lis=get_relatives("City1.1", ancestors)
print (lis)
# shows ['State1', 'Country1', 'Continent1', 'Continents', 'Earth']
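
Because get_relatives walks outward from the element, an ancestor lookup yields the nearest parent first, whereas the question asked for the list to start at the root. It also raises KeyError for an element that has no entry in the chosen dictionary (a leaf when looking up descendants, the root when looking up ancestors). Below is a minimal sketch of one way to handle both, assuming the hierarchy is acyclic and each element has a single parent. The EDGES list and the names build_maps, walk and ancestors_root_first are illustrative and not part of the answer above:

```python
# Illustrative data: a subset of the question's file as (parent, child) pairs.
EDGES = [
    ("Earth", "Continents"),
    ("Continents", "Continent1"),
    ("Continent1", "Country1"),
    ("Country1", "State1"),
    ("State1", "City1"),
    ("State1", "City1.1"),
    ("City1", "Area1"),
    ("City1", "Area2"),
]

def build_maps(edges):
    """Return (children_of, parents_of) dictionaries built from the pairs."""
    children_of, parents_of = {}, {}
    for parent, child in edges:
        children_of.setdefault(parent, []).append(child)
        parents_of.setdefault(child, []).append(parent)
    return children_of, parents_of

def walk(element, mapping):
    """Collect relatives depth-first; elements missing from mapping yield []."""
    relatives = []
    for relative in mapping.get(element, []):
        relatives.append(relative)
        relatives.extend(walk(relative, mapping))
    return relatives

def ancestors_root_first(element, parents_of):
    """Reverse the nearest-first walk so the root comes first.

    Assumption: every element has at most one parent (a tree).
    """
    return walk(element, parents_of)[::-1]

children_of, parents_of = build_maps(EDGES)
print(walk("Continent1", children_of))
# ['Country1', 'State1', 'City1', 'Area1', 'Area2', 'City1.1']
print(ancestors_root_first("City1.1", parents_of))
# ['Earth', 'Continents', 'Continent1', 'Country1', 'State1']
print(walk("Area1", children_of))
# []
```

If an element can have several parents, simply reversing the list no longer guarantees a root-first order, and a cyclic file would make the recursion run until Python's recursion limit is hit.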

答案 1 :(得分:0)

Here's a simpler solution that uses networkx:

import networkx as nx

coll = nx.DiGraph()
with open("input.txt") as f:
    for line in map(str.strip, f):
        ancestor, descendant = line.split("\t")
        coll.add_edge(ancestor, descendant)

print(nx.descendants(coll, "Continent1"))
# {'Area2', 'City1.1', 'Area1', 'City1', 'State1', 'Country1'}

print(nx.ancestors(coll, "City1.1"))
# {'Earth', 'Continent1', 'State1', 'Continents', 'Country1'}

Both functions return a set, so the ancestors and descendants are not ordered.
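
When the order matters, one option is to do the traversal by hand instead of relying on the sets. The following is a minimal sketch using only the standard library, not networkx. The EDGES list and the function name descendants_by_level are illustrative assumptions. It lists descendants level by level (breadth-first) and tracks visited nodes, so a cyclic input does not loop forever:

```python
from collections import deque

# Illustrative data: a subset of the question's file as (parent, child) pairs.
EDGES = [
    ("Continent1", "Country1"),
    ("Country1", "State1"),
    ("State1", "City1"),
    ("State1", "City1.1"),
    ("City1", "Area1"),
    ("City1", "Area2"),
]

def descendants_by_level(root, edges):
    """Return descendants of root in breadth-first order, closest level first."""
    children_of = {}
    for parent, child in edges:
        children_of.setdefault(parent, []).append(child)

    seen = {root}          # nodes already queued, guards against cycles
    ordered = []           # result, in the order nodes are discovered
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children_of.get(node, []):
            if child not in seen:
                seen.add(child)
                ordered.append(child)
                queue.append(child)
    return ordered

print(descendants_by_level("Continent1", EDGES))
# ['Country1', 'State1', 'City1', 'City1.1', 'Area1', 'Area2']
```

Passing the pairs with each tuple swapped, (child, parent), gives the ancestors in the same way, nearest first.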