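For a child-parent relationship table (CSV), I am trying to collect every possible parent-child chain using all the data in the table. The problem I am trying to solve is that when a child has multiple parents (see rows 3 and 4), the second child-parent combination (row 4) is not included in the iteration.
Data example:
child,parent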
A,B
A,C
B,D
B,C
C,D
Expected chain results:
D|B|A
D|C|B|A
D|C|A
Actual chain results:
D|B|A
D|C|A
Code:
find= 'A' #The child for which the code should find all possible parent relationships
sequence = ''
with open('testing.csv','r') as f: #testing.csv = child,parent table (above example)
    for row in f:
        if row.strip().startswith(find):
            parent = row.strip().split(',')[1]
            sequence = parent + '|' + find
            f1 = open('testing.csv','r')
            for row in f1:
                if row.strip().startswith(parent):
                    parent2 = row.strip().split(',')[1]
                    sequence = parent2 + '|' + sequence
                    parent = parent2
                else:
                    continue
            print sequence
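Answer 0: (score: 3)
Have you seen this excellent essay? It is essential reading for really understanding patterns in Python. Your problem can be treated as a graph problem: finding the relationships amounts to finding all paths from the child node to the parent node.
Because there can be any number of nesting levels (child -> parent1 -> parent2 ...), you need a recursive solution to find all paths. Your code has two for loops, which, as you discovered, can only produce paths of at most 3 levels.
The code below is adapted from the link above to solve your problem. The function find_all_paths takes a graph as input.
Let's build the graph from your file: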
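graph = {}  # Graph is a dictionary to hold our child-parent relationships.
with open('testing.csv', 'r') as f:  # assumes testing.csv has no header row
    for row in f:
        row = row.strip()  # drop the trailing newline so it does not end up in the parent name
        if not row:
            continue  # skip blank lines
        child, parent = row.split(',')
        graph.setdefault(parent, []).append(child)  # each parent maps to a list of its children
print(graph)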
With your sample, this should print (the key order may differ between Python versions):
{'C': ['A', 'B'], 'B': ['A'], 'D': ['B', 'C']}
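If the file carries a header row such as child,parent (as in the question's sample), the graph-building step could be wrapped roughly as in the sketch below, using the standard csv module. The function name build_graph and the header check are assumptions made for illustration and are not part of the essay or of your code; the sample data is inlined so the snippet runs on its own.

```python
import csv
import io


def build_graph(lines, header=("child", "parent")):
    """Map each parent to the list of its children.

    `lines` is any iterable of CSV text lines (an open file works).
    A first row equal to `header` is skipped; that check is an assumption
    about how the file is laid out.
    """
    graph = {}
    for i, row in enumerate(csv.reader(lines)):
        if not row:
            continue  # csv.reader yields [] for blank lines
        child, parent = (cell.strip() for cell in row)
        if i == 0 and (child, parent) == tuple(header):
            continue  # skip the header row
        graph.setdefault(parent, []).append(child)
    return graph


# Inlined copy of the sample table, including the header row.
sample = io.StringIO("child,parent\nA,B\nA,C\nB,D\nB,C\nC,D\n")
graph = build_graph(sample)
print(graph)
```

With a real file you would pass the open file object instead, for example inside with open('testing.csv', newline='') as f. Rows that do not have exactly two columns raise a ValueError in this sketch.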
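The following function comes from the essay; the only change is that graph.has_key(start) is written as start not in graph, so that it also runs on Python 3:
def find_all_paths(graph, start, end, path=[]):
    path = path + [start]  # builds a new list, so the shared default is never mutated
    if start == end:
        return [path]
    if start not in graph:
        return []
    paths = []
    for node in graph[start]:
        if node not in path:  # skip nodes already on the path to avoid cycles
            newpaths = find_all_paths(graph, node, end, path)
            for newpath in newpaths:
                paths.append(newpath)
    return paths

for path in find_all_paths(graph, 'D', 'A'):
    print('|'.join(path))
This prints: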
D|B|A
D|C|A
D|C|B|A
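Note that find_all_paths needs both endpoints, so the top-level ancestor ('D') has to be known in advance. If only the child is known, a similar recursion over a child-to-parents mapping can walk upward until it reaches a node with no further parents. The sketch below is one possible way to do that; the names parents_of and chains_from are made up for illustration, and the rows are inlined rather than read from testing.csv.

```python
def chains_from(parents_of, node, tail=()):
    """Yield chains (tuples) ordered from the top-most ancestor down to the child.

    `tail` holds the nodes already visited below `node`.
    """
    chain = (node,) + tail
    # Ignore parents already on the chain so cyclic data cannot recurse forever.
    parents = [p for p in parents_of.get(node, []) if p not in chain]
    if not parents:
        yield chain  # nothing left to climb to: this chain is complete
        return
    for parent in parents:
        yield from chains_from(parents_of, parent, chain)


# Sample (child, parent) rows from the question, inlined for the sketch.
rows = [("A", "B"), ("A", "C"), ("B", "D"), ("B", "C"), ("C", "D")]

parents_of = {}
for child, parent in rows:
    parents_of.setdefault(child, []).append(parent)

chains = ["|".join(chain) for chain in chains_from(parents_of, "A")]
for chain in chains:
    print(chain)
```

The mapping here points from child to parents, the reverse of the graph above, because the walk starts at the child and moves upward.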
Answer 1: (score: 0)
I'm not sure this is the most efficient approach (but re-reading the file for every row would be worse).
find = 'A'  # The child for which the code should find all possible parent relationships
sequences = {find}  # set(find) would split a multi-character name into single letters
# we'll build up a chain for every relationship, then strip out un-needed ones later
# note: this single pass assumes rows are ordered child-first, as in the sample; other orderings can miss chains
with open('testing.csv', 'r') as f:  # testing.csv = child,parent table (above example)
    for row in f:
        child, parent = row.strip().split(',')
        sequences.add(parent + '|' + child)
        for c in sequences.copy():
            if c.split('|')[0] == child:  # compare the whole first name, not just one character
                sequences.add(parent + '|' + c)
# remove any that don't end with our child:
sequences = set(s for s in sequences if s.split('|')[-1] == find)
# get all shorter chains when we have a longer one
extra = set()
for g1 in sequences:
    for g2 in sequences:
        if g2.partition('|')[2] == g1:  # g1 is g2 without its first name
            extra.add(g1)
# remove the shorter chains
sequences.difference_update(extra)
for chain in sequences:
    print(chain)
Result:
D|C|A
D|C|B|A
D|B|A
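The single pass above only extends chains with rows it has already read, so it depends on the order of the rows in the file. One way to relax that, sketched below, is to keep extending chains until a full pass adds nothing new. This is a variation on the idea rather than the code above: it stores chains as tuples instead of strings, and the function name all_chains and the inlined rows are assumptions made for illustration.

```python
def all_chains(rows, find):
    """Return the set of maximal ancestor chains (tuples) that end at `find`."""
    chains = {(find,)}
    changed = True
    while changed:  # repeat until a full pass over the rows adds no new chain
        changed = False
        for child, parent in rows:
            for chain in list(chains):
                # Extend chains whose top node is `child`; skip repeats to avoid cycles.
                if chain[0] == child and parent not in chain:
                    longer = (parent,) + chain
                    if longer not in chains:
                        chains.add(longer)
                        changed = True
    # Drop every chain that is just the tail of a longer chain.
    tails = {chain[1:] for chain in chains}
    return {chain for chain in chains if chain not in tails}


# Same sample relationships, deliberately listed parent-first.
rows = [("C", "D"), ("B", "C"), ("B", "D"), ("A", "C"), ("A", "B")]
result = sorted("|".join(chain) for chain in all_chains(rows, "A"))
for chain in result:
    print(chain)
```

The repeated passes cost more than a single scan, so for large tables the recursive graph approach is probably the better fit.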