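I have created a function that reads a list of ID pairs (i.e. [("A","B"),("B","C"),("C","D"),...]) and orders the IDs from start to finish, including any branches.
Each ordered list of IDs is held in a class called Alignment, and the function uses recursion to handle branches: it creates a new alignment starting at the ID where the branch splits from the main list.
I have found that with certain inputs it is possible to reach the maximum recursion limit set by Python. I know I could just increase this limit using sys.setrecursionlimit(), but since I do not know how many combinations of branches are possible, I'd like to avoid this tactic.
I have been reading several articles about converting recursive functions to iterative functions, but I have not been able to determine the best way to handle this particular function, because the recursion takes place in the middle of the function and can be exponential.
Could any of you offer any suggestions?
Thanks, Brian
Code is posted below: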
def buildAlignments(alignment, alignmentList, endIDs):
    while alignment.start in endIDs:

        #If endID only has one preceding ID: add preceding ID to alignment
        if len(endIDs[alignment.start]) == 1:
            alignment.add(endIDs[alignment.start][0])

        else:

            #List to hold all branches that end at spanEnd
            branches = []

            for each in endIDs[alignment.start]:

                #New alignment for each branch
                al = Alignment(each)

                #Recursively process each new alignment
                buildAlignments(al, branches, endIDs)

                branches.append(al)

            count = len(branches)
            i = 0
            index = 0

            #Loop through branches by length
            for branch in branches:
                if i < count - 1:

                    #Create copy of original alignment and add branch to alignment
                    al = Alignment(alignment)
                    al += branch #branches[index]
                    alignmentList.append(al)
                    i += 1

                #Add single branch to existing original alignment
                else: alignment += branch #branches[index]

                index += 1

def main():
    IDs = [("L", "G"), ("A", "B"), ("B", "I"), ("B", "H"), ("B", "C"), ("F", "G"), ("D", "E"), ("D", "J"), ("E", "L"), ("C", "D"), ("E", "F"), ("J", "K")]

    #Gather all startIDs with corresponding endIDs and vice versa
    startIDs = {}
    endIDs = {}
    for pair in IDs:
        if pair[0] not in startIDs: startIDs[pair[0]] = []
        startIDs[pair[0]].append(pair[1])
        if pair[1] not in endIDs: endIDs[pair[1]] = []
        endIDs[pair[1]].append(pair[0])

    #Create Alignment objects from any endID that does not start another pair (i.e. final ID in sequence)
    alignments = [Alignment(end) for end in endIDs if end not in startIDs]

    #Build sequences in each original Alignment
    i = len(alignments)
    while i:
        buildAlignments(alignments[i-1], alignments, endIDs)
        i -= 1

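EDIT: I should point out that the IDs provided are just a small sample I have used to test this algorithm. In practice, the sequences of IDs may be several thousand long, with many branches and branches off of branches.
RESOLUTION: Thanks to Andrew Cooke. The new method seems to be much simpler and much easier on the call stack. I did make some minor adjustments to his code to better suit my purposes. I have included the completed solution below: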
from collections import defaultdict

def expand(line, have_successors, known):
    #print(line)
    known.append(line)
    for child in have_successors[line[-1]]:
        newline = line + [child]
        #line has a successor, so it is not a complete series: drop it
        if line in known: known.remove(line)
        #Yield the child generator instead of recursing into it
        yield expand(newline, have_successors, known)

def trampoline(generator):
    #Drive the generators from an explicit stack so the call stack stays flat
    stack = [generator]
    while stack:
        try:
            generator = stack.pop()
            child = next(generator)
            stack.append(generator)
            stack.append(child)
        except StopIteration:
            pass

def main(pairs):
    have_successors = defaultdict(set)
    links = set()
    for (start, end) in pairs:
        links.add(end)
        have_successors[start].add(end)
    known = []
    #Start from every ID that never appears as the end of a pair
    for node in set(have_successors.keys()):
        if node not in links:
            trampoline(expand([node], have_successors, known))
    for line in known:
        print(line)

if __name__ == '__main__':
    main([("L", "G"), ("A", "B"), ("B", "I"), ("B", "H"), ("B", "C"), ("F", "G"), ("D", "E"), ("D", "J"), ("E", "L"), ("C", "D"), ("E", "F"), ("J", "K")])

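Summary of changes:
swapped links and have_successors to create the lists from start to end
added if line in known: known.remove(line) to expand in order to retain only the complete series
changed the line variable from a string to a list in order to handle multiple characters in a single ID.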
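
For comparison, the same enumeration can be written with a plain explicit stack of partial paths instead of generators. This is only a sketch and not Andrew Cooke's code: the function name all_paths and the variable names are made up for illustration, and it assumes the pairs contain no circular references (with a cycle the loop would never finish).

```python
from collections import defaultdict

def all_paths(pairs):
    """Return every complete start-to-end ID sequence (assumes no cycles)."""
    successors = defaultdict(list)
    has_predecessor = set()
    for start, end in pairs:
        successors[start].append(end)
        has_predecessor.add(end)

    # Roots are IDs that never appear as the second element of a pair.
    roots = [node for node in successors if node not in has_predecessor]

    complete = []
    stack = [[root] for root in roots]  # each entry is a partial path
    while stack:
        path = stack.pop()
        # .get() avoids inserting empty entries into the defaultdict
        children = successors.get(path[-1], [])
        if not children:
            complete.append(path)  # no successor: the series is complete
            continue
        for child in children:
            stack.append(path + [child])
    return complete

if __name__ == '__main__':
    sample = [("L", "G"), ("A", "B"), ("B", "I"), ("B", "H"), ("B", "C"),
              ("F", "G"), ("D", "E"), ("D", "J"), ("E", "L"), ("C", "D"),
              ("E", "F"), ("J", "K")]
    for path in all_paths(sample):
        print(path)
```

Nothing here recurses, so sys.setrecursionlimit() is never a concern. The cost is that every partial path is copied when it is extended, which may matter for sequences that are thousands of IDs long.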
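UPDATE: So I just discovered that the reason I was having issues in the first place was circular references in the list of IDs I was provided. Now that the circular references are fixed, either method works as expected. Thanks again for all your help.
Answer 0 (score: 14)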
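
Since circular references turned out to be the root cause, it may be worth checking the pair list for them before building any sequences. The following is a rough sketch of one way to do that, using an iterative depth-first search; find_cycle is a hypothetical helper written for this post, not part of either solution above.

```python
from collections import defaultdict

def find_cycle(pairs):
    """Return one cycle as a list of IDs (first == last), or None if there is none."""
    successors = defaultdict(list)
    for start, end in pairs:
        successors[start].append(end)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    colour = defaultdict(int)      # every ID starts out WHITE
    done = object()                # sentinel for an exhausted iterator

    for root in list(successors):
        if colour[root] != WHITE:
            continue
        colour[root] = GREY
        path = [root]                       # IDs on the current DFS path
        stack = [iter(successors[root])]    # one child iterator per path entry
        while stack:
            child = next(stack[-1], done)
            if child is done:
                colour[path.pop()] = BLACK  # all successors explored
                stack.pop()
            elif colour[child] == GREY:
                # child is already on the current path: circular reference
                return path[path.index(child):] + [child]
            elif colour[child] == WHITE:
                colour[child] = GREY
                path.append(child)
                stack.append(iter(successors.get(child, [])))
    return None

if __name__ == '__main__':
    print(find_cycle([("A", "B"), ("B", "C"), ("C", "A")]))
```

It uses an explicit stack as well, so a very long chain of IDs does not hit the recursion limit while checking.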