I am trying to create an int (and tuple) list from a list of strings
Let me explain what I intend to do and what is making it hard for me.
>>> STRINGS = ['GAT','GAC','ATCG','ATA','GTA']
>>> myFunc(STRINGS)
I obtained RESULT and NUMBERS in earlier steps. In this step, these lists should be converted into a higher-level data structure.
RESULT = ['G','A','T','C','A','T','C','G','A','T','A']
NUMBERS = [1,2,3,4,5,6,7,8,9,10,11]
[(0,1), (1,2), (2,3), (2,4), ... ] or {(0,1), (1,2), (2,3), (2,4), ... }
{(0,1):'G', (1,2):'A', (2,3):'T', (2,4):'C', ...}
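Several things make this hard: the string lengths vary; comparing characters against those of the earlier strings is not easy; and the int list then has to be converted to tuples, a Trie, a Graph, ...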
# SUMMARY
# Sorry, this is not code.
# This shows how a string list is transformed to int (and tuple) list.
# 'GAT' -> 'G,A,T' -> 1,2,3 -> 1,2,3 -> (0,1),(1,2),(2,3)
# 'GAC' -> '-,-,C' -> -,-,4 -> 1,2,4 -> (0,1),(1,2),(2,4)
# 'ATCG' -> 'A,T,C,G' -> 5,6,7,8 -> 5,6,7,8 -> (0,5),(5,6),(6,7),(7,8)
# 'ATA' -> '-,-,A' -> -,-,9 -> 5,6,9 -> (0,5),(5,6),(6,9)
# 'GTA' -> '-,T,A' -> -,10,11 -> 1,10,11 -> (0,1),(1,10),(10,11)
# ['GAT','GAC','ATCG','ATA','GTA']
# -> ['GAT','C','ATCG','A','TA']
# -> ['G','A','T','C','A','T','C','G','A','T','A']
# -> [1,2,3,4,5,6,7,8,9,10,11]
# -> tuple list
# -> change tuple list to ordered set
# -> apply this to Python graph and Trie structures.
I want to apply this to Graph and Trie structures in Python. Any hint or suggestion would be greatly appreciated. Thanks.
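The transformation summarized above can be read as inserting each string into a trie and numbering every node in the order it is first created. Below is a minimal sketch under that assumption; `build_trie_edges` and its internal names are hypothetical and not part of the original post. It relies on dicts preserving insertion order (Python 3.7+).

```python
def build_trie_edges(strings):
    """Return {(parent, child): char} for a trie over `strings`.

    The root is node 0; other nodes are numbered 1, 2, 3, ... in the
    order they are first created.
    """
    edges = {}      # (parent, child) -> character on that edge
    children = {}   # (parent, character) -> child node number
    next_id = 1
    for s in strings:
        node = 0    # start every string at the root
        for ch in s:
            key = (node, ch)
            if key not in children:
                # First time this prefix is seen: create a new node.
                children[key] = next_id
                edges[(node, next_id)] = ch
                next_id += 1
            node = children[key]
    return edges


edges = build_trie_edges(['GAT', 'GAC', 'ATCG', 'ATA', 'GTA'])
tuple_list = list(edges)            # edges in creation order
labels = list(edges.values())       # characters in creation order
```

For the example input, the dictionary starts with `(0,1):'G', (1,2):'A', (2,3):'T', (2,4):'C'`, and `labels` reproduces the RESULT list. Because the keys are unique and ordered, `tuple_list` can also serve as the ordered set of edges.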
Update (2015.04.15): I wrote some code to get the int list from the string list.
def diff_idx(str1, str2):
    """
    Return the length of the common prefix of str1 and str2,
    i.e. the index of the first position where they differ.
    >>> diff_idx('GAT','GAC')
    2
    """
    n = min(len(str1), len(str2))
    for i in range(n):
        if str1[i] != str2[i]:
            return i
    # No mismatch found (also covers an empty string).
    return n
def diff_idxl(xs, x):
    """
    Return the longest common-prefix length between x and any string in xs.
    >>> diff_idxl(['GAT','GAC','ATCG','ATA'],'GTA')
    1
    """
    return max([diff_idx(s,x) for s in xs])
def num_seq(patterns):
    """
    >>> num_seq(['GAT','GAC','ATCG','ATA','GTA'])
    ['G', 'A', 'T', 'C', 'A', 'T', 'C', 'G', 'A', 'T', 'A']
    """
    lst = patterns[:]
    answer = [c for c in lst[0]]
    comp = [lst[0]]
    for i in range(1, len(patterns)):
        # Keep only the part of this string not shared with an earlier one.
        answer.extend(patterns[i][diff_idxl(comp,patterns[i]):])
        comp.append(patterns[i])
    return answer
With this code I get the correct result.
>>> num_seq(['GAT','GAC','ATCG','ATA','GTA'])
['G', 'A', 'T', 'C', 'A', 'T', 'C', 'G', 'A', 'T', 'A']
>>> # (index + 1) means a node in Trie structure.
Update (2015.04.17): I wrote some additional code to get what I want.
>>> # What I want to get is this...
>>> strings = ['GAT','GACA','ATC','GATG']
>>> nseq = num_seq(strings)
>>> nseq
['G','A','T','C','A','A','T','C','G']
>>> make_matrix_trie(strings)
[[1, 2, 3], [0, 0, 4, 5], [6, 7, 8], [0, 0, 0, 9]]
My implementation of make_matrix_trie looks like this.
def make_matrix_trie(patterns):
    m = []
    for pat in patterns:
        m.append([0]*len(pat))
    comp = num_seq(patterns)
    comp.append(0)
    idx = 1
    for i in range(len(patterns)):
        for j in range(len(patterns[i])):
            if patterns[i][j] == comp[0]:
                m[i][j] = idx
                idx += 1
                comp.pop(0)
            else:
                m[i][j] = 0
    print (m,comp)
    return m
But the result is not what I expected.
>>> make_matrix_trie(['GAT','GACA','ATC','GATG'])
[[1, 2, 3], [0, 0, 4, 5], [6, 7, 8], [9, 0, 0, 0]]
>>> # expected result:
>>> # [[1, 2, 3], [0, 0, 4, 5], [6, 7, 8], [0, 0, 0, 9]]
With some help, I think I can correct and complete my code.
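One likely cause of the mismatch is that comparing each character only against `comp[0]` cannot tell whether that character belongs to a prefix already seen: in 'GATG' the leading 'G' happens to equal the next unused character, so it is numbered 9. A possible fix, sketched here under the assumption that a 0 means "this prefix already has a node", is to track the trie position directly. The name `make_matrix_trie_sketch` is hypothetical.

```python
def make_matrix_trie_sketch(patterns):
    """Return one row per pattern: new trie nodes get the next number,
    characters on an already-existing prefix get 0."""
    children = {}   # (parent_node, char) -> node number
    next_id = 1
    matrix = []
    for pat in patterns:
        row = []
        node = 0    # root of the trie
        for ch in pat:
            key = (node, ch)
            if key in children:
                # Shared prefix: the node was numbered by an earlier pattern.
                row.append(0)
            else:
                children[key] = next_id
                row.append(next_id)
                next_id += 1
            node = children[key]
        matrix.append(row)
    return matrix


print(make_matrix_trie_sketch(['GAT', 'GACA', 'ATC', 'GATG']))
# [[1, 2, 3], [0, 0, 4, 5], [6, 7, 8], [0, 0, 0, 9]]
```

This version does not depend on `num_seq`; the numbering follows from the order in which new prefixes appear.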
Answer 0 (score: 1)
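I haven't figured out your masking and integer-assignment scheme. Does this have something to do with nucleotides? Some elaboration would help.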
However, I can help with the last step. Here is a one-liner that converts a list of integers into a "list of tuples".
def listToTupleList(l):
    return [(l[i-1],l[i]) if i!=0 else (0,l[i]) for i in range(len(l))]
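Note that this pairs consecutive elements, so it fits a single root-to-leaf path such as `[1, 2, 4]` rather than the flat list `[1, 2, ..., 11]`. As a rough illustration, here is an equivalent formulation with `zip`, applied per path and merged into an ordered list of unique edges. The helper names `path_to_edges` and `merge_paths` are assumptions made for this example.

```python
def path_to_edges(path, root=0):
    """Pair each node with its predecessor; the first node hangs off root."""
    path = list(path)
    return list(zip([root] + path[:-1], path))


def merge_paths(paths):
    """Collect the edges of all paths, keeping first-seen order, no repeats."""
    seen = set()
    edges = []
    for p in paths:
        for edge in path_to_edges(p):
            if edge not in seen:
                seen.add(edge)
                edges.append(edge)
    return edges


# Node paths for 'GAT', 'GAC', 'ATCG' taken from the question's summary.
print(merge_paths([[1, 2, 3], [1, 2, 4], [5, 6, 7, 8]]))
# [(0, 1), (1, 2), (2, 3), (2, 4), (0, 5), (5, 6), (6, 7), (7, 8)]
```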