Combining strings for specific IDs in index order (Python)

Date: 2013-06-10 19:30:16

Tags: python string function indexing while-loop

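So I have a list of sublists.

The first value of each sublist is the ID and the second value is an index.

Ultimately, I am trying to concatenate the strings for each ID, in index order.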

raw_IDs = ['TCONS_0040771;1','TCONS_0040771;2','TCONS_0040771;3','TCONS_00040772;1','TCONS_00040772;2','TCONS_00040773;1','TCONS_00040773;2','TCONS_00040773;3','TCONS_00040773;4']

IDs = [['TCONS_0040771',1],['TCONS_0040771',2],['TCONS_0040771',3],['TCONS_00040772',1],['TCONS_00040772',2],['TCONS_00040773',1],['TCONS_00040773',2],['TCONS_00040773',3],['TCONS_00040773',4]]

I have a dictionary (D_ID_seq) that maps each raw ID to its sequence, so...

sequences = []

for k in raw_IDs:
    sequences.append(D_ID_seq[k])
print(sequences)

sequences = ['AAA','AAB','AAAB','AAAA','BAA','BBA','BBB','CCC','DDD']  

I am trying to concatenate the sequences according to their ID, i.e. the TCONS_xxx value:

desired_output = ['AAAAABAAAB','AAAABAA','BBABBBCCCDDD']

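Example: the first 3 elements in IDs all share the same ID, 'TCONS_0040771', but they have different indices, ranging from 1 to 3. The same pattern repeats for 'TCONS_00040772' with indices 1-2 and 'TCONS_00040773' with indices 1-4.

The desired output is, for each ID, the concatenation of all the strings that were looked up in the dictionary and appended to the list.

Note: I was thinking of writing a while loop, but when I try them they sometimes get really messy and end up running forever.

Any help would be greatly appreciated.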

2 Answers:

Answer 0: (score: 0)

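# This assumes raw_IDs and sequences have been defined and line up one-to-one
IDs = [raw_id.split(';') for raw_id in raw_IDs]

prev_id = None
output = ''
desired_output = []
for (id_, _index), seq in zip(IDs, sequences):
    if id_ != prev_id:
        # A new ID starts: store the string built for the previous one
        if prev_id is not None:
            desired_output.append(output)
        output = ''
        prev_id = id_
    output += seq
# Store the string for the last ID
if output:
    desired_output.append(output)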
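
If the entries are not guaranteed to arrive grouped by ID and already sorted by index, a dictionary-based variant may be safer. The following is only a minimal sketch: `combine_by_id` is a hypothetical helper name, and the contents of `D_ID_seq` are an assumption reconstructed from the sample data in the question (deliberately shuffled here), since the real dictionary is not shown.

```python
from collections import defaultdict


def combine_by_id(d_id_seq):
    """Concatenate sequences per ID, ordered by the numeric index after ';'."""
    grouped = defaultdict(list)
    for raw_id, seq in d_id_seq.items():
        id_, index = raw_id.split(';')
        grouped[id_].append((int(index), seq))
    # Sort each ID's pieces by index, then join them into one string.
    return {
        id_: ''.join(seq for _, seq in sorted(parts, key=lambda part: part[0]))
        for id_, parts in grouped.items()
    }


# Assumed contents, rebuilt from the question's raw_IDs and sequences.
D_ID_seq = {
    'TCONS_0040771;2': 'AAB', 'TCONS_0040771;1': 'AAA',
    'TCONS_0040771;3': 'AAAB',
    'TCONS_00040772;2': 'BAA', 'TCONS_00040772;1': 'AAAA',
    'TCONS_00040773;4': 'DDD', 'TCONS_00040773;1': 'BBA',
    'TCONS_00040773;3': 'CCC', 'TCONS_00040773;2': 'BBB',
}

combined = combine_by_id(D_ID_seq)
```

This returns a dictionary keyed by ID rather than a plain list, which keeps each combined string attached to the ID it belongs to.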

Answer 1: (score: 0)

With this data:

IDs = [['TCONS_0040771', 1], ['TCONS_0040771', 2], ['TCONS_0040771', 3],
       ['TCONS_00040772', 1], ['TCONS_00040772', 2], ['TCONS_00040773', 1],
       ['TCONS_00040773', 2], ['TCONS_00040773', 3], ['TCONS_00040773', 4]]
sequences = ['AAA','AAB','AAAB','AAAA','BAA','BBA','BBB','CCC','DDD']

this code:

last_id = IDs[0][0]
res = [sequences[0]]
for index, (id_, _) in enumerate(IDs[1:], 1):
    if id_ == last_id:
        res[-1] += sequences[index]
    else:
        res.append(sequences[index])
    last_id = id_

res

gives this:
['AAAAABAAAB', 'AAAABAA', 'BBABBBCCCDDD']
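
The same grouping can also be sketched with `itertools.groupby` from the standard library. Note that `groupby` only merges consecutive items with equal keys, so this version assumes, as in the data above, that `IDs` is already grouped by ID and sorted by index, and that `sequences` is in the same order.

```python
from itertools import groupby

IDs = [['TCONS_0040771', 1], ['TCONS_0040771', 2], ['TCONS_0040771', 3],
       ['TCONS_00040772', 1], ['TCONS_00040772', 2], ['TCONS_00040773', 1],
       ['TCONS_00040773', 2], ['TCONS_00040773', 3], ['TCONS_00040773', 4]]
sequences = ['AAA', 'AAB', 'AAAB', 'AAAA', 'BAA', 'BBA', 'BBB', 'CCC', 'DDD']

# Pair each ID with its sequence, then group consecutive pairs sharing an ID.
pairs = zip((id_ for id_, _ in IDs), sequences)
res = [''.join(seq for _, seq in group)
       for _, group in groupby(pairs, key=lambda pair: pair[0])]
```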