Comparing two lists of unequal length at each index

Time: 2016-10-16 15:57:44

Tags: list python-2.7 itertools

I have two lists of unequal length, for example:

            list1 = ['G','T','C','A','G']
            list2 = ['AAAAA','TTTT','GGGG','CCCCCCCC']
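  • I want to compare the two lists index by index, each element only against the one at the same position, i.e. list2[0] against list1[0], list2[1] against list1[1], and so on up to the length of list1.

  • I want to get two new lists, one holding the mismatches and the other holding the positions of the mismatches. In code-like terms, it could be expressed as: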

          if 'G' == 'GGG' or 'G' # where 'G' is from list1[1] and 'GGG' is from list2[2] 
          elif 'G' == 'AAA'     
          {
          outlist1 == list1[index] # position of mismatch 
          outlist2 == 'G/A'
          } 
    

2 answers:

Answer 0: (score: 1)

OK, this works. There is surely a way to do it with less code, but I think this is clear:

#Function to process the lists
def get_mismatches(list1,list2):
    #Prepare the output lists
    mismatch_list = []
    mismatch_pos = []

    #Figure out which list is smaller
    smaller_list_len = min(len(list1),len(list2))

    #Loop through the lists checking element by element
    for ind in range(smaller_list_len):
        elem1 = list1[ind][0] #First char of string 1, such as 'G'
        elem2 = list2[ind][0] #First char of string 2, such as 'A'

        #If they match just continue
        if elem1 == elem2:
            continue
        #If they don't match update the output lists
        else:
            mismatch_pos.append(ind)
            mismatch_list.append(elem1+'/'+elem2)

    #Return the output lists
    return mismatch_list,mismatch_pos


#Make input lists
list1 = ['G','T','C','A','G']
list2 = ['AAAAA','TTTT','GGGG','CCCCCCCC']

#Call the function to get the output lists
outlist1,outlist2 = get_mismatches(list1,list2)

#Print the output lists:
print(outlist1)
print(outlist2)

Output:

['G/A', 'C/G', 'A/C']
[0, 2, 3]

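Just to see how short I could make the code, I think this function is equivalent:

def short_get_mismatches(l1,l2):
    #Build (mismatch, position) pairs so the return order matches get_mismatches
    pairs = [(x[0]+'/'+y[0],i) for i,(x,y) in enumerate(zip(l1,l2)) if x[0] != y[0]]
    #zip(*[]) cannot be unpacked into two names, so handle "no mismatches" separately
    o1,o2 = zip(*pairs) if pairs else ([],[])
    return list(o1),list(o2)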

#Make input lists
list1 = ['G','T','C','A','G']
list2 = ['AAAAA','TTTT','GGGG','CCCCCCCC']

#Call the function to get the output lists
outlist1,outlist2 = short_get_mismatches(list1,list2)
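
The short version has no explicit length check because zip stops at the shorter of its inputs, which plays the same role as min(len(list1), len(list2)) in the loop version. A minimal sketch of that correspondence, using the example lists from the question (the names pairs, n and indexed_pairs are only for illustration):

```python
list1 = ['G', 'T', 'C', 'A', 'G']
list2 = ['AAAAA', 'TTTT', 'GGGG', 'CCCCCCCC']

# zip pairs elements by position and stops at the shorter input,
# so the trailing 'G' in list1 has no partner and is never compared.
pairs = list(zip(list1, list2))

# The explicit index loop covers exactly the same positions.
n = min(len(list1), len(list2))
indexed_pairs = [(list1[i], list2[i]) for i in range(n)]
```

Both pairs and indexed_pairs hold the same four (list1, list2) tuples, so either approach compares positions 0 to 3 only.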

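Edit:

I'm not sure whether I cleaned the Ns and -s out of the sequences the way you wanted. Is this the answer you expect for the example in your comment?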

Unclean list1 ['A', 'T', 'G', 'C', 'A', 'C', 'G', 'T', 'C', 'G']
Clean list1 ['A', 'T', 'G', 'C', 'A', 'C', 'G', 'T', 'C', 'G']

Unclean list2 ['GGG', 'TTTN', '-', 'NNN', 'AAA', 'CCC', 'GCCC', 'TTT', 'CCCTN']
Clean list2 ['GGG', 'TTT', 'AAA', 'CCC', 'GCCC', 'TTT', 'CCCT']

0 A GGG
1 T TTT
2 G AAA
3 C CCC
4 A GCCC
5 C TTT
6 G CCCT
['A/G', 'G/A', 'A/G', 'C/T', 'G/C']
[0, 2, 4, 5, 6]
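
The cleaning code itself is not shown above, so the following is only a sketch of one way to arrive at those lists, assuming that "cleaning" means deleting every 'N' and '-' character and dropping entries that end up empty. The helper names clean_sequences and first_char_mismatches are hypothetical, not taken from the answer.

```python
def clean_sequences(seqs, ignored='N-'):
    # Hypothetical helper: strip the ignored characters from each entry
    # and drop entries that become empty (e.g. '-' or 'NNN').
    cleaned = []
    for seq in seqs:
        kept = ''.join(ch for ch in seq if ch not in ignored)
        if kept:
            cleaned.append(kept)
    return cleaned


def first_char_mismatches(list1, list2):
    # Compare the first character at each shared position, as in get_mismatches.
    mismatches, positions = [], []
    for i, (a, b) in enumerate(zip(list1, list2)):
        if a[0] != b[0]:
            mismatches.append(a[0] + '/' + b[0])
            positions.append(i)
    return mismatches, positions


list1 = ['A', 'T', 'G', 'C', 'A', 'C', 'G', 'T', 'C', 'G']
unclean_list2 = ['GGG', 'TTTN', '-', 'NNN', 'AAA', 'CCC', 'GCCC', 'TTT', 'CCCTN']

clean_list1 = clean_sequences(list1)
clean_list2 = clean_sequences(unclean_list2)
mismatches, positions = first_char_mismatches(clean_list1, clean_list2)
```

Under that assumption, clean_list2, mismatches and positions come out the same as the lists printed above. Note that dropping the empty entries shifts everything after them, so the reported positions index into the cleaned list2, not the original one.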

Answer 1: (score: 1)

This works for my problem:

    #!/usr/bin/env python
    list1=['A', 'T', 'G', 'C', 'A' ,'C', 'G' , 'T' , 'C', 'G']
    list2=[ 'GGG' , 'TTTN' , ' - ' , 'NNN' , 'AAA' , 'CCC' , 'GCCC' , 'TTT'  ,'CCCATN' ]

    notifications = []
    indexes = []

    for i in range(min(len(list1), len(list2))):
        item1 = list1[i]
        item2 = list2[i]

        # Skip the ' - ' placeholder
        if item2 == ' - ':
            continue

        # Remove N since it's a wildcard
        item2 = item2.replace('N', '')

        # Remove every occurrence of item1, leaving only the mismatching characters
        item2 = item2.replace(item1, '')

        chars = set(item2)

        # All matched
        if len(chars) == 0:
            continue

        # sorted() keeps the output order stable; set order is arbitrary
        notifications.append('{}/{}'.format(item1, '/'.join(sorted(chars))))
        indexes.append(i)

    print(notifications)
    print(indexes)

It produces this output:

    ['A/G', 'G/C', 'C/A/T']
    [0, 6, 8]