Question

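I have a list inside a for loop that uses itertools.product() to find different combinations of letters. I want to count how often each item occurs with collections.Counter(), but right now it prints every different combination of 'A's and 'G's: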

['a', 'A', 'G', 'G']
['a', 'A', 'G', 'g']
['a', 'A', 'G', 'G']
['a', 'A', 'G', 'g']
['a', 'A', 'G', 'g']
#...
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'g']
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'G']
['a', 'G', 'a', 'G']
#...
['a', 'G', 'a', 'G']
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'g']
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'G']
#...
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'G']
['a', 'G', 'a', 'G']
# etc.

This is not the full output, but as you can see, some entries are the same apart from the order of their letters, for example:

['a', 'G', 'A', 'G']
['a', 'A', 'G', 'G']

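I prefer the latter ordering, so I want to find a way to print all the combinations with uppercase before lowercase and, since 'a' comes before 'g', in alphabetical order as well. The final product should look like ['AaGG', 'aaGg', etc]. Which functions should I use?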

Here is the code that generates the data. The part marked "Counting" is where I am having trouble.

import itertools
from collections import Counter
parent1 = 'aaGG'
parent2 = 'AaGg'
f1 = []
f1_ = []
genotypes = []
b = []
genetics = []
g = []
idx = []

parent1 = list(itertools.combinations(parent1, 2))    
del parent1[0]
del parent1[4] 

parent2 = list(itertools.combinations(parent2, 2))    
del parent2[0]
del parent2[4]


for x in parent1:
    f1.append(''.join(x))

for x in parent2:
    f1_.append(''.join(x))

y = list(itertools.product(f1, f1_))  

for x in y:
    genotypes.append(''.join(x))
    break
genotypes = [
        thingies[0][0] + thingies[1][0] + thingies[0][1] + thingies[1][1]
        for thingies in zip(parent1, parent2)
] * 4
print 'F1', Counter(genotypes)

# Counting
for genotype in genotypes:
    alleles = list(itertools.combinations(genotype,2))
    del alleles[1]
    del alleles[3]
    for x in alleles:
        g.append(''.join(x))

for idx in g:
    if idx.lower().count("a") == idx.lower().count("g") == 1:
        break                

f2 = list(itertools.product(g, g)) 

for x in f2:
    genetics.append(''.join(x)) 

for genes in genetics:
    if genes.lower().count("a") == genes.lower().count("g") == 2:
        genes = ''.join(genes)
    print Counter(genes)

Answer 1

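I think you are looking for a custom way to define precedence. The lists are currently sorted by ASCII code, which places every uppercase letter before every lowercase letter. I would define a custom precedence using a dictionary: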

>>> test_list = ['a', 'A', 'g', 'G']
>>> precedence_dict = {'A':0, 'a':1, 'G':2,'g':3}
>>> test_list.sort(key=lambda x: precedence_dict[x])
>>> test_list
['A', 'a', 'G', 'g']
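
If the set of letters is not known in advance, the precedence could be computed rather than listed by hand. This is only a sketch, assuming every character is an ASCII letter; the helper name `allele_key` is made up for illustration and is not part of the original answer:

```python
def allele_key(ch):
    # Compare by the letter ignoring case first; within the same letter,
    # uppercase (islower() is False) sorts before lowercase (True).
    return (ch.lower(), ch.islower())

test_list = ['a', 'A', 'g', 'G']
test_list.sort(key=allele_key)
print(test_list)  # ['A', 'a', 'G', 'g']
```

The tuple key gives the same order as the dictionary above for these four letters, without having to enumerate each one.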

Edit: your last few lines:

for genes in genetics:
    if genes.lower().count("a") == genes.lower().count("g") == 2:
        genes = ''.join(genes)
    print Counter(genes)

do not do what you want.

Replace those lines with:

precedence_dict = {'A':0, 'a':1, 'G':2,'g':3}

for i in xrange(len(genetics)):
    genetics[i] = list(genetics[i])
    genetics[i].sort(key=lambda x: precedence_dict[x])
    genetics[i] = ''.join(genetics[i])

# The built-in set removes duplicates (the old sets module is deprecated)
genetics = list(set(genetics))
genetics.sort()

print genetics
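
Since the question asks for counts, note that a set discards how many times each genotype occurred. One possible way to keep the counts is to normalize each string and pass the results to collections.Counter. This is a sketch only: the function name `normalize` and the three sample strings are hypothetical, not the asker's real output.

```python
from collections import Counter

precedence_dict = {'A': 0, 'a': 1, 'G': 2, 'g': 3}

def normalize(genotype):
    # Reorder the letters of one genotype by the custom precedence.
    return ''.join(sorted(genotype, key=precedence_dict.get))

# Hypothetical sample: the first two entries are the same genotype
# written in a different letter order.
genetics = ['aGAG', 'aAGG', 'aGag']

counts = Counter(normalize(g) for g in genetics)
print(counts)
```

With this sample, 'aGAG' and 'aAGG' both normalize to 'AaGG' and are counted together.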

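I think you had the right idea. When you iterate over a list in a for loop, the loop variable is only a name bound to the current item, so assigning a new string to 'genes' does not change anything in the original list.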
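
To illustrate the point about the loop variable, here is a small sketch with made-up list contents. It shows that reassigning the name leaves the list untouched, and that writing back by index does change it. A plain `sorted` is used as a stand-in transformation, so the letters end up in ASCII order.

```python
genetics = ['aGAG', 'aGag']

for genes in genetics:
    genes = ''.join(sorted(genes))  # rebinds the name 'genes' only
after_rebinding = list(genetics)
print(after_rebinding)  # still ['aGAG', 'aGag']

# Writing back through the index does update the list.
for i, genes in enumerate(genetics):
    genetics[i] = ''.join(sorted(genes))
print(genetics)  # ['AGGa', 'Gaag']
```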

Answer 2

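I know you did not ask for a code review, but you might be better off generating the strings in the order you want in the first place, rather than trying to filter them afterwards. Something like this might work.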

def cross(parent1, parent2):

    out = []
    alleles = len(parent1)/2

    # iterate parent 1 possible genotypes
    for i in range(2):

        # iterate loci 
        for k in range(alleles):
            child = []

            # iterate parent 2 possible genotypes
            for j in range(2):
                p1 = parent1[j * 2 + i]
                p2 = parent2[j * 2 + k]
                c = [p1, p2]

                # get each genotype pair into capitalization order
                c.sort()
                c.reverse()
                child += c

            out.append("".join(child))
    return out


if __name__ == "__main__":

    parent1 = 'aaGG'
    parent2 = 'AaGg'

    # F1
    f1 = cross(parent1, parent2)
    print f1

    # F2
    f2 = []
    for p1 in f1:
        for p2 in f1:
            f2 += cross(p1, p2)
    print f2

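Here is one way to get all the combinations from a single parent. Start with an empty string and add the possibilities one at a time.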

def get_all_combos(allele_pair, gametes):
    # Take a list of genotypes. Return an updated list with each possibility from an allele pair.

    updated_gametes = []
    for z in gametes:
        updated_gametes.append(z + allele_pair[0])
        updated_gametes.append(z + allele_pair[1])
    return updated_gametes

if __name__ == "__main__":

    parent1 = 'aaGG'
    parent2 = 'AaGg'

    alleles = len(parent2)/2
    gametes = [""]
    for a in range(alleles):
        allele_pair = parent2[a*2:a*2+2]
        gametes = get_all_combos(allele_pair, gametes)
    print gametes

Maybe you can figure out how to combine these two solutions to get what you want.
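
The gamete-building step above could also be written more compactly with itertools.product, which the question already imports. This is a sketch rather than the answerer's method. It assumes the genotype string stores one locus per consecutive pair of characters, as in 'AaGg', and the function name `gametes_of` is hypothetical.

```python
import itertools

def gametes_of(parent):
    # Split 'AaGg' into allele pairs: ['Aa', 'Gg'].
    pairs = [parent[i:i + 2] for i in range(0, len(parent), 2)]
    # Pick one allele from each pair, in every possible way.
    return [''.join(choice) for choice in itertools.product(*pairs)]

print(gametes_of('AaGg'))  # ['AG', 'Ag', 'aG', 'ag']
```

For 'AaGg' this yields the same four gametes, in the same order, as building them up one allele pair at a time.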

Answer 3

You could try using the sort function. Here is what I mean:

parent1 = "absdksakjcvjvugoh"
parent1sorted = list(parent1)
parent1sorted.sort()
print (parent1sorted)

The result you get is: ['a', 'a', 'b', 'c', 'd', 'g', 'h', 'j', 'j', 'k', 'k', 'o', 's', 's', 'u', 'v', 'v']

Does this help?

tl;dr: convert the string to a list, then sort the list
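
One caveat with mixed-case input like the genotypes in the question: a plain sort compares character codes, so every uppercase letter comes before every lowercase one. A small sketch, with a made-up sample string:

```python
genotype = 'aGAg'

plain = sorted(genotype)                  # compares character codes
folded = sorted(genotype, key=str.lower)  # ignores case, ties keep input order

print(plain)   # ['A', 'G', 'a', 'g']
print(folded)  # ['a', 'A', 'G', 'g']
```

Neither result puts uppercase first within each letter, so an order like 'AaGg' still needs a custom key.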

Rearranging strings in a list by alphabetical order and case
