Modifying the range in each iteration of a for loop

Date: 2015-06-05 13:09:15

Tags: python range

I have a groups.txt file that contains orthologous groups, with the species and geneID of every member of each group. It looks like this:

OG_117996: R_baltica_p|32476565 V_spinosum_v|497645257
OG_117997: R_baltica_p|32476942 S_pleomorpha_s|374317197
OG_117998: R_baltica_p|32477405 V_bacterium_v|198258541

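I wrote a function that builds a list, called listOfAllSpecies, of every species in the whole file (66 in total). I need a function that gives me all groups containing 1 species out of these 66, then all groups containing 2 species out of these 66, and so on.

To simplify: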

OG_1: A|1 A|3 B|1 C|2
OG_2: A|4 B|6
OG_3: C|8 B|9 A|10

For this example, I need to get:

(species) A,B (are in groups) OG_1, OG_2, OG_3
(species) A,C (are in groups) OG_1, OG_3
(species) B,C (are in groups) OG_1, OG_3
(species) A,B,C (are in groups) OG_1, OG_3
(species) B (is in groups) OG_1, OG_2, OG_3

I was thinking of trying

for species in range(start, end=None):         
    if end == None:           
        start = 0
        end = start + 1

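to take the first species in listOfAllSpecies and tell me which groups OG_XXXX contain it, then take the first and second species, and so on until it covers all 66 species. How can I modify the range inside the for loop, or is there a different way to do this?

Here is my actual working code, without the part I need and am asking about: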

import sys 

if len(sys.argv) != 2:
    print("Error, file name to open is missing")
    sys.exit(1)

def readGroupFile(groupFileName):
    dict_gene_taxonomy = {}
    fh = open(groupFileName,"r")

    for line in fh:
        liste = line.split(": ")
        groupName = liste[0]
        genesAsString = liste[1]
        dict_taxon = {}
        liste_gene = genesAsString.split()

        for item in liste_gene:
            taxonomy_gene = item.split("|")
            taxonomy = taxonomy_gene[0]
            geneId   = taxonomy_gene[1]

            if not taxonomy in dict_taxon:
                dict_taxon[taxonomy] = []

            dict_taxon[taxonomy].append(geneId)

        dict_gene_taxonomy[groupName] = dict_taxon
    fh.close()
    return dict_gene_taxonomy


def showListOfAllSpecies(dictio):
    listAllSpecies = []
    for groupName in dictio:
        dictio_in_dictio = dictio[groupName]
        for speciesName in dictio_in_dictio:
            if not speciesName in listAllSpecies:
                listAllSpecies.append(speciesName)
    return listAllSpecies

dico = readGroupFile(sys.argv[1])
listAllSpecies = showListOfAllSpecies(dico)

3 answers:

Answer 0 (score: 3)

Not sure if this is exactly what you want, but it's a start :)

from itertools import combinations

# Assume input is a list of strings called input_list
input_list = ['OG_1: A|1 A|3 B|1 C|2','OG_2: A|4 B|6','OG_3: C|8 B|9 A|10']

# Create a dict to store relationships and a set to store species
rels = {}
species = set()

# Populate the dict
for item in input_list:
    params = item.split(': ')
    og = params[0]
    raw_species = params[1].split()
    s = [rs.split('|')[0] for rs in raw_species]
    rels[og] = s

    for item in s:
        species.add(item)

# Get the possible combinations of species, for every size from 1 to len(species):
combos = [c for limit in range(1, len(species) + 1) for c in combinations(species, limit)]

def combo_in_og(combo, og):
    for item in combo:
        if item not in rels[og]:
            return False
    return True

# Loop over the combinations and print
for combo in combos:
    valid_ogs = []
    for og in sorted(rels):
        if combo_in_og(combo, og):
            valid_ogs.append(og)
    print('(species) ' + ','.join(combo) + ' (are in groups) ' + ', '.join(valid_ogs))

Output (the order of the species may differ between runs, because a set is unordered):

(species) C (are in groups) OG_1, OG_3
(species) A (are in groups) OG_1, OG_2, OG_3
(species) B (are in groups) OG_1, OG_2, OG_3
(species) C,A (are in groups) OG_1, OG_3
(species) C,B (are in groups) OG_1, OG_3
(species) A,B (are in groups) OG_1, OG_2, OG_3
(species) C,A,B (are in groups) OG_1, OG_3

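Just a warning: what you're trying to do will start to take forever with a large enough input, because its complexity is 2^N. You can't get around that (it's what the problem demands), but it's there.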
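To get a feel for the scale behind that warning, the number of non-empty species combinations can be computed directly. This is only an illustrative sketch: the helper names `count_nonempty_subsets` and `count_by_size` are made up here, and 66 is the species count mentioned in the question.

```python
from math import comb  # available in Python 3.8+

def count_nonempty_subsets(n):
    """Number of non-empty subsets of a set with n items."""
    return 2 ** n - 1

def count_by_size(n):
    """Same count, obtained by summing the combinations of every size k = 1..n."""
    return sum(comb(n, k) for k in range(1, n + 1))

print(count_nonempty_subsets(3))   # 7, one per line of the output shown above
print(count_nonempty_subsets(66))  # 73786976294838206463
```

With 66 species, looping over every combination is not practical, so it may be worth restricting the search, for example to small combination sizes.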

Answer 1 (score: -1)

How about using a while loop to control the range() arguments?

end = 0
start = 0
while end < 1000:
    for species in range(start, end):         
        ...  # do something with species

    end += 1
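If the goal is literally "the first species, then the first two, and so on", the loop bounds do not need to be modified while looping: slicing the list with a growing end index gives each prefix. A minimal sketch, assuming a three-item stand-in for listOfAllSpecies and a hypothetical helper named `growing_prefixes`:

```python
def growing_prefixes(items):
    """Yield items[:1], items[:2], ..., up to the full list."""
    for end in range(1, len(items) + 1):
        yield items[:end]

# Stand-in data; the real list would hold all 66 species
listOfAllSpecies = ['A', 'B', 'C']

for prefix in growing_prefixes(listOfAllSpecies):
    print(prefix)
# ['A']
# ['A', 'B']
# ['A', 'B', 'C']
```

Note that this visits only one prefix per list length (66 for the full data), not every combination of species.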

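Answer 2 (score: -1)

The number of non-empty subsets of a set of N items (here, the set of all species) is 2^N - 1.

That's because a subset works like a binary number with N bits, where each bit is either 1 (include that species in the subset) or 0 (exclude that species from the subset). The -1 excludes the empty set (all bits 0).

So you can enumerate all subsets of the species with a simple integer loop: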

# sample data
listOfAllSpecies = ['A', 'B', 'C']

# enumerate all subsets of listOfAllSpecies, 0 excluded (the empty set)
for bits in range(1, 2**len(listOfAllSpecies)):

    # build the subset
    subset = []
    for n in range(len(listOfAllSpecies)):
        # test if the current subset includes bit n
        if bits & 2**n:
            subset.append(listOfAllSpecies[n])

    # see which groups contain the given subset
    print("species", ",".join(subset), "are in groups TODO")

Result:

species A are in groups TODO
species B are in groups TODO
species A,B are in groups TODO
species C are in groups TODO
species A,C are in groups TODO
species B,C are in groups TODO
species A,B,C are in groups TODO

If you also need code to test whether a group contains a subset, you need to specify how the groups are stored in your program.
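As one possible way to fill in the TODO, assume the groups are stored in the nested dict that readGroupFile from the question returns (group name -> species -> list of gene IDs). Under that assumption, the containment test can be sketched with set operations. The function name `groups_containing` and the literal sample data below are made up for illustration.

```python
def groups_containing(subset, groups):
    """Return the sorted names of the groups whose species include every member of subset."""
    wanted = set(subset)
    # Iterating over the inner dict yields its keys, i.e. the species names
    return sorted(name for name, taxa in groups.items() if wanted.issubset(taxa))

# Simplified example from the question, in the shape readGroupFile would produce
dico = {
    'OG_1': {'A': ['1', '3'], 'B': ['1'], 'C': ['2']},
    'OG_2': {'A': ['4'], 'B': ['6']},
    'OG_3': {'C': ['8'], 'B': ['9'], 'A': ['10']},
}

print(groups_containing(['A', 'C'], dico))  # ['OG_1', 'OG_3']
print(groups_containing(['B'], dico))       # ['OG_1', 'OG_2', 'OG_3']
```

Calling it once per enumerated subset would replace the TODO in the print statement.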

If this post answers your question, you should click the green check mark ✔ at the top left.