Question

Here is file A:

LOC_Os06g07630.1 cyto 8, chlo 2, extr 2, nucl 1, cysk 1, chlo_mito 1, cysk_nucl 1
LOC_Os06g12160.1 chlo 7, nucl 3, mito 2.5, cyto_mito 2
LOC_Os06g39870.1 chlo 7, cyto 4, nucl 1, E.R. 1, pero 1
LOC_Os06g48240.1 chlo 9, mito 4
LOC_Os06g48250.1 cyto 5, chlo 4, mito 2, pero 2

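I care about "chlo", "chlo_mito", and "mito", and about the sum of all the values on each line.

For example, for the line LOC_Os06g07630.1 I will use chlo 2 and chlo_mito 1, whose sum is 3 = (chlo) 2 + (chlo_mito) 1

The sum for the whole line is

(cyto) 8 + (chlo) 2 + (extr) 2 + (nucl) 1 + (cysk) 1 + (chlo_mito) 1 + (cysk_nucl) 1 = 16, so it should print 3/16

I want to get the following output: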

LOC_Os06g07630.1 chlo 2 chlo_mito 1 3/16
LOC_Os06g12160.1 chlo 7 mito 2.5 9.5/14.5
LOC_Os06g39870.1 chlo 7 7/14
LOC_Os06g48240.1 chlo 9 mito 4 13/13
LOC_Os06g48250.1 chlo 4 mito 2 6/13

My code is:

import re
dic={}
b=re.compile("chlo|mito|chlo_mito")
with open("~/A","r") as f1:
    for i in f1:
        if i.startswith("#"):continue
        a=i.replace(',',"").replace(" ","/")
        m=b.search(a)
        if m is not None:
            dic[a.strip().split("/")[0]]={}
            temp=a.strip().split("/")[1:]
            c=range(1,len(temp),2)
            for x in c:
                dic[a.strip().split("/")[0]][temp[x-1]]=temp[x]
                #print dic
lis=["chlo","mito","chlo_mito"]
for k in dic:  
    sum_value=0
    sum_values=0     
    for x in dic[k]:                        
        sum_value=sum_value+float(dic[k][x])
        for i in lis: 
        #sum_values=0 
        if i in dic[k]:
           #print i,dic[k][i]
           sum_values=sum_value+float(dic[k][i])
           print k,dic[k],i,sum_values
         #print k,dic[k]

Answer 1

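Your description of the problem isn't very clear. But here is what I would do: write a function that takes one line from the file as input and returns a dictionary with "chlo", "chlo_mito", "mito", and "total sum". That should make your life a lot easier.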
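A minimal sketch of such a function, assuming the line format shown in the question (an ID followed by comma-separated "name value" pairs). The names `parse_line` and `FIELDS` are made up for illustration, and the dictionary only contains the wanted fields that actually occur on the line:

```python
FIELDS = ("chlo", "chlo_mito", "mito")  # the locations the asker cares about


def parse_line(line):
    """Return (gene_id, info) for one line of file A.

    info holds every wanted field present on the line, plus
    "total sum" for the sum over all values on the line.
    """
    gene_id, remainder = line.strip().split(maxsplit=1)
    info = {"total sum": 0.0}
    for pair in remainder.split(","):
        name, value = pair.split()  # e.g. " chlo 2" -> "chlo", "2"
        info["total sum"] += float(value)
        if name in FIELDS:
            info[name] = float(value)
    return gene_id, info


gene_id, info = parse_line(
    "LOC_Os06g07630.1 cyto 8, chlo 2, extr 2, nucl 1, cysk 1, chlo_mito 1, cysk_nucl 1"
)
wanted = sum(v for k, v in info.items() if k != "total sum")
print(gene_id, "{:g}/{:g}".format(wanted, info["total sum"]))
# prints: LOC_Os06g07630.1 3/16
```

With the per-line work isolated like this, the main loop only has to call the function for each line and format the result.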

Answer 2

Code like this may help you.

I assume your input file is named f_input.txt:

from ast import literal_eval

# Split each line into tokens with the commas removed, e.g.
# ['LOC_Os06g48240.1', 'chlo', '9', 'mito', '4'].
with open("f_input.txt", 'r') as f:
    data = [k.rstrip().replace(',', '').split() for k in f]

for k in data:
    if not k:
        continue  # skip blank lines
    # A value always follows its name, so k[j+1] is the number for k[j].
    chlo = sum(literal_eval(k[j+1]) for j in range(len(k)-1) if k[j] == 'chlo')
    mito = sum(literal_eval(k[j+1]) for j in range(len(k)-1) if k[j] == 'mito')
    chlo_mito = sum(literal_eval(k[j+1]) for j in range(len(k)-1) if k[j] == 'chlo_mito')
    # The values sit at the even indexes 2, 4, ...; their sum is the line total.
    total = sum(literal_eval(k[j]) for j in range(2, len(k), 2))
    # Print only the fields that are non-zero on this line.
    if mito == 0 and chlo_mito != 0:
        print("{0} chlo {1} chlo_mito {2} {3}/{4}".format(k[0], chlo, chlo_mito, chlo + chlo_mito, total))
    elif mito != 0 and chlo_mito == 0:
        print("{0} chlo {1} mito {2} {3}/{4}".format(k[0], chlo, mito, chlo + mito, total))
    elif mito != 0 and chlo_mito != 0:
        print("{0} chlo {1} mito {2} chlo_mito {3} {4}/{5}".format(k[0], chlo, mito, chlo_mito, chlo + mito + chlo_mito, total))
    else:
        print("{0} chlo {1} {2}/{3}".format(k[0], chlo, chlo, total))

Output:

LOC_Os06g07630.1 chlo 2 chlo_mito 1 3/16
LOC_Os06g12160.1 chlo 7 mito 2.5 9.5/14.5
LOC_Os06g39870.1 chlo 7 7/14
LOC_Os06g48240.1 chlo 9 mito 4 13/13
LOC_Os06g48250.1 chlo 4 mito 2 6/13

Answer 3

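I'm not sure how much speed you need, but in genomics it is usually a lot. You should probably avoid heavy string manipulation where you can, and use as few regular expressions as possible.

Here is a version that uses no regexes and tries not to spend any time building temporary objects. I chose an output format different from the one you gave, because yours is hard to parse again afterwards. You can easily change it by modifying the .format string.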

Test_data = """
LOC_Os06g07630.1 cyto 8, chlo 2, extr 2, nucl 1, cysk 1, chlo_mito 1, cysk_nucl 1
LOC_Os06g12160.1 chlo 7, nucl 3, mito 2.5, cyto_mito 2
LOC_Os06g39870.1 chlo 7, cyto 4, nucl 1, E.R. 1, pero 1
LOC_Os06g48240.1 chlo 9, mito 4
LOC_Os06g48250.1 cyto 5, chlo 4, mito 2, pero 2
"""

def open_input():
    """
    Return a file-like object as input stream. In this case,
    it is a StringIO based on your test data. If you have a file
    name, use that instead.
    """

    if False:
        return open('inputfile.txt', 'r')
    else:
        import io
        return io.StringIO(Test_data)

SUM_FIELDS = set("chlo mito chlo_mito".split())

with open_input() as infile:

    for line in infile:

        line = line.strip()
        if not line: continue

        cols = line.split(maxsplit=1)
        if len(cols) != 2: continue

        test_id,remainder = cols
        out_fields = []

        fld_sum = tot_sum = 0.0

        for pair in remainder.split(', '):
            k,v = pair.rsplit(maxsplit=1)
            vf = float(v)
            tot_sum += vf

            if k in SUM_FIELDS:
                fld_sum += vf
                out_fields.append(pair)

        print("{0} {2}/{3} ({4:.0%}) {1}".format(test_id, ', '.join(out_fields), fld_sum, tot_sum, fld_sum/tot_sum))
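As a sketch of that last point, the same per-line logic can be wrapped in a function whose format string reproduces the layout asked for in the question. The helper name `format_line` is hypothetical, and the `g` format code is my own choice so that 3.0 prints as 3 while 9.5 stays 9.5:

```python
SUM_FIELDS = {"chlo", "mito", "chlo_mito"}


def format_line(line):
    """Hypothetical helper: format one input line the way the question asks."""
    test_id, remainder = line.strip().split(maxsplit=1)
    kept = []
    fld_sum = tot_sum = 0.0
    for pair in remainder.split(', '):
        k, v = pair.rsplit(maxsplit=1)
        vf = float(v)
        tot_sum += vf
        if k in SUM_FIELDS:
            fld_sum += vf
            kept.append(pair)
    # Only the format string differs from the loop above.
    return "{0} {1} {2:g}/{3:g}".format(test_id, ' '.join(kept), fld_sum, tot_sum)


print(format_line("LOC_Os06g12160.1 chlo 7, nucl 3, mito 2.5, cyto_mito 2"))
# prints: LOC_Os06g12160.1 chlo 7 mito 2.5 9.5/14.5
```

A line with none of the three fields would leave a double space after the ID, so that case may need its own handling.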

Nested dictionaries and values in Python
