Question

Greetings,

I have run into the following problem:

Given a file with the following structure:

'>some cookies  
chocolatejelly  
peanutbuttermacadamia  
doublecoconutapple  
'>some icecream  
cherryvanillaamaretto  
peanuthaselnuttiramisu  
bananacoffee  
'>some other stuff  
letsseewhatfancythings  
wegotinhere

Goal: collect all entries that follow each line containing '>' into a list, with one concatenated string per section.

Code:

def parseSequenceIntoDictionary(filename):
    lis=[]
    seq=''
    with open(filename, 'r') as fp:
        for line in fp:
            if('>' not in line):
                seq+=line.rstrip()
            elif('>' in line):
                lis.append(seq)
                seq=''
        lis.remove('')
        return lis

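So this function iterates over every line of the file. If '>' does not occur in the line, it concatenates the line onto the following ones and strips the trailing '\n'. If '>' does occur, it appends the concatenated string to the list and "clears" the string seq so the next sequence can be concatenated.

Problem: with the example input file, it only puts the entries from "some cookies" and "some icecream" into the list, but not those from "some other stuff". So we get the result: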

[chocolatejelly 
peanutbuttermacadamia 
doublecoconutapple, cherryvanillaamaretto 
peanuthaselnuttiramisu 
bananacoffee] but not  

[chocolatejelly 
peanutbuttermacadamia 
doublecoconutapple, cherryvanillaamaretto 
peanuthaselnuttiramisu 
bananacoffee, letsseewhatfancythings 
wegotinhere]

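What is wrong with my approach? There is some logic error in the iteration that I have probably overlooked, but I cannot see where.

Thanks in advance for any hints!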

Answer 1

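The problem is that you only store the current section seq when you hit a line containing '>'. When the file ends, you still have a section open, but you never store it.

The simplest way to fix your program is: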

def parseSequenceIntoDictionary(filename):
    lis=[]
    seq=''
    with open(filename, 'r') as fp:
        for line in fp:
            if('>' not in line):
                seq+=line.rstrip()
            elif('>' in line):
                lis.append(seq)
                seq=''
        # the file ended
        lis.append(seq) # store the last section
        lis.remove('')
        return lis

By the way, you should use if line.startswith("'>"): to prevent possible errors.
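To illustrate why the prefix check is safer, here is a small sketch. The two sample lines and the helper names is_header_loose and is_header_strict are made up for this illustration; the second line stands for a data line that happens to contain '>'.

```python
# Hypothetical sample lines: a real header and a data line containing '>'.
header = "'>some cookies\n"
data = "choco>latejelly\n"

def is_header_loose(line):
    # The check used in the question: true for any line containing '>'.
    return '>' in line

def is_header_strict(line):
    # The suggested check: true only when the line begins with the marker.
    return line.startswith("'>")

for line in (header, data):
    print(repr(line), is_header_loose(line), is_header_strict(line))
```

With the loose check the data line would be treated as a header, so it would be dropped from the result and would split its section in two.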

Answer 2

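You only append seq to the result list when a new line containing '>' is found. So at the end you have a filled seq (the data you are missing), but you never add it to the result list. Just append seq after the loop if it contains any data, and you should be fine.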
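A minimal sketch of that idea follows. It assumes the lines are passed in as any iterable (an open file object works too), and the function name collect_sections is chosen here for illustration only. Guarding both appends with if seq also makes the lis.remove('') call from the question unnecessary.

```python
def collect_sections(lines):
    sections = []
    seq = ''
    for line in lines:
        if '>' in line:
            # A header line closes the previous section, if there was one.
            if seq:
                sections.append(seq)
            seq = ''
        else:
            seq += line.rstrip()
    # The file ended: store the last section if it holds any data.
    if seq:
        sections.append(seq)
    return sections

sample = [
    "'>some cookies  \n",
    "chocolatejelly  \n",
    "peanutbuttermacadamia  \n",
    "'>some other stuff  \n",
    "letsseewhatfancythings  \n",
    "wegotinhere\n",
]
print(collect_sections(sample))
```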

Answer 3

my_list = []
with open('file_in.txt') as f:
    for line in f:
        if line.startswith("'>"):
            my_list.append(line.strip().split("'>")[1])

print(my_list)  # ['some cookies', 'some icecream', 'some other stuff']

Answer 4

Well, you can simply split on '> (if I understood you correctly):

>>> s="""
... '>some cookies
... chocolatejelly
... peanutbuttermacadamia
... doublecoconutapple
... '>some icecream
... cherryvanillaamaretto
... peanuthaselnuttiramisu
... bananacoffee
... '>some other stuff
... letsseewhatfancythings
... wegotinhere  """
>>> s.split("'>")
['\n', 'some cookies  \nchocolatejelly  \npeanutbuttermacadamia  \ndoublecoconutapple  \n', 'some icecream  \ncherryvanillaamaretto  \npeanuthaselnuttiramisu  \nbananacoffee  \n', 'some other stuff  \nletsseewhatfancythings  \nwegotinhere  ']
>>>
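The split result above still contains the header text and the newlines. One possible way to turn it into the list the question asks for is sketched below; the function name sections_from_text is an assumption made for this example, not part of the original answer.

```python
def sections_from_text(text):
    result = []
    for chunk in text.split("'>"):
        lines = chunk.splitlines()
        # lines[0] is the header text (or whatever precedes the first marker);
        # the remaining lines are the entries of that section.
        body = ''.join(line.strip() for line in lines[1:])
        if body:
            result.append(body)
    return result

text = (
    "'>some cookies  \n"
    "chocolatejelly  \n"
    "doublecoconutapple  \n"
    "'>some icecream  \n"
    "bananacoffee  \n"
)
print(sections_from_text(text))
```

Note that this reads the whole file content into memory at once, which is fine for small files but differs from the line-by-line loop in the question.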

Answer 5

import re

def parseSequenceIntoDictionary(filename,regx = re.compile('^.*>.*$',re.M)):
    with open(filename) as f:
        for el in regx.split(f.read()):
            if el:
                yield el.replace('\n','')

print(list(parseSequenceIntoDictionary('aav.txt')))

Python: Putting specific lines of a file into a list

5 answers: