Question

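I am running a Python 2.7 program on a Unix server. It reads an ASCII file containing two kinds of information and processes that information. I put this process into a function that basically does: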

def read_info():
    f = open(file_name, 'rb')
    f_enumerator = enumerate(f, start=1)
    for i, line in f_enumerator:
        process_info(line, i)  # placeholder for the per-line processing
    process_last_info()        # placeholder for the final processing step

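When this function is called on a file from my main program, it stops at a seemingly arbitrary point partway through a line near the end of the input file. When it is called from a simple wrapper on the same input file, it reads the whole file correctly.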

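I tried one of the solutions here: Python Does Not Read Entire Text File, in which the file is read in as binary, but it did not fix the problem. The other solution there (reading the file in chunks) would be problematic, because I am parsing the file line by line, and reading in a chunk of text would require more parsing.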

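I would be willing to do that, except that the intermittent nature of the problem suggests to me that there might be some other solution?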

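Edit: Problem solved. On further reflection, I realized that it happened because I had created the file earlier in the program and had not closed the file handle, so it was probably a buffering issue. Closing the file earlier fixed the problem.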

It has been suggested that I use the

with open(file_name, 'w') as f:
    f.write(foo)

syntax for opening the file, and I agree that this would indeed solve the problem.

Answer 1

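After thinking about it further, I realized that it happened because I had created the file earlier in the program and had not closed the file handle, so it was probably a buffering issue. Closing the file earlier fixed the problem.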

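It has been suggested that I use the "with" syntax for writing the file in the first place:

with open(file_name, 'w') as f:
    f.write(foo)

This would indeed keep me from forgetting to close the file, and it would prevent this problem.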
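To make the buffering explanation concrete, here is a minimal sketch of the failure mode. The helper name `sizes_before_and_after_close`, the temporary file, and the 64 KiB buffer size are all assumptions chosen for illustration, not part of the original program. It compares how many bytes are on disk while the writer is still open with how many are there after `close()`.

```python
import os
import tempfile


def sizes_before_and_after_close(payload, buffer_size=65536):
    """Return (bytes on disk while the writer is open, bytes on disk after close).

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        f = open(path, 'wb', buffer_size)
        f.write(payload)                 # a small payload stays in the write buffer
        before = os.path.getsize(path)   # what a second reader could see right now
        f.close()                        # close() flushes the buffer to the file
        after = os.path.getsize(path)
    finally:
        os.remove(path)
    return before, after


before, after = sizes_before_and_after_close(b"line\n" * 100)
print(before, after)
```

With a payload smaller than the buffer, the first number should be smaller than the second. With larger files the buffer is flushed whenever it fills, so a reader would see most of the file but not its tail, which is consistent with the read stopping near the end of the file.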

Answer 2

def read_info():
    with open(file_name, 'rb') as f:
        for i, a_line in enumerate(f, 1):  # a_line must end with a newline
            process_info(a_line, i)
    # you have processed the whole file here, so no need for `process_last_info`

Using with ensures that your file handle gets closed (you should especially do this when writing to files, but really it is always good practice)...

Given the further information from the OP, I believe a generator would be the ideal way to solve his problem:

def data_gen(f):
    header = None
    lines = []
    for line in f:
        if line.startswith(">"):  # header
            if header is not None:  # if it's not the first header, basically
                yield header, lines
            header = line  # set the header
            lines = []  # reinitialize lines
        else:
            lines.append(line)
    yield header, lines  # the last section

def read_info(fname):
    with open(fname,"rb") as f:
        for header,lines in data_gen(f):
            process(header,lines)
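As a rough illustration of what the generator yields, here is a self-contained sketch run on an in-memory list of lines. The name `group_by_header`, the `marker` parameter, and the sample data are made up for this example. It follows the same grouping logic, with one deliberate difference: it yields nothing for empty input instead of yielding a `(None, [])` pair.

```python
def group_by_header(lines, marker=">"):
    """Yield (header, body_lines) pairs; a line starting with marker opens a record."""
    header, body = None, []
    for line in lines:
        if line.startswith(marker):
            if header is not None:      # a previous record is complete
                yield header, body
            header, body = line, []     # start a new record
        else:
            body.append(line)
    if header is not None:              # emit the final record, if any
        yield header, body


# Hypothetical sample input: two records with headers starting with ">".
sample = [">first\n", "aaa\n", "bbb\n", ">second\n", "ccc\n"]
records = list(group_by_header(sample))
for header, body in records:
    print(header.strip(), len(body))
```

Note that under Python 3 a file opened with "rb" yields bytes rather than str, so the marker there would need to be b">" (or the file opened in text mode). Under Python 2.7, as in the question, the str comparison works as written.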

Answer 3

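As the O.P. discovered, the problem was that the file had been created earlier in the same program but had not been properly flushed or closed before the read attempt.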
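If the writing handle has to stay open for some reason, calling `flush()` before reading should also make the data visible to a second handle. The sketch below assumes a throwaway temporary file, and `write_then_read_lines` is a hypothetical helper name.

```python
import os
import tempfile


def write_then_read_lines(lines):
    """Write lines, flush without closing, then read them back via a second handle."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    writer = open(path, 'w')
    try:
        writer.writelines(lines)
        writer.flush()                  # push buffered data out to the operating system
        with open(path) as reader:      # independent handle on the same file
            return reader.readlines()
    finally:
        writer.close()
        os.remove(path)


result = write_then_read_lines(["a\n", "b\n", "c\n"])
print(result)
```

Closing the writer, ideally through a with block, remains the simpler option. The explicit flush only matters when the handle is needed again later.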
