Question

I'm trying to add dependencies from a list to a requirements.txt file, depending on the platform the software will run on. So I wrote the following code:

if platform.system() == 'Windows':
    # Add windows only requirements
    platform_specific_req = [req1, req2]
elif platform.system() == 'Linux':
    # Add linux only requirements
    platform_specific_req = [req3]

with open('requirements.txt', 'a+') as file_handler:
    for requirement in platform_specific_req:
        already_in_file = False
        # Make sure the requirement is not already in the file
        for line in file_handler.readlines():
            line = line.rstrip()  # remove '\n' at end of line
            if line == requirement:
                already_in_file = True
                break
        if not already_in_file:
            file_handler.write('{0}\n'.format(requirement))
    file_handler.close()

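But what happens with this code is that when the second requirement is searched for among the requirements already in the file, for line in file_handler.readlines(): seems to be pointing at the last element of the file. The new requirement is effectively compared only against the last element in the list, and if it is not the same one, it gets added. Obviously this causes several elements to be duplicated in the list, since only the first requirement is compared against all the elements in the list. How can I tell Python to start comparing from the top of the file again?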

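Solution: I received many great responses and learned a lot, thanks guys. I ended up combining two solutions, one from Antti Haapala and one from Matthew Franglen. I'm showing the final code here for reference: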

# Append the extra requirements to the requirements.txt file
with open('requirements.txt', 'r') as file_in:
    reqs_in_file = set([line.rstrip() for line in file_in])
    missing_reqs = set(platform_specific_reqs).difference(reqs_in_file)

with open('requirements.txt', 'a') as file_out:
    for req in missing_reqs:
        file_out.write('{0}\n'.format(req))

Answer 1

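You open the file handle before iterating over the list of requirements, and then read the entire file handle for each requirement.

The file handle is exhausted after the first requirement because you have not reopened it. Reopening the file for each iteration would be very wasteful; instead, read the file into a list once and use that inside the loop. Or do a set comparison!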

file_content = set([line.rstrip() for line in file_handler])
only_in_platform = set(platform_specific_req).difference(file_content)
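
Note that a set difference does not preserve the order of the requirements. If the order matters, the same idea can be written with a membership test per requirement. This is only a minimal sketch, assuming Python 3; the function name missing_requirements and the sample values are made up for illustration:

```python
def missing_requirements(wanted, existing_lines):
    """Return the entries of `wanted` that are not in `existing_lines`, in order."""
    # Build the lookup set once, stripping trailing newlines and spaces
    seen = set(line.rstrip() for line in existing_lines)
    missing = []
    for req in wanted:
        if req not in seen:
            missing.append(req)
            seen.add(req)  # also skips duplicates inside `wanted`
    return missing


# Hypothetical sample data: lines as they would come out of the file
result = missing_requirements(['req1', 'req2', 'req3'], ['req2\n', 'other\n'])
print(result)  # ['req1', 'req3']
```

The file is still read only once; only the comparison step changes.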

Answer 2

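Do not try to read the file again for each requirement. While appending does work for this particular use case, for modifications in general it is easier to:

Read the contents of the file into a list (preferably without the empty lines)
Modify the list
Open the file again for writing and save the modified data.

So for example: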

with open('requirements.txt', 'r') as fin:
    requirements = [ i for i in (line.strip() for line in fin) if i ]

for req in platform_specific_req:
    if req not in requirements:
        requirements.append(req)

with open('requirements.txt', 'w') as fout:
    for req in requirements:
        fout.write('{0}\n'.format(req))
        # or print(req, file=fout)

Answer 3

The answer to your explicit question: file_handler.seek(0) will seek back to the beginning of the file.
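
To illustrate the effect, here is a small sketch (assuming Python 3; the helper name read_twice and the temporary sample file are made up for the demo). The second readlines() call returns nothing because the position is already at the end of the file, and seek(0) rewinds it:

```python
import os
import tempfile


def read_twice(path):
    """Read all lines, try again at EOF, then rewind with seek(0) and re-read."""
    with open(path, 'r') as file_handler:
        first = file_handler.readlines()      # consumes the file up to EOF
        exhausted = file_handler.readlines()  # still at EOF, so nothing is left
        file_handler.seek(0)                  # rewind to the beginning
        second = file_handler.readlines()     # full contents again
    return first, exhausted, second


# Hypothetical sample data written to a temporary file
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('req1\nreq2\n')
    sample_path = tmp.name

first, exhausted, second = read_twice(sample_path)
os.remove(sample_path)
print(first, exhausted, second)
```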

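Some neat improvements:

You can use the file handle itself as an iterator instead of calling the readlines() method.

If your file is too large to read entirely into memory, then iterating directly over the lines in the file is fine, but you should change the way you go about it. As it stands, you iterate over the entire file for each requirement, and IO is expensive. You should instead iterate over the lines once and, for each line, check whether it is one of the requirements. Like this: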

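with open('requirements.txt', 'a+') as file_handler:
   # 'a+' may start positioned at the end of the file, so rewind before reading
   file_handler.seek(0)
   for line in file_handler:
      line = line.rstrip()
      if line in platform_specific_req:
         platform_specific_req.remove(line)
   # in append mode, writes always go to the end of the file
   for req in platform_specific_req:
      file_handler.write('{0}\n'.format(req))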

Answer 4

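I know I'm a bit late answering, but I'd suggest doing it like this: open the file once, read it, and append. Note that this should work on every platform, regardless of your system: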

def ensure_in_file(lines, file_path):
    '''
    idempotent function to append lines to a file if they're not already there
    '''
    # 'r+' opens an existing file for reading and writing; text mode
    # already handles newline translation, so no 'U' flag is needed
    with open(file_path, 'r+') as f:
        # set of all lines in the file, less newlines, and trailing spaces too.
        file_lines = set(l.rstrip() for l in f)
        # reading to the end leaves the position at EOF, so these writes append;
        # text mode converts '\n' to the platform's line separator
        f.writelines(l + '\n' for l in set(lines).difference(file_lines))

You can test it like this:

a_file = '/temp/temp/foo/bar' # insert your own file path here.

# with open(a_file, 'w') as f:  # ensure a blank file
    # pass 
ensure_in_file(['this', 'that'], a_file)
with open(a_file, 'r') as f:
    print(f.read())

ensure_in_file(['this', 'that'], a_file)
with open(a_file, 'r') as f:
    print(f.read())

Each print statement should show that every line appears in the file exactly once.
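
For a check that does not depend on a hand-picked path, the same test can be run against a temporary file. This is a sketch assuming Python 3; the function is restated here (with plain 'r+' mode and '\n' line endings) only so that the block runs on its own, and the temporary file is an assumption of the demo:

```python
import os
import tempfile


def ensure_in_file(lines, file_path):
    """Append each of `lines` to the file unless it is already present."""
    with open(file_path, 'r+') as f:
        # Reading to EOF collects existing lines and leaves the position at the end
        file_lines = set(l.rstrip() for l in f)
        f.writelines(l + '\n' for l in set(lines).difference(file_lines))


# Create an empty temporary file to work on
fd, a_file = tempfile.mkstemp(suffix='.txt')
os.close(fd)

ensure_in_file(['this', 'that'], a_file)
ensure_in_file(['this', 'that'], a_file)  # the second call should add nothing

with open(a_file) as f:
    contents = sorted(f.read().splitlines())
os.remove(a_file)
print(contents)  # ['that', 'this']
```

The result is sorted before printing because the set difference gives no guarantee about the order in which the lines are written.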

How do I start reading a file from the top in Python?
