Question

I have a text file that I am reading line by line. It looks like this:

3

67

46

67

3

46

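Each time the program encounters a new number, it should write that number to a text file. My plan is to write the first number to the file, then look at the second number and check whether it is already in the output file. If it isn't, write the number to the file. If it is, skip that line to avoid a duplicate and move on to the next line. How can I do this?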

Answer 1

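Rather than searching the output file, keep a set of the numbers you have already written, and write only the numbers that are not in that set.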
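A minimal sketch of that idea, using an in-memory list in place of the files. The function name unique_in_order and the sample values are made up for illustration:

```python
def unique_in_order(values):
    """Return the values with later duplicates dropped, keeping first-seen order."""
    seen = set()   # numbers that were already "written"
    result = []    # stands in for the output file
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


print(unique_in_order([3, 67, 46, 67, 3, 46]))  # [3, 67, 46]
```

The set is only used for the membership test; the order of the output comes from the order in which the input is read.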

Answer 2

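Rather than checking the output file to see whether a number has already been written, it is better to keep that information in a variable (a set or a list). That will save you the disk reads.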

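To search a file for numbers you need to loop over every line of that file. You can do that with a for line in open('input'): loop, where input is the name of your file. On each iteration, line will contain one line of the input file, ending with the end-of-line character '\n'.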

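On each iteration you should try to convert the value on that line to a number, which you can do with the int() function. You may want to use a try statement to protect yourself against empty lines or non-numeric values.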
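As a small illustration of that conversion step on its own (parse_number is a hypothetical helper name, not something from the script in this answer):

```python
def parse_number(line):
    """Return the integer on this line, or None if the line is blank or not a number."""
    try:
        return int(line)  # int() ignores surrounding whitespace, including '\n'
    except ValueError:
        return None


for sample in ['67\n', '  3 \n', '\n', 'abc\n']:
    print(repr(sample), '->', parse_number(sample))
```

Because int() already tolerates leading and trailing whitespace, calling strip() first is optional here.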

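On each iteration that yields a number, check whether the value has already been written to the output file by looking it up in the set of numbers written so far. If the value is not in the set yet, add it and write it to the output file.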

#!/usr/bin/env python
numbers = set()  # numbers that have already been written
# open 'input' for reading and 'output' for writing; both are closed on exit
with open('input') as infile, open('output', 'w') as out:
    for line in infile:  # loop through each line of the 'input' file
        try:
            i = int(line)  # try to convert the line to an integer
        except ValueError:  # conversion failed: display a warning
            print("Warning: cannot convert to number string '%s'" % line.strip())
            continue  # skip to the next line on error
        if i not in numbers:  # check that the number wasn't already written
            out.write('%d\n' % i)  # write the number followed by EOL
            numbers.add(i)  # mark the number as already written

This example assumes that your input file contains a single number on each line. If a line is malformed, a warning is printed to stdout.

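You could also use a list in the example above, but it would probably be less efficient. Use numbers = [] instead of numbers = set(), and numbers.append(i) instead of numbers.add(i). The if condition stays the same.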
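To get a rough feel for the difference, here is a sketch that runs both variants on the same in-memory data and times them with timeit. The function names and the data size are arbitrary choices for illustration, and the timings will vary by machine:

```python
import timeit


def dedupe_with_set(values):
    seen, out = set(), []
    for v in values:
        if v not in seen:  # hash lookup, constant time on average
            seen.add(v)
            out.append(v)
    return out


def dedupe_with_list(values):
    seen = []
    for v in values:
        if v not in seen:  # scans the list, linear in its length
            seen.append(v)
    return seen


data = list(range(1000)) * 2
assert dedupe_with_set(data) == dedupe_with_list(data)
print('set :', timeit.timeit(lambda: dedupe_with_set(data), number=3))
print('list:', timeit.timeit(lambda: dedupe_with_list(data), number=3))
```

Both variants produce the same result; only the cost of the "not in" test differs.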

Answer 3

Don't do it that way. Use a set() to keep track of all the numbers you have seen. It will hold only one of each.

numbers = set()
with open("numberfile") as infile:
    for line in infile:
        numbers.add(int(line.strip()))
with open("outputfile", "w") as outfile:
    outfile.write("\n".join(str(n) for n in numbers))

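Note that this reads all of them in and then writes them all out at once. They will generally not come out in the same order as in the original file: a set has no defined order. Small integers often happen to come out in ascending order in CPython, but that is not guaranteed; write sorted(numbers) if you want ascending order. If you want to keep the original order instead, you can write the numbers as you read them, but only when they are not already in the set: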

numbers = set()
with open("outfile", "w") as outfile:
    for line in open("numberfile"):
        number = int(line.strip())
        if number not in numbers:
            outfile.write(str(number) + "\n")
            numbers.add(number)
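If ascending order is what you actually want from the first variant, sorting the set explicitly makes that dependable. A sketch working on a list of lines rather than a file; the function name sorted_unique_lines is made up, and skipping blank lines is an added assumption:

```python
def sorted_unique_lines(lines):
    """Return the distinct integers found in lines, one per line, in ascending order."""
    numbers = {int(line) for line in lines if line.strip()}  # skip blank lines
    return "\n".join(str(n) for n in sorted(numbers))


print(sorted_unique_lines(["3\n", "67\n", "46\n", "67\n", "3\n", "46\n"]))
```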

Answer 4

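Are you dealing with exceptionally large files? You probably don't want to try to "search" the file you are writing to for a value you have just written. You (probably) want something more like this: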

encountered = set()

with open('file1') as fhi, open('file2', 'w') as fho:
  for line in fhi:
    if line not in encountered:
      encountered.add(line)
      fho.write(line)
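One caveat with comparing raw lines: '3\n', ' 3\n' and a final '3' with no trailing newline all count as different entries. A sketch that compares the stripped text instead; the function name copy_unique_lines and the use of io.StringIO in place of real files are for illustration only:

```python
import io


def copy_unique_lines(src, dst):
    """Copy lines from src to dst, skipping blanks and lines whose stripped text was seen before."""
    encountered = set()
    for line in src:
        key = line.strip()
        if key and key not in encountered:
            encountered.add(key)
            dst.write(key + "\n")


src = io.StringIO("3\n67\n 3\n46\n67")
dst = io.StringIO()
copy_unique_lines(src, dst)
print(dst.getvalue())
```

This still compares text, not numbers, so '03' and '3' would be treated as different values.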

Answer 5

If you want to go through a file to see whether it contains a given number on any line, you could do something like this:

def file_contains(f, n):
    with f:
        for line in f:
            if int(line.strip()) == n:
                return True

        return False

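However, as Ned points out in his answer, this is not a very efficient solution. If you have to search the file again for every line, the running time of the program grows in proportion to the square of the number of numbers.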
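To make the quadratic growth concrete, here is a sketch that counts how many already-written values get examined when the output is rescanned for every input number. It uses a list in place of the output file, and the function name count_line_checks is made up:

```python
def count_line_checks(numbers):
    """Count the comparisons made when the output is rescanned for every input number."""
    written = []  # stands in for the contents of the output file
    checks = 0
    for n in numbers:
        found = False
        for w in written:
            checks += 1
            if w == n:
                found = True
                break
        if not found:
            written.append(n)
    return checks


for size in (100, 200, 400):
    print(size, count_line_checks(range(size)))
```

With all-distinct input, doubling the number of lines roughly quadruples the number of comparisons.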

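If the number of values is not extremely large, it is more efficient to use a set (documentation). Sets are designed to keep track of unordered values very efficiently. For example: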

with open("input_file.txt", "rt") as in_file:
    with open("output_file.txt", "wt") as out_file:
        encountered_numbers = set()
        for line in in_file:
            n = int(line.strip())

            if n not in encountered_numbers:
                encountered_numbers.add(n)
                out_file.write(line)

Python: how to search for a number in a text file
