Question

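I'm trying to get my function to look through the sorted text in Insults.txt and determine whether there are any duplicates, returning False if there are, but I can't seem to get it to work. I only want to detect duplicates, not remove them! Does anyone know what I'm doing wrong?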

def checkInsultsFile(numInsults=1000, file="Insults.txt"):
    filename = open(file,'r').readlines()
    for i in range(0, numInsults):
        if [i] == [i+1]:
            return False
        else:
            return True

Answer 1

Try this. I'm not sure why you have numInsults.

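def checkInsultsFile(numInsults=1000, file="Insults.txt"):
    # numInsults is unused; it is kept only so the signature matches the question.
    with open(file, 'r') as f:
        lines = f.readlines()

    # Count how many times each line occurs.
    counts = {}
    for line in lines:
        counts[line] = counts.get(line, 0) + 1

    # Any count above 1 means a duplicate line exists.
    for line, count in counts.items():
        if count > 1:
            return True
    return False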

Answer 2

If you want to check the whole file, I'm not sure why you would limit it with numInsults, since the file may have more than 1K lines.

def checkInsultsFile(file):
    with open(file, 'r') as f:
        lines = [line.strip() for line in f] #puts whole file into list if it's not too large for your RAM
    check = set(lines)
    if len(lines) == len(check):
         return False
    elif len(check) < len(lines):
         return True

checkInsultsFile("Insults.txt")

Alternative (going through the file line by line):

def checkInsultsFile(file):
    lines = []
    with open(file, 'r') as f:
        for line in f:
             lines.append(line.strip()) 

    check = set(lines)
    if len(lines) == len(check):
         return False
    elif len(check) < len(lines):
         return True

checkInsultsFile("Insults.txt")

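This function puts all the lines of Insults.txt into a list. 'check' is a set, which keeps only the unique items from the 'lines' list. If the lines list has the same length as the check set, there are no duplicates and the function returns False. If the check set is smaller than the lines list, there are duplicates and the function returns True.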

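Alternatively, you could use bash (I don't know your OS). I just want to point out that there are faster and simpler ways to do this, unless your Python script is going to use the list of unique insults from the file in some other way: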

sort Insults.txt | uniq -c

This is similar to what you would do in Python with Counter from collections: it gives you a count for every line in the file.
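For comparison, here is a minimal sketch of that Counter idea. The helper names count_lines and duplicated_lines are made up for illustration, and the sample data is invented; the functions accept any iterable of lines, so an open file object should work as well.

```python
from collections import Counter

def count_lines(lines):
    """Return a Counter mapping each stripped line to how often it occurs."""
    return Counter(line.strip() for line in lines)

def duplicated_lines(lines):
    """Return the lines that occur more than once."""
    return [line for line, n in count_lines(lines).items() if n > 1]

# Invented sample data standing in for the contents of Insults.txt.
sample = ["you goose\n", "you muppet\n", "you goose\n"]
print(count_lines(sample))
print(duplicated_lines(sample))
```

With a real file this would be called as duplicated_lines(f) inside a `with open("Insults.txt") as f:` block; a non-empty result means the file contains duplicates.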

Answer 3

Mine is a lazier approach, since it stops executing as soon as it finds a duplicate.

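def checkInsultsFile(filename):
    try:
        with open(filename, 'r') as file:
            s = set()  # lines seen so far
            for line in file:
                if line in s:
                    return True  # stop at the first duplicate
                s.add(line)
            return False
    except IOError:
        # Placeholder: replace with your own error handling.
        handleExceptionFromFileError()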

Answer 4

What's happening

if [i] == [i+1]:
    return False
else:
    return True

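Initially, i is 0. Is a one-element list containing 0 equal to a one-element list containing 1? Obviously not. So execution goes to the else clause and the function returns True.

It doesn't even care about the length or contents of the file, as long as it exists and is readable.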
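To make the point concrete, here is a minimal index-based sketch of what the loop was presumably meant to do: compare the lines stored at positions i and i+1, and only return the "no duplicates" result after every pair has been checked. The function name has_adjacent_duplicates is hypothetical, and it returns True when a duplicate is found (the convention used by the other answers, the opposite of what the question asks for).

```python
def has_adjacent_duplicates(lines):
    """Return True if any two neighbouring entries of a sorted list are equal."""
    for i in range(len(lines) - 1):
        if lines[i] == lines[i + 1]:  # compare the lines, not the indices
            return True
    return False  # reached only after every pair has been checked

print([0] == [1])  # the comparison the original code performs on its first pass
print(has_adjacent_duplicates(["a", "b", "b", "c"]))
print(has_adjacent_duplicates(["a", "b", "c"]))
```

This relies on the input being sorted, as the question states, so that equal lines end up next to each other.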

Working solution

Taking a hint from pairwise(iterable), which yields the pairs (line1, line2), (line2, line3), (line3, line4), and so on:

from itertools import tee

def any_consecutive_duplicate_lines(file='Insults.txt'):
    """Return True if the file contains any two consecutive equal lines."""
    with open(file) as f:
        a, b = tee(f)    # two independent iterators over the same lines
        next(b, None)    # advance the second one by a single line
        return any(a_line == b_line for a_line, b_line in zip(a, b))

This borrows the pairwise recipe from the itertools documentation to simplify the inner loop.
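On Python 3.10 and later, pairwise is available directly from itertools, so the consecutive-line check can be written without the tee bookkeeping. A minimal sketch, assuming the lines are already sorted so that duplicates are adjacent; the function name any_consecutive_duplicates and the fallback definition for older interpreters are illustrative additions:

```python
try:
    from itertools import pairwise  # available from Python 3.10
except ImportError:
    from itertools import tee

    def pairwise(iterable):
        """Fallback: yield (s0, s1), (s1, s2), ... for older Python versions."""
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

def any_consecutive_duplicates(lines):
    """Return True if any two consecutive items of the iterable are equal."""
    return any(a == b for a, b in pairwise(lines))

print(any_consecutive_duplicates(["a\n", "b\n", "b\n"]))
print(any_consecutive_duplicates(["a\n", "b\n", "c\n"]))
```

Because any() short-circuits, the scan stops at the first matching pair, and passing an open file object reads only as many lines as needed.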

Answer 5

If you just need to return whether there are any dupes, we can take your function and simplify it a bit:

def checkdup(file="insults.txt"):
    with open(file, 'r') as f:
        lines = f.readlines()
    # A set drops repeated lines, so differing lengths mean duplicates exist.
    return len(lines) != len(set(lines))

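Basically we do two things: take all the lines in the txt file and put them in a list, then check whether the number of items in that list

len(lines) #the number of insults in your file.

is the same as the number of items in the set of unique elements of that list

len(set(lines)) # the number of unique elements of our list, or unique insults

If they're not the same, there must be dupes!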
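One caveat with comparing raw lines: if the last line of the file has no trailing newline, it will not compare equal to an otherwise identical earlier line. A small sketch with made-up data shows the difference that stripping makes; has_duplicates is a hypothetical helper that works on an in-memory list rather than a file.

```python
def has_duplicates(lines):
    """Return True if the list contains any repeated entry."""
    return len(lines) != len(set(lines))

# Invented data: what readlines() might return when the final line lacks "\n".
raw = ["a\n", "b\n", "a"]
stripped = [line.strip() for line in raw]

print(has_duplicates(raw))       # "a\n" and "a" count as different strings
print(has_duplicates(stripped))  # after stripping, the repeat is detected
```

Whether to strip depends on whether lines that differ only in surrounding whitespace should count as the same insult.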

How do I check a text file line by line to detect whether there are duplicates?
