Question

Here is the function:

def duplicate(fname):
    'returns true if there are duplicates in the file, false otherwise'
    fn = open(fname, 'r')
    llst = fn.readlines()
    fn.close()

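I don't know where to go from here. I tried splitting the file, sorting it, and then writing a function to check whether two identical words appear consecutively. But it says I can't use split on a list.

Any ideas?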

Answer 1

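You can add each word as a key to a dictionary. If the key already exists, the word is a duplicate. You can also store the number of times the word was found as the value.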

#!/usr/bin/env python
def duplicate(fname):
    'returns a dict mapping each word in the file to "Duplicate" or "Unique"'
    with open(fname, 'r') as file_handle:
        word_dict = dict()
        for line in file_handle:
            words = line.split()
            for word in words:
                if word in word_dict:
                    word_dict[word] = 'Duplicate'
                else:
                    word_dict[word] = 'Unique'
    return word_dict

results = duplicate('alice.txt')
for key in results:
    print("{}: {}".format(key, results[key]))
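
The last suggestion above, storing how many times each word was found, could look roughly like the sketch below. It is not part of the original answer: the names `count_words` and `duplicate_counts` are made up for illustration, and it assumes that a "word" is whatever `split()` yields.

```python
from collections import Counter


def count_words(lines):
    """Return a Counter mapping each whitespace-separated word to its count."""
    counts = Counter()
    for line in lines:
        # update() adds one to the count of every word on this line
        counts.update(line.split())
    return counts


def duplicate_counts(fname):
    """Return {word: count} for every word that occurs more than once."""
    with open(fname, 'r') as file_handle:
        # A file object yields its lines one at a time
        counts = count_words(file_handle)
    return {word: n for word, n in counts.items() if n > 1}
```

An empty result means the file has no duplicate words, so `bool(duplicate_counts(fname))` gives the True/False answer the question asks for.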

Answer 2

You can use the set data structure:

def has_duplicate_words(filename):
    with open(filename, 'r') as f:
        words = set()
        for line in f:
            lineWords = line.split()
            for word in lineWords:
                if word in words:
                    return True

                words.add(word)
    return False

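Also note that this depends on how you define a word. In this solution, a word is any sequence of characters that contains no whitespace, where whitespace is as described in the split() documentation: space, tab, newline, carriage return, form feed.

If you want to return all the duplicates, you can accumulate them in a list instead of doing return True when a duplicate is found.

Also note that this solution is not feasible if the file may contain extremely long lines that do not fit in memory.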
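
A minimal sketch of the variant that accumulates the duplicates in a list, under the same definition of a word. The function name `find_duplicate_words` is hypothetical and not from the original answer.

```python
def find_duplicate_words(filename):
    """Return the words that occur more than once, in order of first repeat."""
    seen = set()      # every word encountered so far
    duplicates = []   # repeated words, each listed once
    with open(filename, 'r') as f:
        for line in f:
            for word in line.split():
                if word in seen:
                    # Record a repeated word only the first time it repeats
                    if word not in duplicates:
                        duplicates.append(word)
                else:
                    seen.add(word)
    return duplicates
```

If the number of distinct duplicates may be large, keeping them in a second set alongside the list would avoid the linear `word not in duplicates` check.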

Answer 3

Is this what you are looking for?

def duplicate(fname):
    with open(fname, "r") as f: # with open is better than a bare open, since otherwise the file might not be closed on error
        seen = {} # empty dictionary for checking whether a line was already in the file
        for line in f: # go through all lines
            try:
                seen[line] # check whether the line already exists
                return True # no KeyError was raised, so this is a duplicated line
            except KeyError:
                seen[line] = 1 # store an arbitrary value so that the dict contains this key
    return False

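Another approach is to read the file, sort the lines, and then check for duplicate lines, which will be adjacent to each other after sorting.

Note that if the file contains the lines "foo" and "foo ", the function returns False rather than True, because of the trailing space at the end of the second line.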
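
The sort-based alternative mentioned above could be sketched as follows. This is an illustration rather than the answerer's code: the name `has_duplicate_lines_sorted` and the `ignore_trailing_whitespace` parameter are assumptions added here, the latter to show one way of treating lines that differ only in trailing whitespace as equal.

```python
def has_duplicate_lines_sorted(fname, ignore_trailing_whitespace=True):
    """Return True if any two lines of the file are equal."""
    with open(fname, "r") as f:
        lines = f.readlines()
    if ignore_trailing_whitespace:
        # Drop trailing spaces and newlines so that "foo" and "foo " compare equal
        lines = [line.rstrip() for line in lines]
    lines.sort()
    # After sorting, equal lines sit next to each other
    return any(a == b for a, b in zip(lines, lines[1:]))
```

Sorting costs O(n log n) and needs the whole file in memory, so the dictionary version is usually the cheaper choice. The sorted version mainly helps when you also want the lines in order.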

Answer 4

A simpler approach: compare the length of the list of words in the file with the length of the set of those words:

>>> def HasDuplicates(str):
...    words = str.split()
...    uniqueWords = set(words)
...    return len(words) != len(uniqueWords)
...
>>> str1 = "this is a sentence with two two duplicates"
>>> str2 = "this is a sentence with no duplicates"
>>> HasDuplicates(str1)
True
>>> HasDuplicates(str2)
False

(File I/O is left as an exercise for the reader; it is not closely related to the duplicates problem.)

Answer 5

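This works:

It returns True if there are duplicates, but it also builds a dictionary with the duplicated words as keys and their frequency in the text as values, and prints it. I know that is more than you asked for, but it would not take much to change the code so that it only checks for duplicates and returns True/False.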

def duplicate(fname):

    with open(fname, 'r') as f:
        text = f.read() # auto closes file after reading

    split_text = text.split() # create list of all the words

    duplicates = {}
    for word in split_text:
        count = split_text.count(word) # count whole-word occurrences (text.count would also match substrings)
        if count > 1:
            duplicates[word] = count
    if duplicates:
        print(duplicates)
        return True
    return False

Sample output:

{'dear': 2, 'the': 6, 'name': 2}

Answer 6

def has_duplicates(filepath):  # wrapped in a function so that return is valid
    with open(filepath, 'r') as f:
        all_words = f.read().split()
    return len(all_words) > len(set(all_words))

How to write a function that returns True if a file contains duplicate words
