Here is my function:
def duplicate(fname):
    'returns true if there are duplicates in the file, false otherwise'
    fn = open(fname, 'r')
    llst = fn.readlines()
    fn.close()
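I don't know where to go from here. I tried splitting the file, sorting it, and then writing a function to check whether two identical words appear one after the other. But I get an error saying that a list has no attribute split.
Any ideas?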
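The split, sort, and compare-neighbours approach described in the question can work; the error comes from calling split() on the list returned by readlines() instead of on each line (a string). A minimal sketch, assuming a "word" is any whitespace-separated token; the helper name sorted_words_have_duplicates is hypothetical:

```python
def sorted_words_have_duplicates(words):
    """Return True if any word appears twice, by sorting and comparing neighbours."""
    ordered = sorted(words)  # identical words end up next to each other
    for i in range(len(ordered) - 1):
        if ordered[i] == ordered[i + 1]:
            return True
    return False


def duplicate(fname):
    """Return True if there are duplicate words in the file, False otherwise."""
    words = []
    with open(fname, 'r') as fn:
        for line in fn:
            # Each line is a str, so split() works here. Calling split() on
            # the list returned by readlines() raises AttributeError instead.
            words.extend(line.split())
    return sorted_words_have_duplicates(words)
```

Sorting costs O(n log n); the set- and dict-based answers below avoid the sort.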
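Answer 0 (score: 1)
You can add each word to a dictionary as a key. If the key already exists, the word is a duplicate. You can also store the number of times the word was found as the value.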
#!/usr/bin/env python
def duplicate(fname):
    'returns a dict mapping each word to "Duplicate" or "Unique"'
    word_dict = dict()
    with open(fname, 'r') as file_handle:
        for line in file_handle:
            words = line.split()
            for word in words:
                if word in word_dict:
                    word_dict[word] = 'Duplicate'
                else:
                    word_dict[word] = 'Unique'
    return word_dict

results = duplicate('alice.txt')
for key in results:
    print("{}: {}".format(key, results[key]))
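To store the number of occurrences as the value, as suggested above, the dictionary can hold counts instead of labels. A minimal sketch; word_counts is a hypothetical helper, and a word again means a whitespace-separated token:

```python
def word_counts(lines):
    """Map each whitespace-separated word to how many times it occurs."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1  # new words start at 0
    return counts


def duplicate(fname):
    """Return True if any word in the file occurs more than once."""
    with open(fname, 'r') as file_handle:
        counts = word_counts(file_handle)  # iterating a file yields its lines
    return any(n > 1 for n in counts.values())
```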
Answer 1 (score: 0)
You can use the set data structure:
def has_duplicate_words(filename):
    with open(filename, 'r') as f:
        words = set()
        for line in f:  # iterate over the file lazily instead of f.readlines()
            lineWords = line.split()
            for word in lineWords:
                if word in words:
                    return True
                words.add(word)
    return False
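Also note that this depends on your definition of a word. In this solution, a word is any sequence of characters that contains no whitespace, where whitespace (space, tab, newline, carriage return, form feed, and so on) is as defined in the documentation of the split() function.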
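With split(), punctuation and capitalisation stay attached, so "Hello," and "hello" count as different words. If that matters, one option is a different word definition. The sketch below assumes, purely for illustration, that a word is a run of ASCII letters or apostrophes compared case-insensitively; the pattern and the function names are hypothetical:

```python
import re

# Hypothetical word definition: letters and apostrophes, case-insensitive.
WORD_RE = re.compile(r"[A-Za-z']+")


def normalized_words(text):
    """Extract words with WORD_RE and lower-case them."""
    return [w.lower() for w in WORD_RE.findall(text)]


def text_has_duplicate_words(text):
    """Same set-based check as above, using the normalized words."""
    seen = set()
    for word in normalized_words(text):
        if word in seen:
            return True
        seen.add(word)
    return False
```

For example, "Hello, world. hello!".split() gives ['Hello,', 'world.', 'hello!'] (no repeat), whereas normalized_words gives ['hello', 'world', 'hello'].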
If you want to return all the duplicates, you can accumulate them in a list instead of executing return True as soon as a duplicate is found.
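Accumulating the duplicates instead of returning early could look like the following minimal sketch. The function names are hypothetical, and each duplicated word is recorded once, in the order it is first repeated:

```python
def find_duplicate_words(lines):
    """Return every word that occurs more than once, each listed once."""
    seen = set()
    duplicates = []
    for line in lines:
        for word in line.split():
            if word in seen:
                if word not in duplicates:  # avoid listing a word twice
                    duplicates.append(word)
            else:
                seen.add(word)
    return duplicates


def duplicates_in_file(filename):
    with open(filename, 'r') as f:
        return find_duplicate_words(f)  # the file yields one line at a time
```

The membership test on the list is linear in the number of duplicates; a second set would make it constant-time if there are many.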
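Also note that this solution is not feasible if the file may contain extremely long lines that do not fit in memory.
Answer 2 (score: 0)
Is this what you're looking for?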
def duplicate(fname):
    with open(fname, "r") as f:  # "with open" is better than a bare open(), since otherwise the file might not be closed on error
        seen = {}  # empty dictionary recording which lines have already appeared in the file
        for line in f:  # go through all lines
            try:
                seen[line]  # raises KeyError if this line has not been seen yet
                return True  # no error was raised, so this is a duplicate line
            except KeyError:
                seen[line] = 1  # store an arbitrary value so that the dict contains this key
    return False
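Another approach is to read the file, sort the lines, and then check whether any duplicate lines end up next to each other.
Note that if the file contains the lines "foo" and "foo " (with a trailing space), this returns False rather than True because of the space at the end of the second line.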
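The sort-then-compare-neighbours approach for lines can be sketched as follows; the helper name is hypothetical, and lines are compared exactly as read, including the trailing newline:

```python
def has_duplicate_lines_sorted(lines):
    """Return True if any line appears twice, by sorting and comparing neighbours."""
    ordered = sorted(lines)  # equal lines become adjacent
    # zip pairs each line with the one that follows it
    return any(a == b for a, b in zip(ordered, ordered[1:]))


def duplicate(fname):
    with open(fname, 'r') as f:
        return has_duplicate_lines_sorted(f.readlines())
```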
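One way to treat "foo" and "foo " as the same line is to strip surrounding whitespace before comparing. A minimal sketch with a hypothetical function name; stripping also removes the newline, so a last line without "\n" still matches:

```python
def has_duplicate_lines_ignoring_whitespace(lines):
    """Return True if two lines are equal once surrounding whitespace is removed."""
    seen = set()
    for line in lines:
        key = line.strip()  # drop leading/trailing spaces, tabs and the newline
        if key in seen:
            return True
        seen.add(key)
    return False
```

With this normalisation, two blank lines also count as duplicates; skip empty keys if that is not wanted.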
Answer 3 (score: 0)
A simpler approach: compare the length of the list of words in the file with the length of the set of those words:
>>> def HasDuplicates(text):
...     words = text.split()
...     uniqueWords = set(words)
...     return len(words) != len(uniqueWords)
...
>>> str1 = "this is a sentence with two two duplicates"
>>> str2 = "this is a sentence with no duplicates"
>>> HasDuplicates(str1)
True
>>> HasDuplicates(str2)
False
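(File I/O is left as an exercise for the reader; it isn't really related to the question of finding duplicates.)
Answer 4 (score: 0)
This works: it returns True if there are duplicates, but it also builds a dictionary with the duplicated words as keys and their frequency in the text as values, and prints it. I know it's more than you asked for, but changing the code to just check for duplicates and return True/False wouldn't take much.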
def duplicate(fname):
    with open(fname, 'r') as f:
        text = f.read()  # the file is closed automatically when the with block exits
    split_text = [word.strip() for word in text.split()]  # create a list of all the words
    duplicates = {}
    for word in split_text:
        count = split_text.count(word)  # count whole-word occurrences; text.count() would also match substrings
        if count > 1:
            duplicates[word] = count
    if duplicates:
        print(duplicates)
        return True
    return False
Example output:
{'dear': 2, 'the': 6, 'name': 2}
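Counting whole words rather than searching the raw text avoids substring matches ("the cat and the other cat".count("the") is 3, because "other" contains "the"), and collections.Counter counts every word in a single pass instead of rescanning once per word. A minimal sketch of that swapped-in technique, keeping the same print-and-return behaviour; duplicate_counts is a hypothetical helper:

```python
from collections import Counter


def duplicate_counts(text):
    """Map each word that occurs more than once to its number of occurrences."""
    counts = Counter(text.split())  # one pass over whitespace-separated words
    return {word: n for word, n in counts.items() if n > 1}


def duplicate(fname):
    with open(fname, 'r') as f:
        duplicates = duplicate_counts(f.read())
    if duplicates:
        print(duplicates)
        return True
    return False
```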
Answer 5 (score: 0)
def has_duplicates(filepath):  # return is only valid inside a function
    with open(filepath, 'r') as f:
        all_words = f.read().split()
    return len(all_words) > len(set(all_words))