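I'm retrieving only the unique words in a file. This is what I have so far, but is there a better way to do this in Python in terms of big O notation? Right now it is n squared.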
def retHapax():
    file = open("myfile.txt")
    myMap = {}
    uniqueMap = {}
    for i in file:
        myList = i.split(' ')
        for j in myList:
            j = j.rstrip()
            if j in myMap:
                # pop() avoids a KeyError on the third and later occurrences
                uniqueMap.pop(j, None)
            else:
                myMap[j] = 1
                uniqueMap[j] = 1
    file.close()
    print(uniqueMap)
Answer 0 (score: 3)
If you want to find all unique words and treat foo as the same as foo., you need to strip the punctuation.
from collections import Counter
from string import punctuation
with open("myfile.txt") as f:
    # Strip leading/trailing punctuation so "foo" and "foo." count as one word
    word_counts = Counter(word.strip(punctuation) for line in f for word in line.split())
# items() works on both Python 2 and Python 3
print([word for word, count in word_counts.items() if count == 1])
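If you want to ignore case, you also need to use line.lower(). If you want to get the unique words accurately, there is more to it than just splitting the lines on whitespace.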
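As a rough sketch of the case-insensitive variant mentioned above: the function name `hapaxes` and the pattern `[a-z0-9']+` are my own assumptions about what counts as a word, not something taken from the question, so adjust them to your data.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters, digits, or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")

def hapaxes(lines):
    """Return the set of words that occur exactly once, ignoring case."""
    counts = Counter(
        word
        for line in lines
        for word in WORD_RE.findall(line.lower())
    )
    return {word for word, count in counts.items() if count == 1}

sample = ["Foo bar, foo.", "Baz bar qux!"]
print(sorted(hapaxes(sample)))  # ['baz', 'qux']
```

Since a file object is itself an iterable of lines, the same function should also accept `open("myfile.txt")` directly.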
Answer 1 (score: 3)
I'd go with the collections.Counter approach, but if you only want to use sets, you can do it like this:
with open('myfile.txt') as input_file:
    all_words = set()
    dupes = set()
    for word in (word for line in input_file for word in line.split()):
        if word in all_words:
            dupes.add(word)
        all_words.add(word)
unique = all_words - dupes
Given the input:
one two three
two three four
four five six
The output is:
{'five', 'one', 'six'}
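To check the result above without creating a file, the same set logic can be wrapped in a function that accepts any iterable of lines. This is only a sketch, and the name `unique_words` is hypothetical:

```python
def unique_words(lines):
    """Return the words that appear exactly once across all lines."""
    all_words = set()
    dupes = set()
    for line in lines:
        for word in line.split():
            if word in all_words:
                # Seen before, so it can no longer be unique
                dupes.add(word)
            all_words.add(word)
    return all_words - dupes

sample = ["one two three", "two three four", "four five six"]
print(sorted(unique_words(sample)))  # ['five', 'one', 'six']
```

Each set lookup and insertion is O(1) on average, so the whole pass is linear in the number of words rather than quadratic.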
Answer 2 (score: 2)
Try this to get the unique words in a file, using Counter:
from collections import Counter
with open("myfile.txt") as input_file:
    word_counts = Counter(word for line in input_file for word in line.split())
# List of unique words (words that appear exactly once)
unique = [word for (word, count) in word_counts.items() if count == 1]
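Answer 3 (score: 1)
You could modify your logic slightly and move a word out of the unique collection on its second occurrence (for example, using sets instead of dicts):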
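words = set()
unique_words = set()
with open("myfile.txt") as f:
    # split() with no argument also discards newlines and empty strings
    for w in (word for line in f for word in line.split()):
        if w in words:
            # Already known to occur more than once
            continue
        if w in unique_words:
            # Second occurrence: the word is no longer unique
            unique_words.remove(w)
            words.add(w)
        else:
            unique_words.add(w)
print(unique_words)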