Suppose I have the following text file:
But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
I want to add all the unique words in this file to a list:
fname = open("romeo.txt")
lst = list()
for line in fname:
    line = line.rstrip()
    words = line.split(' ')
    for word in words:
        if word in lst: continue
        lst = lst + words
        lst.sort()
print lst
But the program's output is:
['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and',
'and', 'and', 'breaks', 'east', 'envious', 'fair', 'grief',
'is', 'is', 'is', 'kill', 'light', 'moon', 'pale', 'sick',
'soft', 'sun', 'sun', 'the', 'the', 'the', 'through', 'what',
'window', 'with', 'yonder']
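'and' and some other words appear multiple times in the list. Which part of the loop should I change so that I don't get any duplicate words? Thanks!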
Answer 0 (score: 6)
Here is what is wrong with your code; a corrected version follows:
fname = open("romeo.txt")       # better to open files in a `with` statement
lst = list()                    # lst = [] is more Pythonic
for line in fname:
    line = line.rstrip()        # not required, `split()` will do this anyway
    words = line.split(' ')     # don't specify a delimiter, `line.split()` will split on all white space
    for word in words:
        if word in lst: continue
        lst = lst + words       # this is the reason that you end up with duplicates... words is the list of all words for this line!
        lst.sort()              # don't sort in the for loop, just once afterwards.
print lst
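So it almost works. However, you should append only the current `word` to the list, not all of the `words` that `split()` returned for the line. If you just changed the line:
lst = lst + words
to
lst.append(word)
it would work.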
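To see why that one-line change matters, here is a small sketch that runs both variants on the sample text held in memory. The `LINES` list and the two function names are made up for illustration and are not part of the original code.

```python
# Sample text from the question, kept in memory instead of reading romeo.txt.
LINES = [
    "But soft what light through yonder window breaks",
    "It is the east and Juliet is the sun",
    "Arise fair sun and kill the envious moon",
    "Who is already sick and pale with grief",
]

def buggy_unique(lines):
    """Mimics the original loop: adds the whole line's words at once."""
    lst = []
    for line in lines:
        words = line.split()
        for word in words:
            if word in lst:
                continue
            lst = lst + words  # adds every word on the line, repeats included
    return sorted(lst)

def fixed_unique(lines):
    """Appends only the current word when it has not been seen yet."""
    lst = []
    for line in lines:
        for word in line.split():
            if word not in lst:
                lst.append(word)
    return sorted(lst)

print(len(buggy_unique(LINES)))  # 33 entries, repeats included
print(len(fixed_unique(LINES)))  # 26 unique words
```

In the buggy variant the first unseen word on each line pulls in the entire line, so any word that repeats within a line or across lines ends up in the list more than once.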
Here is the corrected version:
with open("romeo.txt") as infile:
    lst = []
    for line in infile:
        words = line.split()
        for word in words:
            if word not in lst:
                lst.append(word)    # append only this word to the list, not all words on this line
    lst.sort()
    print(lst)
As others have suggested, a `set` is a good way to solve this. It is as simple as:
with open('romeo.txt') as infile:
    print(sorted(set(infile.read().split())))
With `sorted()` you don't need to keep a reference to the list. If you do want to use the sorted list elsewhere, do this:
with open('romeo.txt') as infile:
    unique_words = sorted(set(infile.read().split()))
    print(unique_words)
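Reading the whole file into memory may not be practical for large files. You can use a generator to read the file efficiently without cluttering the main code. This generator reads the file one line at a time and yields one word at a time. It never reads the whole file at once, unless the file consists of a single long line (which your sample data clearly does not):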
def get_words(f):
    for line in f:
        for word in line.split():
            yield word
with open('romeo.txt') as infile:
    unique_words = sorted(set(get_words(infile)))
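If you want to try the generator without a file on disk, something like the following should work. It is only a sketch: `io.StringIO` stands in for the open file (an assumption made for the demo), and the sample holds just the first two lines of the text.

```python
import io

def get_words(f):
    """Yield words one at a time from any iterable of lines."""
    for line in f:
        for word in line.split():
            yield word

# io.StringIO behaves like an open text file: iterating it yields lines.
sample = io.StringIO(
    "But soft what light through yonder window breaks\n"
    "It is the east and Juliet is the sun\n"
)

unique_words = sorted(set(get_words(sample)))
print(unique_words)
# ['But', 'It', 'Juliet', 'and', 'breaks', 'east', 'is', 'light',
#  'soft', 'sun', 'the', 'through', 'what', 'window', 'yonder']
```

Because `get_words` only needs something it can loop over line by line, the same function works unchanged on a real file object.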
Answer 1 (score: 4)
This is easier in Python using a set:
with open("romeo.txt") as f:
    unique_words = set(f.read().split())
If you want a list, convert it afterwards:
unique_words = list(unique_words)
It might be nice to have them in alphabetical order:
unique_words.sort()
Answer 2 (score: 2)
There are several ways to achieve what you want.
1) Using a list:
fname = open("romeo.txt")
lst = list()
for word in fname.read().split():  # This will split by all whitespace, meaning that it will split by ' ' and '\n'
    if word not in lst:
        lst.append(word)
lst.sort()
print(lst)
2) Using a set:
fname = open("romeo.txt")
lst = list(set(fname.read().split()))
lst.sort()
print(lst)
A set simply ignores duplicates, so the membership check is unnecessary.
Answer 3 (score: 1)
If you want a collection of unique words, it is better to use a `set` than a `list`, because `in lst` can be very inefficient.
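A rough way to see the difference is to time a membership test against each container. This is only a sketch: the 5,000 made-up words and the repeat count are arbitrary choices, and the actual timings depend on your machine.

```python
import timeit

# Hypothetical test data: 5,000 distinct "words".
words = [f"word{i}" for i in range(5000)]
as_list = list(words)
as_set = set(words)

probe = "word4999"  # last element, the worst case for a linear list scan

# `in` on a list compares element by element; `in` on a set uses a hash lookup.
list_time = timeit.timeit(lambda: probe in as_list, number=1000)
set_time = timeit.timeit(lambda: probe in as_set, number=1000)

print(f"list membership: {list_time:.4f}s")
print(f"set membership:  {set_time:.4f}s")
```

For a four-line file the difference is negligible, but the list check grows with the number of words already collected, while the set check stays roughly constant.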
For counting words, it is better to use a `Counter` object.
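As a minimal sketch of the `Counter` idea, assuming the sample text is held in a string rather than read from `romeo.txt`:

```python
from collections import Counter

# The sample text from the question, joined into one string for the demo.
text = (
    "But soft what light through yonder window breaks "
    "It is the east and Juliet is the sun "
    "Arise fair sun and kill the envious moon "
    "Who is already sick and pale with grief"
)

counts = Counter(text.split())  # maps each word to its number of occurrences

print(counts["and"])   # 3
print(counts["sun"])   # 2
print(len(counts))     # 26 distinct words
print(sorted(counts))  # the unique words in sorted order
```

The keys of the `Counter` are the unique words, so it covers the original question and gives you the frequencies as well.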
Answer 4 (score: 1)
I would do:
with open('romeo.txt') as fname:
    text = fname.read()
    lst = list(set(text.split()))
    print(lst)
>> ['and', 'envious', 'already', 'fair', 'is', 'through', 'pale', 'yonder', 'what', 'sun', 'Who', 'But', 'moon', 'window', 'sick', 'east', 'breaks', 'grief', 'with', 'light', 'It', 'Arise', 'kill', 'the', 'soft', 'Juliet']
Answer 5 (score: 0)
Use `word` instead of `words` (this also simplifies the loop):
fname = open("romeo.txt")
lst = list()
for line in fname:
    line = line.rstrip()
    words = line.split(' ')
    for word in words:
        if word not in lst:
            lst.append(word)
lst.sort()
print(lst)
Or use `[word]` with the `+` operator:
fname = open("romeo.txt")
lst = list()
for line in fname:
    line = line.rstrip()
    words = line.split(' ')
    for word in words:
        if word in lst: continue
        lst = lst + [word]
lst.sort()
print(lst)
Answer 6 (score: 0)
import string
# open the input file and the output file; both are closed when the block ends
with open("romeo.txt") as file, open('romeo_unique.txt', 'w') as uniquewords:
    lst = []
    for line in file:
        words = line.split()
        for word in words:         # loops through all words
            # strip punctuation and lowercase, so "Sun," and "sun" count as the same word
            word = word.translate(str.maketrans('', '', string.punctuation)).lower()
            if word not in lst:
                lst.append(word)    # append only this unique word to the list
                uniquewords.write(str(word) + '\n') # write the unique word to the file
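The normalisation step in that loop can be tried on its own. A small sketch, assuming Python 3 (where `str.maketrans` exists); the `normalize` helper name is made up for illustration:

```python
import string

# Translation table that deletes every ASCII punctuation character.
strip_punct = str.maketrans('', '', string.punctuation)

def normalize(word):
    """Hypothetical helper: drop punctuation, then lowercase."""
    return word.translate(strip_punct).lower()

print(normalize("Juliet,"))  # juliet
print(normalize("Arise!"))   # arise
print(normalize("sun"))      # sun
```

Building the table once outside the loop avoids recreating it for every word. The sample file has no punctuation, so for that data this step only affects letter case.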
Answer 7 (score: -1)
You need to change
lst = lst + words
to
lst.append(word)
If you want unique words, you need to add `word` to the list rather than `words` (which is all of the words).