I have a text file named test.txt. I want to read it and return a list of all the words in the file (with the newlines removed).
Here is my current code:
def read_words(words_file):
    open_file = open(words_file, 'r')
    words_list = []
    contents = open_file.readlines()
    for i in range(len(contents)):
        words_list.append(contents[i].strip('\n'))
    return words_list
    open_file.close()
Running this code produces the following list:
['hello there how is everything ', 'thank you all', 'again', 'thanks a lot']
I want the list to look like this:
['hello','there','how','is','everything','thank','you','all','again','thanks','a','lot']
Answer 0 (score: 19)
Depending on the size of the file (this reads the whole file into memory at once), it could be as simple as:
with open('test.txt') as f:
    words = f.read().split()
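The one-liner works because `str.split()` called with no arguments splits on runs of any whitespace (spaces, tabs, newlines) and drops empty strings. That is also why the question's output still contains `'everything '`: `strip('\n')` removes only the newline, and the lines are never split into words. A small demonstration, where the sample text is an assumption reconstructed from the output shown in the question:

```python
# Assumed file contents, reconstructed from the question's output.
text = "hello there how is everything \nthank you all\nagain\nthanks a lot\n"

# No-argument split(): splits on any run of whitespace and discards
# empty strings, so trailing spaces and '\n' need no separate strip().
words = text.split()
print(words)

# Splitting on a single space instead yields empty strings wherever
# spaces repeat or trail (and would leave '\n' attached to words).
single_space = "a  b ".split(" ")
print(single_space)  # ['a', '', 'b', '']
```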
Answer 1 (score: 14)
Replace the words_list.append(...) line in the for loop with the following:
words_list.extend(contents[i].split())
This splits each line on whitespace, then adds each element of the resulting list to words_list.
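The difference between `append` and `extend` is the whole fix here: `append` adds its argument as a single element, while `extend` adds each item of an iterable separately. A quick sketch on one assumed line of input (`list += other` behaves like `extend` for lists):

```python
line = "hello there how is everything \n"

appended = []
appended.append(line.split())   # the whole list becomes ONE element
print(appended)  # [['hello', 'there', 'how', 'is', 'everything']]

extended = []
extended.extend(line.split())   # each word becomes its own element
print(extended)  # ['hello', 'there', 'how', 'is', 'everything']

plus_equals = []
plus_equals += line.split()     # in-place concatenation, same as extend
print(plus_equals)  # ['hello', 'there', 'how', 'is', 'everything']
```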
Alternatively, you can rewrite the entire function as a list comprehension:
def read_words(words_file):
    return [word for line in open(words_file, 'r') for word in line.split()]
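The `for` clauses in a nested comprehension read left to right, in the same order as the equivalent nested loops. The one-liner above also never closes the file explicitly and relies on the file object being garbage-collected. A minimal sketch of the same comprehension inside a `with` block, assuming the sample file contents reconstructed from the question (the temporary file is only there to make the example self-contained):

```python
import os
import tempfile


def read_words(words_file):
    # The with block closes the file as soon as the list has been built.
    with open(words_file, 'r') as f:
        # Equivalent nested loops:
        #     for line in f:
        #         for word in line.split():
        #             result.append(word)
        return [word for line in f for word in line.split()]


# Usage with a throwaway file; the contents are assumed sample data.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("hello there how is everything \nthank you all\nagain\nthanks a lot\n")
    path = tmp.name

try:
    words = read_words(path)
    print(words)
finally:
    os.remove(path)
```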
Answer 2 (score: 5)
Here is how I would write it:
def read_words(words_file):
    with open(words_file, 'r') as f:
        ret = []
        for line in f:
            ret += line.split()
    return ret

print(read_words('test.txt'))
Using itertools, the function can be shortened slightly, though personally I find the result less readable:
import itertools

def read_words(words_file):
    with open(words_file, 'r') as f:
        return list(itertools.chain.from_iterable(line.split() for line in f))

print(read_words('test.txt'))
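The nice thing about the second version is that it can easily be turned into a fully generator-based version, which avoids holding all of the file's words in memory at once. As written, list() still builds the full list, and a lazy version has to keep the file open until the caller has finished iterating.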
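One way to get the lazy behaviour is a generator function: the `with` block stays open only while the caller iterates, and only one line is held in memory at a time. Simply returning the `chain` object from inside the `with` would not work, because the file would be closed before anything was read. A sketch under those assumptions; `iter_words` is a hypothetical name and the file contents are assumed sample data:

```python
import itertools
import os
import tempfile


def iter_words(words_file):
    # Generator function: nothing is read until the caller iterates,
    # and the file is closed when iteration finishes (or the generator
    # is discarded).
    with open(words_file, 'r') as f:
        for line in f:
            for word in line.split():
                yield word


# Usage with a throwaway file; the contents are assumed sample data.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("hello there how is everything \nthank you all\nagain\nthanks a lot\n")
    path = tmp.name

try:
    # Count the words without ever building the full list.
    word_count = sum(1 for _ in iter_words(path))
    # Take only the first few words; the rest of the file is never split.
    first_three = list(itertools.islice(iter_words(path), 3))
finally:
    os.remove(path)

print(word_count)   # 12
print(first_three)  # ['hello', 'there', 'how']
```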
Answer 3 (score: 3)
There are several ways to do this. Here are a few:
If you don't care about duplicate words:
import itertools

def getWords(filepath):
    with open(filepath) as f:
        return list(itertools.chain.from_iterable(line.split() for line in f))
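`itertools.chain` treats each of its *arguments* as an iterable, so passing it a single generator produces the per-line lists themselves rather than the words. `chain.from_iterable` (or unpacking with `*`) flattens one level. A small comparison on two assumed lines of input:

```python
import itertools

lines = ["hello there\n", "thank you all\n"]

# One argument (the generator) means one iterable: its items are lists.
nested = list(itertools.chain(line.split() for line in lines))
print(nested)     # [['hello', 'there'], ['thank', 'you', 'all']]

# from_iterable iterates over each list the generator yields.
flat = list(itertools.chain.from_iterable(line.split() for line in lines))
print(flat)       # ['hello', 'there', 'thank', 'you', 'all']

# Unpacking also flattens, but materialises every list up front.
unpacked = list(itertools.chain(*(line.split() for line in lines)))
print(unpacked)   # ['hello', 'there', 'thank', 'you', 'all']
```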
If you want each word to appear only once:
Note: this returns a set rather than a list, so it does not preserve the order of the words.
def getWords(filepath):
    with open(filepath) as f:
        # On Python 2.6, which lacks set comprehensions, use instead:
        #     return set(word for line in f for word in line.split())
        return {word for line in f for word in line.split()}
If you want unique words and also want to preserve their order:
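import itertools

def getWords(filepath):
    with open(filepath) as f:
        words = []
        pos = {}
        position = itertools.count()
        for line in f:
            for word in line.split():
                if word not in pos:
                    # Record the position of the first occurrence only
                    pos[word] = next(position)
                    words.append(word)
        # words is already in first-occurrence order; sorting by pos keeps it that way
        return sorted(words, key=pos.__getitem__)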
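A shorter alternative, swapped in here and not part of the answer above: since Python 3.7, dicts are guaranteed to preserve insertion order, so `dict.fromkeys` keeps the first occurrence of each word and silently drops later duplicates. A sketch assuming Python 3.7+; `get_unique_words` is a hypothetical name and the file contents are assumed sample data:

```python
import os
import tempfile


def get_unique_words(filepath):
    # dict keys are unique and (Python 3.7+) kept in insertion order,
    # so this is an order-preserving de-duplication.
    with open(filepath) as f:
        return list(dict.fromkeys(word for line in f for word in line.split()))


# Usage with a throwaway file; the contents are assumed sample data.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("the cat saw the dog\nthe dog ran\n")
    path = tmp.name

try:
    unique = get_unique_words(path)
finally:
    os.remove(path)

print(unique)  # ['the', 'cat', 'saw', 'dog', 'ran']
```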
If you want a dictionary of word frequencies:
import collections
import itertools

def getWords(filepath):
    with open(filepath) as f:
        return collections.Counter(itertools.chain.from_iterable(line.split() for line in f))
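A `Counter` behaves like a dict of counts with a few extras, such as returning 0 for missing keys and `most_common()` for the top entries. A short sketch where a string stands in for the file contents (an assumption to keep the example self-contained):

```python
import collections

text = "the cat saw the dog\nthe dog ran\n"
counts = collections.Counter(text.split())

print(counts['the'])          # 3
print(counts['dog'])          # 2
print(counts['missing'])      # 0 (absent keys count as zero, no KeyError)
print(counts.most_common(2))  # [('the', 3), ('dog', 2)]
```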
Hope this helps.
Answer 4 (score: 0)
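The actual question has already been answered, but I want to point out that the open_file.close() line is never executed, because the function returns before reaching it. Try moving open_file.close() above the return statement.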
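Moving `close()` above `return` works. A common alternative, used by most of the answers above, is a `with` statement, which closes the file whenever the block exits, including through `return` or an exception. A sketch of the question's function rewritten that way (the file contents are assumed sample data):

```python
import os
import tempfile


def read_words(words_file):
    # The with statement closes the file when the block exits, even via
    # return, so there is no close() call that can be skipped.
    with open(words_file, 'r') as open_file:
        words_list = []
        for line in open_file:
            words_list.extend(line.split())
        return words_list


# Usage with a throwaway file; the contents are assumed sample data.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("hello there how is everything \nthank you all\nagain\nthanks a lot\n")
    path = tmp.name

try:
    words = read_words(path)
    # The same mechanism checked directly: after the with block exits,
    # the file object reports itself as closed.
    with open(path) as f:
        pass
    file_was_closed = f.closed
finally:
    os.remove(path)

print(words)
print(file_was_closed)  # True
```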