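I'm trying to import a text file and return its text as a list of strings, one string per word, lowercased and with punctuation removed.
I wrote the following code, but it does not split each word into its own string. Also, is it possible to add .lower() inside the comprehension?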
def read_words(words_file):
    """Turns file into a list of strings, lower case, and no punctuation"""
    return [word for line in open(words_file, 'r') for word in line.split(string.punctuation)]
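Answer 0 (score: 0)
Yes, you can add .lower() inside the comprehension. It should be applied to word. Also, because of string.punctuation, your code probably won't split each word: str.split treats its argument as one whole separator string, so a line is only split where the entire punctuation string occurs. If you are just trying to split on whitespace, calling .split() with no arguments is enough.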
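The difference is easy to see on a single line of text. A minimal sketch, where the sample string is made up for illustration:

```python
import string

line = "Hello, world! It's a test."

# str.split(sep) treats sep as one whole separator string, not as a set of
# characters, so nothing is split unless the line contains all of
# string.punctuation in sequence.
unsplit = line.split(string.punctuation)
print(unsplit)

# With no argument, split() breaks on any run of whitespace.
words = line.split()
print(words)

# .lower() is applied to each word inside the comprehension.
lowered = [word.lower() for word in line.split()]
print(lowered)
```

Note that splitting on whitespace alone leaves the punctuation attached to the words; it still has to be removed in a separate step.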
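Answer 1 (score: 0)
Here is a list comprehension that should do everything you want:
    # Python 3 form; on Python 2 the equivalent call is word.translate(None, string.punctuation)
    [word.translate(str.maketrans('', '', string.punctuation)).lower() for line in open(words_file) for word in line.split()]
You need to split on whitespace (the default) to separate the words. Then you can translate each resulting string to strip the punctuation and lowercase it.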
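The same idea can be wrapped in a function that closes the file and builds the translation table only once. This is a sketch assuming Python 3, where str.translate takes a single table built with str.maketrans. The name read_words is borrowed from the question; the table name _STRIP_PUNCT and the sample file contents are made up for illustration.

```python
import os
import string
import tempfile

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans('', '', string.punctuation)


def read_words(words_file):
    """Return the words in words_file, lowercased, punctuation removed."""
    with open(words_file) as f:
        return [
            word.translate(_STRIP_PUNCT).lower()
            for line in f
            for word in line.split()
        ]


# Small demonstration on a temporary file (the sample text is an assumption).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("Hello, World!\nIt's a test.\n")

result = read_words(tmp.name)
os.remove(tmp.name)
print(result)
```

A token that consists only of punctuation, such as a standalone "-", comes out as an empty string here; add an `if` filter to the comprehension if that matters.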
Answer 2 (score: 0)
Build a mapping to translate the words and use it in a generator function.
import string

def words(filepath):
    '''Yield words from filepath with punctuation and whitespace removed.'''
    # map uppercase to lowercase and punctuation/whitespace to an empty string
    t = str.maketrans(string.ascii_uppercase,
                      string.ascii_lowercase,
                      string.punctuation + string.whitespace)
    with open(filepath) as f:
        for line in f:
            for word in line.strip().split():
                word = word.translate(t)
                # don't yield empty strings
                if word:
                    yield word
Usage
for word in words('foo.txt'):
    print(word)
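Since the question asks for a list rather than a generator, the output can be collected with list(). A minimal sketch, assuming a throwaway sample file; the generator is restated in compact form only so that the snippet runs on its own.

```python
import os
import string
import tempfile


def words(filepath):
    """Yield lowercased words from filepath with punctuation removed."""
    table = str.maketrans(string.ascii_uppercase,
                          string.ascii_lowercase,
                          string.punctuation + string.whitespace)
    with open(filepath) as f:
        for line in f:
            for word in line.split():
                word = word.translate(table)
                if word:  # skip tokens that were only punctuation
                    yield word


# Sample file contents are an assumption made up for this demonstration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("Foo, BAR -- baz!\n")

word_list = list(words(tmp.name))
os.remove(tmp.name)
print(word_list)
```

The table only maps ASCII letters, so accented or other non-ASCII uppercase characters are left unchanged; calling .lower() on each word would cover those as well.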
Answer 3 (score: 0)
import string

def read_words(words_file):
    """Turns file into a list of strings, lower case, and no punctuation"""
    with open(words_file, 'r') as f:
        lowered_text = f.read().lower()
    return ["".join(char for char in word if char not in string.punctuation) for word in lowered_text.split()]