Question

I am trying to import a text file and return its text as a list of strings, one per word, lowercased and with the punctuation removed.

I wrote the following code, but it does not split the text into individual words. Also, is it possible to add .lower() inside the comprehension?

def read_words(words_file):
    """Turns file into a list of strings, lower case, and no punctuation"""
    return [word for line in open(words_file, 'r') for word in line.split(string.punctuation)]

Answer 1

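Yes, you can add .lower() inside the comprehension; it should be applied to word. Also, your code probably does not split out each word because of string.punctuation: str.split treats its argument as one separator string, so a line is split only where the entire punctuation string occurs. If you just want to split on whitespace, calling .split() with no arguments is enough.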
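A minimal sketch of that suggestion, assuming the lines come from any iterable of strings such as an open file; the helper name lower_words is hypothetical and not from the question. It also shows why passing string.punctuation to split() leaves the line in one piece.

```python
import string

def lower_words(lines):
    """Split each line on whitespace and lowercase every word."""
    return [word.lower() for line in lines for word in line.split()]

def read_words(words_file):
    """Apply lower_words to the lines of a file."""
    with open(words_file, 'r') as f:
        return lower_words(f)

sample = "Hello, World!"
# string.punctuation is treated as ONE separator string, which never
# occurs in the sample, so the line comes back unsplit
print(sample.split(string.punctuation))  # ['Hello, World!']
print(lower_words([sample]))             # ['hello,', 'world!']
```

The punctuation is still attached to each word at this point; removing it is a separate step.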

Answer 2

Here is a list comprehension that should do everything you want:

[word.translate(str.maketrans('', '', string.punctuation)).lower() for line in open(words_file) for word in line.split()]

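You need to split on whitespace (the default) to separate the words. You can then translate each resulting string to strip the punctuation and convert it to lowercase.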
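The same split-then-translate idea can be sketched with the translation table built once, assuming Python 3, where str.translate takes a single table argument. The names _STRIP_PUNCT and clean_words are hypothetical and chosen only for illustration.

```python
import string

# translation table that deletes every ASCII punctuation character
_STRIP_PUNCT = str.maketrans('', '', string.punctuation)

def clean_words(lines):
    """Split on whitespace, strip punctuation, and lowercase each word."""
    return [word.translate(_STRIP_PUNCT).lower()
            for line in lines
            for word in line.split()]

print(clean_words(["Hello, World! It's here."]))
# ['hello', 'world', 'its', 'here']
```

An open file object can be passed as lines. Note that a token made up only of punctuation, such as "--", comes back as an empty string, so filter those out if they are unwanted.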

Answer 3

Use a mapping to translate the words, and use it in a generator function.

import string
def words(filepath):
    '''Yield words from filepath with punctuation and whitespace removed.'''

    # map uppercase to lowercase and punctuation/whitespace to an empty string
    t = str.maketrans(string.ascii_uppercase,
                      string.ascii_lowercase,
                      string.punctuation + string.whitespace)

    with open(filepath) as f:
        for line in f:
            for word in line.strip().split():
                word = word.translate(t)
                # don't yield empty strings
                if word:
                    yield word

Usage

for word in words('foo.txt'):
    print(word)

Answer 4

import string
def read_words(words_file):
    """Turns file into a list of strings, lower case, and no punctuation"""
    with open(words_file, 'r') as f:
        lowered_text = f.read().lower()
    return ["".join(char for char in word if char not in string.punctuation) for word in lowered_text.split()]

Calling multiple functions in a list comprehension
