Question

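I created a function that removes stopwords from a list of sentences. Each entry in the list is a different sentence. However, the output prints every letter of each word separately, and some letters are missing.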

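The code below is what I tried. I think it prints letter by letter because of an unnecessary extra loop, but when I remove the inner loop, it just outputs the sentences without any noticeable change.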


import pandas as pd
from nltk.corpus import stopwords


def remove_stop(data):
    filtered_line = []
    filtered_data = []

    stop_words = set(stopwords.words("english"))


    for line in data:
        for word in line:
            if word not in stop_words:
                filtered_line.append(word)
        filtered_data.append(filtered_line)
        filtered_line = []

    return filtered_data

data = pd.read_csv("text.csv") # each row is a sentence or sentences
title = list(data['Title'])

clean = remove_stop(title)
print(type(clean))
print(clean)

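Sample input: ['The horse was in the barn', 'The yellow jacket bit the boy', 'The house was red']

Expected output: ['horse barn', 'yellow jacket bit boy', 'house red']

Actual output: [['T', 'h', 'e', ' ', 'h', 'r', 'e', ' ', 'w', ' ', 'n', ' ', 'h', 'e', ' ', 'b', 'r', 'n'], ['T', 'h', 'e', ' ', 'e', 'l', 'l', 'w', ' ', 'j', 'c', 'k', 'e', ' ', 'b', ' ', 'h', 'e', ' ', 'b'], ['T', 'h', 'e', ' ', 'h', 'u', 'e', ' ', 'w', ' ', 'r', 'e']]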
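A minimal reproduction of why letters come out and why some of them vanish. It assumes a small hand-picked subset of NLTK's English stopword list (STOP_SUBSET is hypothetical; the real list is much longer but does contain single-letter entries such as "a", "s", "t", "d", "o" and "y"). Iterating over a string yields characters, and each character that happens to match a one-letter stopword is dropped.

```python
# Hypothetical stand-in for a few entries of NLTK's English stopword list,
# including the single-letter ones that cause letters to disappear.
STOP_SUBSET = {"a", "i", "s", "t", "d", "o", "y", "m", "the", "was", "in"}

line = "The house was red"

# Iterating over a str yields one character at a time, not one word.
chars = [ch for ch in line]

# Each character is tested against the stopword set, so letters such as
# "o", "s", "a" and "d" are silently dropped. "T" survives because the
# comparison is case-sensitive.
kept = [ch for ch in line if ch not in STOP_SUBSET]

print(chars[:5])
print(kept)
```

The second list matches the third entry of the actual output above.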

Answer 1

for word in line:

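The line above iterates over the string itself, and iterating over a string yields its individual characters rather than its words. Change it to something like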

for word in line.split(" "):
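A sketch of the whole function with that change applied. Two additions go beyond the answer and are assumptions on my part: words are compared in lowercase (NLTK's list is lowercase, so "The" would otherwise be kept), and the surviving words are joined back into one string per sentence to match the expected output. STOP_WORDS is a hypothetical stand-in for set(stopwords.words("english")) so the example runs without the NLTK corpus.

```python
# Hypothetical stand-in for set(stopwords.words("english")).
STOP_WORDS = {"the", "was", "in", "a", "is", "of"}


def remove_stop(data):
    filtered_data = []
    for line in data:
        # split(" ") turns the sentence into a list of words.
        words = line.split(" ")
        # Compare in lowercase so capitalised stopwords like "The" are removed.
        kept = [word for word in words if word.lower() not in STOP_WORDS]
        # Join the remaining words back into a single string per sentence.
        filtered_data.append(" ".join(kept))
    return filtered_data


titles = ["The horse was in the barn", "The yellow jacket bit the boy", "The house was red"]
print(remove_stop(titles))  # ['horse barn', 'yellow jacket bit boy', 'house red']
```

If a list of words per sentence is preferred, append kept instead of " ".join(kept).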

Answer 2

Try changing for word in line: to for word in line.split():, so that the loop iterates over words instead of characters.
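The two answers differ slightly: split(" ") splits on each single space, while split() with no argument splits on any run of whitespace and drops empty strings. The sample string below is made up to show the difference when a title contains a double space or a tab.

```python
# Made-up title with a double space and a tab between words.
text = "The  horse\twas in the barn"

by_space = text.split(" ")  # splits on every single space character
by_whitespace = text.split()  # splits on runs of any whitespace

print(by_space)       # ['The', '', 'horse\twas', 'in', 'the', 'barn']
print(by_whitespace)  # ['The', 'horse', 'was', 'in', 'the', 'barn']
```

For messy CSV text, split() is usually the safer choice.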

Python: how to fix a nested for loop that returns letters instead of words?
