Remove stop words from a list in Python (natural language processing)

Date: 2019-01-17 10:28:00

Tags: python-3.x nlp stanford-nlp opennlp

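I have been trying to remove stop words with Python 3, but my code doesn't seem to work. How can I remove the stop words from the list below? Here is a sample of the structure: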

    from nltk.corpus import stopwords

    word_split1 = [
        ['amazon', 'brand', '- ', 'solimo', 'premium', 'almonds', ',', '250g', 'by', 'solimo'],
        ['hersheys', 'cocoa', 'powder', ',', '225g', 'by', 'hersheys'],
        ['jbl', 't450bt', 'extra', 'bass', 'wireless', 'on-ear', 'headphones', 'with', 'mic', 'white', 'by', 'jbl', 'and'],
    ]

This is the code I tried. I would appreciate help correcting it:

    stop_words = set(stopwords.words('english'))

    filtered_words=[]
    for i in word_split1:
        if i not in stop_words:
            filtered_words.append(i)

I get this error message:

    Traceback (most recent call last):
    File "<ipython-input-451-747407cf6734>", line 3, in <module>
    if i not in stop_words:
    TypeError: unhashable type: 'list'

2 answers:

Answer 0 (score: 1)

You have a list of lists, so each `i` in your loop is itself a list, not a word.
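The `TypeError` follows from how set membership works: `x in some_set` has to hash `x`, and a list cannot be hashed. Here is a minimal sketch of that behaviour. The small hand-written `stop_words` set is an assumption standing in for `set(stopwords.words('english'))` so the snippet runs without the NLTK corpus, and `row` and `error_message` are illustrative names.

```python
# Hypothetical stand-in for set(stopwords.words('english')); an assumption
# made so the sketch runs without downloading the NLTK corpus.
stop_words = {"by", "with", "and"}

row = ["cocoa", "powder", "by", "hersheys"]

# A single string is hashable, so the membership test works.
print("by" in stop_words)

# A whole list is not hashable, so the same test raises TypeError.
try:
    row in stop_words
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The fix is therefore to test individual words, not whole sublists.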

Try:

    from nltk.corpus import stopwords

    word_split1 = [
        ['amazon', 'brand', '- ', 'solimo', 'premium', 'almonds', ',', '250g', 'by', 'solimo'],
        ['hersheys', 'cocoa', 'powder', ',', '225g', 'by', 'hersheys'],
        ['jbl', 't450bt', 'extra', 'bass', 'wireless', 'on-ear', 'headphones', 'with', 'mic', 'white', 'by', 'jbl', 'and'],
    ]
    stop_words = set(stopwords.words('english'))
    # Loop over each sublist, then over each word inside it.
    filtered_words = []
    for i in word_split1:
        for j in i:
            if j not in stop_words:
                filtered_words.append(j)

Or flatten your list first.

For example:

    from itertools import chain

    from nltk.corpus import stopwords

    word_split1 = [
        ['amazon', 'brand', '- ', 'solimo', 'premium', 'almonds', ',', '250g', 'by', 'solimo'],
        ['hersheys', 'cocoa', 'powder', ',', '225g', 'by', 'hersheys'],
        ['jbl', 't450bt', 'extra', 'bass', 'wireless', 'on-ear', 'headphones', 'with', 'mic', 'white', 'by', 'jbl', 'and'],
    ]
    stop_words = set(stopwords.words('english'))

    # chain.from_iterable yields every word from every sublist, in order.
    filtered_words = []
    for i in chain.from_iterable(word_split1):
        if i not in stop_words:
            filtered_words.append(i)

    # The same filter written as a list comprehension:
    filtered_words = [i for i in chain.from_iterable(word_split1) if i not in stop_words]
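Both versions above produce one flat list, which loses track of which words belonged to which product title. If that grouping matters, a nested comprehension can filter each sublist in place. This is only a sketch: `stop_words` is again a small hand-written stand-in for `set(stopwords.words('english'))`, and `filtered_rows` is an illustrative name.

```python
# Hypothetical stand-in for set(stopwords.words('english')).
stop_words = {"by", "with", "and"}

word_split1 = [
    ["hersheys", "cocoa", "powder", ",", "225g", "by", "hersheys"],
    ["jbl", "t450bt", "extra", "bass", "wireless", "on-ear", "headphones",
     "with", "mic", "white", "by", "jbl", "and"],
]

# The outer comprehension walks the sublists; the inner one filters each word.
filtered_rows = [
    [word for word in row if word not in stop_words]
    for row in word_split1
]

for row in filtered_rows:
    print(row)
```

Tokens such as `','` stay in the output because they are not in the stand-in set. Removing punctuation would need a separate check.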

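Answer 1 (score: 1)

Your list is a 2D array (a list of lists), so your loop ends up trying to hash a list. Convert it to a 1D list first and your code will work: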

    from nltk.corpus import stopwords

    # Flatten the list of lists into a single list of words.
    word_split1 = [j for x in word_split1 for j in x]

    stop_words = set(stopwords.words('english'))

    filtered_words = []
    for i in word_split1:
        if i not in stop_words:
            filtered_words.append(i)
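The clause order in the flattening comprehension can be confusing at first: the `for` clauses read left to right, in the same order as nested loops. As a rough sketch, with `nested` as made-up sample data, the three forms below should give the same flat list:

```python
from itertools import chain

nested = [["jbl", "t450bt"], ["cocoa", "powder", "by"], []]

# The comprehension from the answer: clauses read left to right.
flat_comprehension = [j for x in nested for j in x]

# The same thing written as explicit nested loops.
flat_loops = []
for x in nested:
    for j in x:
        flat_loops.append(j)

# itertools.chain.from_iterable yields the same items lazily.
flat_chain = list(chain.from_iterable(nested))

print(flat_comprehension)
```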