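I've been trying to remove stopwords with Python 3, but my code doesn't work, and I'd like to know how to remove the stopwords from the list below. Here is an example of the structure: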
from nltk.corpus import stopwords
word_split1=[['amazon','brand','-','solimo','premium','almonds',',','250g','by','solimo'],
             ['hersheys','cocoa', 'powder', ',', '225g', 'by', 'hersheys'],
             ['jbl','t450bt','extra','bass','wireless','on-ear','headphones','with','mic','white','by','jbl','and']]
I'm trying to remove the stopwords, and this is the code I tried. I'd appreciate it if someone could help me fix it:
stop_words = set(stopwords.words('english'))
filtered_words=[]
for i in word_split1:
    if i not in stop_words:
        filtered_words.append(i)
I get this error:
Traceback (most recent call last):
  File "<ipython-input-451-747407cf6734>", line 3, in <module>
    if i not in stop_words:
TypeError: unhashable type: 'list'
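The error comes from the membership test itself: checking whether something is in a set requires hashing it, and a list is mutable and therefore unhashable. In the loop above, each i is one of the inner lists, not a single word. A minimal sketch reproducing this with a plain set; the three-word stop_words set is a hand-picked stand-in for NLTK's English list (an assumption so the example runs without NLTK):

```python
# Stand-in for set(stopwords.words('english')); these three words are in NLTK's list.
stop_words = {"by", "with", "and"}

word_split1 = [["hersheys", "cocoa", "by", "hersheys"]]

# A string is hashable, so the set lookup works.
print("by" in stop_words)  # True

# The outer loop yields a whole list, which cannot be hashed for the lookup.
error_message = None
try:
    for i in word_split1:
        if i not in stop_words:
            pass
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # reports that the 'list' type is unhashable
```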
Answer 0 (score: 1)
You have a list of lists, so you need to loop over the inner lists as well.
Try:
from nltk.corpus import stopwords

word_split1=[['amazon','brand','- ','solimo','premium','almonds',',','250g','by','solimo'],
             ['hersheys','cocoa', 'powder', ',', '225g', 'by', 'hersheys'],
             ['jbl','t450bt','extra','bass','wireless','on-ear','headphones','with','mic','white','by','jbl','and']]
stop_words = set(stopwords.words('english'))
filtered_words=[]
for i in word_split1:      # each i is one inner list
    for j in i:            # each j is a single word (a hashable string)
        if j not in stop_words:
            filtered_words.append(j)
Or flatten your list first.
For example:
from itertools import chain
from nltk.corpus import stopwords

word_split1=[['amazon','brand','- ','solimo','premium','almonds',',','250g','by','solimo'],
             ['hersheys','cocoa', 'powder', ',', '225g', 'by', 'hersheys'],
             ['jbl','t450bt','extra','bass','wireless','on-ear','headphones','with','mic','white','by','jbl','and']]
stop_words = set(stopwords.words('english'))
filtered_words=[]
# chain.from_iterable yields the words of every inner list in turn
for i in chain.from_iterable(word_split1):
    if i not in stop_words:
        filtered_words.append(i)
Or, as a list comprehension:
filtered_words = [i for i in chain.from_iterable(word_split1) if i not in stop_words]
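Both versions in this answer produce one flat list, so you can no longer tell which product a word came from. If you need one filtered list per product, a nested list comprehension keeps the original grouping. A minimal sketch; as before, the three-word stop_words set is a hand-picked stand-in for set(stopwords.words('english')) so it runs without downloading NLTK data, and the sample data is trimmed from the question:

```python
# Stand-in for set(stopwords.words('english')); these three words are in NLTK's list.
stop_words = {"by", "with", "and"}

word_split1 = [
    ["hersheys", "cocoa", "powder", ",", "225g", "by", "hersheys"],
    ["jbl", "t450bt", "wireless", "on-ear", "headphones", "with", "mic", "by", "jbl", "and"],
]

# The outer comprehension walks the products; the inner one filters each product's words.
filtered_per_product = [
    [word for word in tokens if word not in stop_words]
    for tokens in word_split1
]

print(filtered_per_product)
# [['hersheys', 'cocoa', 'powder', ',', '225g', 'hersheys'],
#  ['jbl', 't450bt', 'wireless', 'on-ear', 'headphones', 'mic', 'jbl']]
```

Punctuation such as ',' is not a stopword, so it stays in the output.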
Answer 1 (score: 1)
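Your list is two-dimensional (a list of lists), so each i in your loop is a whole inner list, and a list cannot be hashed for the set lookup. Flatten it into a one-dimensional list first and your code will work: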
from nltk.corpus import stopwords

# flatten the list of lists into a single list of words
word_split1 = [j for x in word_split1 for j in x]
stop_words = set(stopwords.words('english'))
filtered_words=[]
for i in word_split1:
    if i not in stop_words:
        filtered_words.append(i)
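The for clauses in the flattening comprehension are read in the same order as nested loops: the outer list first, then each element inside it. A short sketch checking that this comprehension, the explicit nested loop, and itertools.chain.from_iterable all flatten to the same list (sample data trimmed from the question for brevity):

```python
from itertools import chain

word_split1 = [["amazon", "brand"], ["hersheys", "cocoa"], ["jbl", "mic"]]

# "for x in word_split1" is the outer loop, "for j in x" is the inner loop.
flat_comprehension = [j for x in word_split1 for j in x]

# The same flattening written as explicit nested loops.
flat_loops = []
for x in word_split1:
    for j in x:
        flat_loops.append(j)

# chain.from_iterable concatenates the sublists lazily; list() materializes it.
flat_chain = list(chain.from_iterable(word_split1))

print(flat_comprehension)
# ['amazon', 'brand', 'hersheys', 'cocoa', 'jbl', 'mic']
print(flat_comprehension == flat_loops == flat_chain)  # True
```

Note that the answer's version reassigns word_split1 to the flattened list, so the original per-product grouping is lost afterwards.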