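I am trying to write a loop that retrieves the tokenized data values from a list, checks whether each tokenized cell value contains stopwords, and appends the result to a new list.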
# Importing the packages to be used
import xlrd
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
# Declaration of file path of the data and opening of workbook and worksheet
file_path = "C:/Users/L31101/Documents/Data/Copy_1.xlsx"
workbook = xlrd.open_workbook(file_path)
worksheet = workbook.sheet_by_name("ConsolidateModuleQnComment")
# Grabs the numbers of rows and columns of the worksheet
rowcount = worksheet.nrows
columncount = worksheet.ncols
# Prints the number of row and columns
print("\nRow count: %d" % rowcount)
print("Column count: %d" % columncount)
# Grabbing the cell values and placing them inside an array named data_value
data_value = []
for rowindex in range(2, rowcount):
    # print("\nCurrent row number: %d" % rowindex)
    # print(worksheet.cell_value(rowindex, 6))
    data_value.append(worksheet.cell_value(rowindex, 6))
# Grabbing the values inside data_value cell and tokenizes them, and then adds them into the data_tokenized array
data_tokenized = []
for valueindex in range(0, len(data_value)):
    data_tokenized.append(word_tokenize(data_value[valueindex]))
# Grabbing the tokenized values from the data_tokenized array and removing the stopwords
stop_words = set(stopwords.words("english"))
data_stopword_removed = []
for tokenizedindex in range(0, len(data_tokenized)):
    if data_tokenized[tokenizedindex] not in stop_words:
        data_stopword_removed.append(data_tokenized[tokenizedindex])
print("\nNumber of records: %d" % len(data_stopword_removed))
It produces the following error message:
C:\Users\L31101\PycharmProjects\Year3\venv\Scripts\python.exe C:/Users/L31101/PycharmProjects/Year3/SentimentAnalysis.py
Row count: 5792
Column count: 7
Traceback (most recent call last):
  File "C:/Users/L31101/PycharmProjects/Year3/SentimentAnalysis.py", line 47, in <module>
    if test_variable not in stop_words:
TypeError: unhashable type: 'list'
Process finished with exit code 1
Any ideas on how I can fix this?
Answer 0 (score: 0)
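Try printing test_variable just before the error occurs. It will be a list. A list cannot be put into a set, because lists are mutable and do not have the required __hash__ method. If a list cannot be put into a set, you cannot search a set for a list either. Hence the error unhashable type.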
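To see the difference in isolation, here is a minimal sketch. The small stop_words set and the tokens list are made-up stand-ins for the question's data, not the real NLTK stopword list:

```python
# Hypothetical stand-in for set(stopwords.words("english"))
stop_words = {"the", "is", "a"}

# A str is hashable, so a membership test against a set works
print("the" in stop_words)

# list defines no hash function: list.__hash__ is None
print(list.__hash__ is None)

# Hypothetical tokenized cell value (one list of tokens)
tokens = ["the", "course", "is", "good"]

try:
    # The set has to hash the left operand to look it up, and a list cannot be hashed
    tokens in stop_words
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The exact wording of the message can vary between Python versions, but it should mention unhashable type: 'list', as in the traceback above.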
Without knowing what you are trying to test here, I can't say what your fix should be. But whatever it is, you need to test something other than a list.
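If the intent is to drop stopwords from each tokenized cell value, which is only a guess based on the comments in the question's code, one possible approach is to test each token (a string) rather than the whole row (a list). The sketch below uses a made-up stop_words set and made-up rows so that it runs without the NLTK data. The lowercasing step is an added assumption, on the grounds that NLTK's English stopword list is lowercase.

```python
# Hypothetical stand-ins for the question's data, so the sketch runs without NLTK
stop_words = {"the", "is", "a", "and", "was"}
data_tokenized = [
    ["The", "module", "was", "useful"],
    ["a", "clear", "and", "helpful", "lecturer"],
]

data_stopword_removed = []
for tokens in data_tokenized:
    # Test each individual token (a str), not the whole row (a list)
    kept = [token for token in tokens if token.lower() not in stop_words]
    data_stopword_removed.append(kept)

print("Number of records: %d" % len(data_stopword_removed))
print(data_stopword_removed)
```

This keeps one list per record, so len(data_stopword_removed) still matches the number of rows read from the worksheet. If a single flat list of tokens is wanted instead, the inner list could be added with extend rather than append.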