Question

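I have a DataFrame with reviews and their polarity, as shown below. I have included only 2 samples here, but the real data has more than 1,000 reviews and polarities.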

 Reviews              Polarity
This movie is good   Positive
This is bad          negative

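I have a function called find_features. I need to pass every review from this DataFrame into that function, do some processing, and collect the results as a list in featuresets. I tried to loop over the reviews column of df using the technique below; for each review, featuresets should also receive the corresponding polarity value.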

featuresets = [(find_features(df.reviews), df.polarity) for (df.reviews, df.polarity) in df]

The find_features function:

def find_features(document):
    words = word_tokenize(document)
    features = {}
    for w in word_features:
        features[w] = (w in words)
    return features

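Calling this function splits each review into words using the tokenizer inside find_features, and each result should then be paired with its polarity (positive or negative). I built a list of all words and compared it against the most frequent words; word_features holds the most frequent words.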

all_words = nltk.FreqDist(all_words)
word_features = list(all_words.keys())

 good    -  positive
 bad     -  negative

When building featuresets, I get the following error:

ValueError: too many values to unpack (expected 2)

I know the logic above works for any kind of list or dictionary, but I want to use similar logic with a DataFrame. Can you help me?

Answer 1

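Although the ordering of the given code snippets is not very intuitive, I noticed two unusual things in them:

for (df.reviews, df.polarity) in df: the usual form is for col_name in df, which iterates over the column names of df. Each item is therefore a single column-name string, and unpacking a string longer than two characters into two targets raises the ValueError shown in the question.
find_features is supposed to return a dict, yet you place that result inside a tuple expression: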

(find_features(df.reviews), df.polarity)
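
One way to get the intended pairs is to iterate over the two columns together instead of over df itself. The following is only a minimal sketch under some assumptions: the column names are taken to be lowercase reviews and polarity (to match the df.reviews / df.polarity usage in the question), word_features is a hypothetical two-word list, and str.split() stands in for word_tokenize so that the example runs without NLTK data.

```python
import pandas as pd

# Sample data mirroring the question. Lowercase column names are an
# assumption made to match df.reviews / df.polarity in the question.
df = pd.DataFrame({
    "reviews": ["This movie is good", "This is bad"],
    "polarity": ["Positive", "negative"],
})

# Hypothetical stand-in for the question's word_features list.
word_features = ["good", "bad"]


def find_features(document):
    # str.split() replaces nltk's word_tokenize here (assumption) so the
    # sketch has no NLTK dependency; use word_tokenize in real code.
    words = set(document.split())
    return {w: (w in words) for w in word_features}


# Iterating a DataFrame directly yields its column labels, not its rows.
column_labels = list(df)

# Pair each review with its polarity by zipping the two columns.
featuresets = [
    (find_features(review), polarity)
    for review, polarity in zip(df["reviews"], df["polarity"])
]
```

With this shape, each element of featuresets is a (features dict, polarity) tuple for one row. df.itertuples(index=False) would be another way to walk the rows if more than two columns are needed.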

Python: Getting error "too many values to unpack (expected 2)" while looping through DataFrame columns
