Question

I am following an NLP tutorial, but I get a KeyError when I try to split my raw data into good and bad reviews. Here is the tutorial link: https://towardsdatascience.com/detecting-bad-customer-reviews-with-nlp-d8b36134dc7e

#reviews.csv
I am so angry about the service
Nothing was wrong, all good
The bedroom was dirty
The food was great

#nlp.py
import pandas as pd

#read data
reviews_df = pd.read_csv("reviews.csv")
# append the positive and negative text reviews
reviews_df["review"] = reviews_df["Negative_Review"] + reviews_df["Positive_Review"]

reviews_df.columns

I see the following error:

File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Negative_Review'

Why does this happen?

Answer 1

You are getting this error because your data is not structured the way your code assumes.

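When you run df['reviews'] = df['Positive_reviews'] + df['Negative_reviews'], you are adding the values of the 'Positive_reviews' column to those of the 'Negative_reviews' column (neither of which currently exists) and storing the result in a 'reviews' column (which does not exist yet either, although the assignment would create it). The lookup of the missing column is what raises the KeyError.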
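A minimal sketch of the failure, assuming the four reviews from the question are quoted and read from an in-memory buffer (the `io.StringIO` stand-in and the variable names are illustrative, not part of the original code). Because the file has no header row, pandas uses the first review as the column name, so there is no 'Negative_Review' column to look up:

```python
import io

import pandas as pd

# Stand-in for reviews.csv (assumption: each review is quoted, one per line)
raw = (
    '"I am so angry about the service"\n'
    '"Nothing was wrong, all good"\n'
    '"The bedroom was dirty"\n'
    '"The food was great"\n'
)

# With default settings, the first line is treated as the header
df = pd.read_csv(io.StringIO(raw))
print(list(df.columns))

# Checking for the column first avoids the exception
print("Negative_Review" in df.columns)

# Looking up a column that does not exist raises KeyError
try:
    df["Negative_Review"]
except KeyError as err:
    print(f"KeyError: {err}")
```

Printing `df.columns` before indexing is usually the quickest way to see which names pandas actually found.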

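Your csv is just a plain text file with one piece of text per line. Also, since you are working with text, remember to enclose each string in double quotes ("), otherwise the commas inside a review will create spurious columns.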
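As a rough illustration of the quoting advice, the sketch below writes the reviews with every field quoted and reads them back into a single column. The in-memory buffer and the column name `text` are assumptions made for the example; a real script would use a file path instead:

```python
import csv
import io

import pandas as pd

reviews = [
    "I am so angry about the service",
    "Nothing was wrong, all good",
    "The bedroom was dirty",
    "The food was great",
]

# Write one review per line, wrapping every field in double quotes
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
for review in reviews:
    writer.writerow([review])

# Read the quoted lines back; there is no header row, so name the column
buffer.seek(0)
df = pd.read_csv(buffer, header=None, names=["text"])
print(df)
```

The comma in the second review stays inside its quoted field, so the frame keeps one column and four rows.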

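With this approach, it looks like you would still be labeling every review by hand (normally, in a machine learning workflow, the labeling is done in separate code and the result is then loaded into the machine learning file).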

To make your code work, you need to do the following:

import pandas as pd

df = pd.read_csv('TestFileFolder/57886076.csv', names=['text'])
## Fill with placeholder values
df['Positive_review']=0
df['Negative_review']=1
df.head()

Result:

                              text  Positive_review  Negative_review
0  I am so angry about the service                0                1
1      Nothing was wrong, all good                0                1
2            The bedroom was dirty                0                1
3               The food was great                0                1

However, I would suggest using a single column (is_review_positive) and setting it to true or false. You can easily encode it later.
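A small sketch of the single-column layout, assuming hand-assigned labels that are purely illustrative (neither the labels nor the `label` column come from the original data):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "text": [
            "I am so angry about the service",
            "Nothing was wrong, all good",
            "The bedroom was dirty",
            "The food was great",
        ],
        # Hypothetical labels, assigned by hand for this example
        "is_review_positive": [False, True, False, True],
    }
)

# Encode the boolean flag as 0/1 when a numeric target is needed
df["label"] = df["is_review_positive"].astype(int)

# Split into good and bad reviews with a boolean mask
positive = df[df["is_review_positive"]]
negative = df[~df["is_review_positive"]]
print(positive["text"].tolist())
print(negative["text"].tolist())
```

One boolean column cannot contradict itself the way two separate positive and negative flags can, and it converts directly into a numeric target.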

Python KeyError when using pandas
