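I have a DataFrame df with two columns: "label" and "review". As part of data cleaning, I dropped all null values. Now I want to remove all stopwords and punctuation from the review column.
I get a KeyError when I run this code.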
stemmer = PorterStemmer()
for i in range(len(df)):
    review = re.sub('[^a-zA-Z]', ' ', df['review'][i])
    review = review.lower()
    review = review.split()
    review = [stemmer.stem(word) for word in review if word not in stopwords.words('english')]
    df['review'][i] = " ".join(review)
KeyError Traceback (most recent call last)
<ipython-input-44-91ef309cd900> in <module>
2
3 for i in range(len(df)):
----> 4 review = re.sub('[^a-zA-Z]', ' ',df['review'][i] )
5 review = review.lower()
6 review = review.split()
~\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
866 key = com.apply_if_callable(key, self)
867 try:
--> 868 result = self.index.get_value(self, key)
869
870 if not is_scalar(result):
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
4373 try:
4374 return self._engine.get_value(s, k,
-> 4375 tz=getattr(series.dtype, 'tz', None))
4376 except KeyError as e1:
4377 if len(self) > 0 and (self.holds_integer() or self.is_boolean()):
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 140
Please help.
Answer 0 (score: 1)
Here is a solution without a loop. In pandas, use loops only as a last resort:
df['review'] = df['review'].replace('[^a-zA-Z]',' ',regex=True)
df['review'] = df['review'].str.lower()
df['review'] = df['review'].str.split()
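
The KeyError itself is most likely a side effect of dropping the null rows: that leaves gaps in the index, and df['review'][i] looks up the label i rather than the i-th row, so a removed label such as 140 cannot be found. The sketch below reproduces this on a small made-up frame and then finishes the steps above with the stopword filter and re-join. The sample data, the STOPWORDS set, and the clean_tokens helper are assumptions made for illustration; with NLTK available, set(stopwords.words('english')) and PorterStemmer().stem could be passed in their place. It also uses .str.replace instead of Series.replace, which should behave the same for this pattern.

```python
import pandas as pd

# Hypothetical sample data; the None row stands in for a null that gets dropped.
df = pd.DataFrame({
    "label": [1, 0, 1],
    "review": ["Great movie, loved it!", None, "The plot was boring..."],
})
df = df.dropna()

# After dropna() the index is [0, 2]. Label 1 no longer exists, so
# df['review'][1] would raise KeyError, just as df['review'][140] did.
missing_label = 1 not in df.index

# Renumber the rows 0..n-1 so that no labels are missing.
df = df.reset_index(drop=True)

# Small stand-in stopword set (assumption, not the NLTK list).
STOPWORDS = {"the", "it", "was", "a", "an", "and"}


def clean_tokens(tokens, stopwords=STOPWORDS, stem=lambda word: word):
    """Drop stopwords, apply `stem` to the remaining words, and re-join them."""
    return " ".join(stem(word) for word in tokens if word not in stopwords)


df["review"] = (
    df["review"]
    .str.replace("[^a-zA-Z]", " ", regex=True)  # punctuation and digits -> spaces
    .str.lower()
    .str.split()                                # each cell becomes a list of words
    .apply(clean_tokens)                        # filter, stem, and join back
)
```

Building the stopword set once, outside the per-row function, avoids calling stopwords.words('english') for every word, which is what made the original list comprehension slow.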