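KeyError when iterating over a pandas DataFrame

Time: 2020-09-16 06:56:13

Tags: python pandas numpy dataframe nlp

I have a DataFrame df with two columns: "label" and "review". As part of data cleaning, I dropped all null values. Now I want to remove all stopwords and punctuation from the review column.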

I get a KeyError when I run this code.

    stemmer = PorterStemmer()
    for i in range(len(df)):
        review = re.sub('[^a-zA-Z]', ' ',df['review'][i] )
        review = review.lower()
        review = review.split()
        review = [ stemmer.stem(word) for word in review if word not in stopwords.words('english')]
        df['review'][i] = " ".join(review)
    

     KeyError                                  Traceback (most recent call last)
    <ipython-input-44-91ef309cd900> in <module>
          2 
          3 for i in range(len(df)):
     ----> 4     review = re.sub('[^a-zA-Z]', ' ',df['review'][i] )
          5     review = review.lower()
          6     review = review.split()

    ~\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
        866         key = com.apply_if_callable(key, self)
        867         try:
    --> 868             result = self.index.get_value(self, key)
        869 
        870             if not is_scalar(result):

    ~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
       4373         try:
       4374             return self._engine.get_value(s, k,
     -> 4375                                           tz=getattr(series.dtype, 'tz', None))
       4376         except KeyError as e1:
       4377             if len(self) > 0 and (self.holds_integer() or self.is_boolean()):

    pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

    pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

    pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

    pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

    pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

    KeyError: 140

Please help.

1 answer:

Answer 0 (score: 1)

Here is a solution without a loop. In pandas, treat loops as a last resort:

    df['review'] = df['review'].replace('[^a-zA-Z]', ' ', regex=True)
    df['review'] = df['review'].str.lower()
    df['review'] = df['review'].str.split()
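
As for the KeyError itself, the likely cause is the earlier null removal: dropping rows keeps the original index labels, so the index has gaps, and `df['review'][i]` looks up the label `i` rather than the i-th row. `KeyError: 140` would then mean the row labelled 140 was one of the dropped ones. The sketch below reproduces this on made-up data and cleans the column with `apply`. The sample rows, the `clean_review` helper, and the tiny `STOP_WORDS` set are assumptions for illustration only; in the real code you would use `stopwords.words('english')` and could add the stemming step inside the helper.

```python
import re

import pandas as pd

# Hypothetical sample data; the None row stands in for a dropped null.
df = pd.DataFrame({
    "label": [1, 0, 1, 0],
    "review": ["Great movie!", None, "Not the best...", "I loved it"],
})
df = df.dropna()

# dropna() keeps the original labels, so label 1 no longer exists.
print(df.index.tolist())
try:
    df["review"][1]  # label-based lookup, same as in the question's loop
except KeyError as err:
    print("KeyError:", err)

# Renumber the rows 0..n-1 so that labels and positions match again.
df = df.reset_index(drop=True)

# Stand-in stopword set (assumption); replace with NLTK's English list.
STOP_WORDS = {"the", "it", "i", "not"}


def clean_review(text):
    """Keep letters only, lowercase, drop stopwords, rejoin into a string."""
    words = re.sub(r"[^a-zA-Z]", " ", text).lower().split()
    return " ".join(w for w in words if w not in STOP_WORDS)


# apply() runs the helper on every row without indexing by label.
df["review"] = df["review"].apply(clean_review)
print(df["review"].tolist())
```

If you prefer to keep the loop, calling `df = df.reset_index(drop=True)` after dropping the nulls, or reading rows by position with `df['review'].iloc[i]`, should avoid the KeyError. Writing with `df['review'][i] = ...` is chained assignment and can raise a SettingWithCopyWarning, which assigning the whole column at once avoids.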