Conditionally deleting rows in a pandas DataFrame

Time: 2018-08-31 23:52:07

Tags: python pandas dataframe

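I want to delete any rows in a DataFrame that contain a specific string.

Specifically, I want to drop the rows whose email address is abnormal (the ones with .jpg).

Here is my code. What is wrong with it?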

import pandas as pd

df = pd.DataFrame({'email':['abc@gmail.com', 'cde@gmail.com', 'ghe@ss.jpg', 'sldkslk@sss.com']})

df

             email
0    abc@gmail.com
1    cde@gmail.com
2       ghe@ss.jpg
3  sldkslk@sss.com

for i, r in df.iterrows():
    if df.loc[i,'email'][-3:] == 'com':
        df.drop(df.index[i], inplace=True) 

Traceback (most recent call last):

  File "<ipython-input-84-4f12d22e5e4c>", line 2, in <module>
    if df.loc[i,'email'][-3:] == 'com':

  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 1472, in __getitem__
    return self._getitem_tuple(key)

  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 870, in _getitem_tuple
    return self._getitem_lowerdim(tup)

  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 998, in _getitem_lowerdim
    section = self._getitem_axis(key, axis=i)

  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 1911, in _getitem_axis
    self._validate_key(key, axis)

  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 1798, in _validate_key
    error()

  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 1785, in error
    axis=self.obj._get_axis_name(axis)))

KeyError: 'the label [2] is not in the [index]'

1 Answer:

Answer 0 (score: 1)

IIUC, you can do this without looping over the frame with iterrows:

df = df[df.email.str.endswith('.com')]

This returns:

>>> df
             email
0    abc@gmail.com
1    cde@gmail.com
3  sldkslk@sss.com
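
Since the question is phrased as "drop the .jpg rows" rather than "keep the .com rows", the same idea can also be written with a negated mask. This is only a sketch on the sample data from the question; the names `is_jpg` and `cleaned` are illustrative, and it assumes the column has no missing values.

```python
import pandas as pd

df = pd.DataFrame({'email': ['abc@gmail.com', 'cde@gmail.com',
                             'ghe@ss.jpg', 'sldkslk@sss.com']})

# Boolean mask: True for rows whose email ends with '.jpg'
is_jpg = df['email'].str.endswith('.jpg')

# Keep every row that is NOT flagged; the original index labels are preserved
cleaned = df[~is_jpg]
print(cleaned)
```

Filtering on what you want to remove keeps any other valid endings (for example .org), which the `.com` test would drop.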

Alternatively, for larger DataFrames it is sometimes faster to skip the pandas str methods and use Python's built-in string methods in a plain list comprehension:

df = df[[i.endswith('.com') for i in df.email]]
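
As for the error itself: as far as I can tell, the KeyError comes from mixing labels and positions while the frame shrinks. The `i` from iterrows is an index label, but `df.index[i]` looks up by position in the already-shortened index, so on the second pass it removes label 2, and the following `df.loc[2, 'email']` fails. The condition also tests for 'com', which would drop the valid addresses instead of the .jpg one. If you really want a loop, a minimal sketch (the name `to_drop` is just illustrative) is to collect the labels first and drop them in a single call:

```python
import pandas as pd

df = pd.DataFrame({'email': ['abc@gmail.com', 'cde@gmail.com',
                             'ghe@ss.jpg', 'sldkslk@sss.com']})

# Collect the index labels of the unwanted rows without modifying df yet
to_drop = [label for label, row in df.iterrows()
           if row['email'].endswith('.jpg')]

# Drop by label once the loop is finished, so iteration never sees a shrinking frame
df = df.drop(to_drop)
print(df)
```

The boolean-mask versions above are still the simpler choice; this only shows how the loop could be made safe.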