Subsetting a pandas DataFrame by looking for a character "in" a string

Time: 2016-08-25 13:32:48

Tags: python python-2.7 pandas

I want to extract a set of rows from a DataFrame based on whether the string in each row contains a given substring.

For example, say I have

testdf = pd.DataFrame({'A':['abc','efc','abz'], 'B':[4,5,6]})

I want to get the rows where column 'A' contains the substring 'ab'.

I tried testdf.loc[lambda df: 'ab' in df['A'], :], but got the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-12-86c436129f94> in <module>()
----> 1 testdf.loc[lambda df: 'a' in df['A'], :]

/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in __getitem__(self, key)
   1292 
   1293         if type(key) is tuple:
-> 1294             return self._getitem_tuple(key)
   1295         else:
   1296             return self._getitem_axis(key, axis=0)

/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup)
    782     def _getitem_tuple(self, tup):
    783         try:
--> 784             return self._getitem_lowerdim(tup)
    785         except IndexingError:
    786             pass

/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_lowerdim(self, tup)
    906         for i, key in enumerate(tup):
    907             if is_label_like(key) or isinstance(key, tuple):
--> 908                 section = self._getitem_axis(key, axis=i)
    909 
    910                 # we have yielded a scalar ?

/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
   1465         # fall thru to straight lookup
   1466         self._has_valid_type(key, axis)
-> 1467         return self._get_label(key, axis=axis)
   1468 
   1469 

/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _get_label(self, label, axis)
     91             raise IndexingError('no slices here, handle elsewhere')
     92 
---> 93         return self.obj._xs(label, axis=axis)
     94 
     95     def _get_loc(self, key, axis=0):

/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc in xs(self, key, axis, level, copy, drop_level)
   1747                                                       drop_level=drop_level)
   1748         else:
-> 1749             loc = self.index.get_loc(key)
   1750 
   1751             if isinstance(loc, np.ndarray):

/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_loc(self, key, method, tolerance)
   1945                 return self._engine.get_loc(key)
   1946             except KeyError:
-> 1947                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   1948 
   1949         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3977)()

pandas/index.pyx in pandas.index.Int64Engine._check_type (pandas/index.c:7634)()

KeyError: False

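What confuses me is that testdf.loc[lambda df: df['A'] == 'abc', :] does not raise an error: it returns the one row containing the value abc. So it seems something is wrong with the boolean produced by 'ab' in df['A']...

I am using Python 2.7 and pandas 0.18.1 in a Jupyter (4.0.6) notebook.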

1 answer:

Answer 0 (score: 3)

Use str.contains:

In [67]:
testdf[testdf['A'].str.contains('ab')]

Out[67]:
     A  B
0  abc  4
2  abz  6
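
One thing to keep in mind is that str.contains interprets its pattern as a regular expression by default. The sketch below assumes you want a plain literal match and that missing values should count as non-matches; the regex=False and na=False arguments are additions for illustration and are not part of the original answer.

```python
import pandas as pd

testdf = pd.DataFrame({'A': ['abc', 'efc', 'abz'], 'B': [4, 5, 6]})

# regex=False: treat 'ab' as a literal substring rather than a regex pattern.
# na=False: rows where 'A' is missing are treated as non-matching.
mask = testdf['A'].str.contains('ab', regex=False, na=False)
result = testdf[mask]
print(result)
```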

What you tried doesn't make sense in the first place:

In [70]:
'ab' in testdf['A']

Out[70]:
False
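
The result is False because `in` applied to a Series tests membership among the index labels, not the values, much as `in` on a dict tests the keys. A small sketch on the same example data (the variable names are only for illustration):

```python
import pandas as pd

testdf = pd.DataFrame({'A': ['abc', 'efc', 'abz'], 'B': [4, 5, 6]})
col = testdf['A']

# `in` on a Series looks at the index labels (here 0, 1, 2).
label_hit = 0 in col

# 'abc' is a value, not an index label, so this membership test fails.
value_as_label = 'abc' in col

# To test membership among the values, look at the underlying array.
value_hit = 'abc' in col.values

print(label_hit, value_as_label, value_hit)
```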

What you actually want is to test whether 'ab' is in each element of that column:

In [71]:
testdf['A'].apply(lambda x: 'ab' in x)

Out[71]:
0     True
1    False
2     True
Name: A, dtype: bool

However, apply is unnecessary here, since a vectorized method exists.
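
As a quick check on the same example data, the element-wise apply and the vectorized str.contains should give the same boolean mask. This is only a sketch for the three-row frame used here, which has no missing values:

```python
import pandas as pd

testdf = pd.DataFrame({'A': ['abc', 'efc', 'abz'], 'B': [4, 5, 6]})

# Element-wise Python-level test, one call of the lambda per row.
by_apply = testdf['A'].apply(lambda x: 'ab' in x)

# Vectorized string method that performs the same test on the whole column.
by_contains = testdf['A'].str.contains('ab')

# Compare the two masks position by position.
same = bool((by_apply == by_contains).all())
print(same)
```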

What you tried here:

testdf.loc[lambda df: 'ab' in testdf['A']]

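raised a KeyError because the lambda returned the scalar False, which cannot be used to index the whole df, whereas testdf.loc[lambda df: df['A'] == 'abc', :] works because df['A'] == 'abc' returns a boolean mask, which can be used to mask the whole df.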

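The lambda isn't needed in loc either:

testdf.loc[testdf['A'] == 'abc', :]

works just as well. If you think about it, the lambda is simply handed your df, so it is no different from the expression above.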
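
Putting the pieces together for the original substring question, here is a sketch of both spellings. The callable form is fine as long as it returns a boolean Series with one entry per row rather than a single scalar:

```python
import pandas as pd

testdf = pd.DataFrame({'A': ['abc', 'efc', 'abz'], 'B': [4, 5, 6]})

# Callable form: the lambda returns a boolean Series, so .loc accepts it.
via_lambda = testdf.loc[lambda df: df['A'].str.contains('ab'), :]

# Direct form: the same boolean mask passed without the lambda.
direct = testdf.loc[testdf['A'].str.contains('ab'), :]

print(via_lambda)
print(direct)
```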