Question

I want to extract a set of rows from a DataFrame based on whether a string in each row contains a given substring.

For example, say I have

testdf = pd.DataFrame({'A':['abc','efc','abz'], 'B':[4,5,6]})

I want to get the rows where column 'A' contains the substring 'ab'.

I tried testdf.loc[lambda df: 'ab' in df['A'], :] but got the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-12-86c436129f94> in <module>()
----> 1 testdf.loc[lambda df: 'a' in df['A'], :]

/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in __getitem__(self, key)
   1292 
   1293         if type(key) is tuple:
-> 1294             return self._getitem_tuple(key)
   1295         else:
   1296             return self._getitem_axis(key, axis=0)

/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup)
    782     def _getitem_tuple(self, tup):
    783         try:
--> 784             return self._getitem_lowerdim(tup)
    785         except IndexingError:
    786             pass

/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_lowerdim(self, tup)
    906         for i, key in enumerate(tup):
    907             if is_label_like(key) or isinstance(key, tuple):
--> 908                 section = self._getitem_axis(key, axis=i)
    909 
    910                 # we have yielded a scalar ?

/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
   1465         # fall thru to straight lookup
   1466         self._has_valid_type(key, axis)
-> 1467         return self._get_label(key, axis=axis)
   1468 
   1469 

/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _get_label(self, label, axis)
     91             raise IndexingError('no slices here, handle elsewhere')
     92 
---> 93         return self.obj._xs(label, axis=axis)
     94 
     95     def _get_loc(self, key, axis=0):

/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc in xs(self, key, axis, level, copy, drop_level)
   1747                                                       drop_level=drop_level)
   1748         else:
-> 1749             loc = self.index.get_loc(key)
   1750 
   1751             if isinstance(loc, np.ndarray):

/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_loc(self, key, method, tolerance)
   1945                 return self._engine.get_loc(key)
   1946             except KeyError:
-> 1947                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   1948 
   1949         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3977)()

pandas/index.pyx in pandas.index.Int64Engine._check_type (pandas/index.c:7634)()

KeyError: False

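What confuses me is that testdf.loc[lambda df: df['A'] == 'abc', :] does not raise an error: it returns the one row containing the value abc. So it seems that the boolean 'ab' in df['A'] is somehow wrong...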

I am using Python 2.7 and pandas 0.18.1 in a Jupyter (4.0.6) notebook.

Answer 1

Use str.contains:

In [67]:
testdf[testdf['A'].str.contains('ab')]

Out[67]:
     A  B
0  abc  4
2  abz  6
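A slightly fuller sketch of str.contains, assuming a recent pandas on Python 3. The extra NaN row is added for illustration only (it is not in the question) to show why na=False is often needed; case= and regex= are standard parameters of Series.str.contains.

```python
import numpy as np
import pandas as pd

# Question's frame plus one missing value (hypothetical, for illustration)
df = pd.DataFrame({'A': ['abc', 'efc', 'abz', np.nan], 'B': [4, 5, 6, 7]})

# na=False counts missing values as "does not contain" instead of NaN,
# so the result stays a clean boolean mask usable for row selection
mask = df['A'].str.contains('ab', na=False)
print(df[mask])

# regex=False treats the pattern literally ('.' is not a wildcard here)
literal = df['A'].str.contains('a.', regex=False, na=False)

# case=False ignores upper/lower case
ci = df['A'].str.contains('AB', case=False, na=False)
```

By default the pattern is a regular expression, so passing regex=False is the safer choice when searching for a plain substring that may contain characters such as '.', '+' or '('.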

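What you tried does not do what you expect in the first place: `in` on a Series tests membership among the index labels, not the values, so it returns a single boolean: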

In [70]:
'ab' in testdf['A']

Out[70]:
False
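A minimal sketch, assuming a recent pandas, showing that `in` on a Series looks at the index labels rather than the stored strings:

```python
import pandas as pd

testdf = pd.DataFrame({'A': ['abc', 'efc', 'abz'], 'B': [4, 5, 6]})
s = testdf['A']

# `in` checks the index labels (0, 1, 2), not the values
print('ab' in s)    # False: 'ab' is not an index label
print(0 in s)       # True: 0 is an index label
print('abc' in s)   # False, even though 'abc' is one of the values

# Exact membership among the values has to go through the values themselves
print('abc' in s.tolist())  # True
```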

But what you really want is to test whether 'ab' is in each element of the column:

In [71]:
testdf['A'].apply(lambda x: 'ab' in x)

Out[71]:
0     True
1    False
2     True
Name: A, dtype: bool

However, apply is not needed here when a vectorized method exists.
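A small sketch, assuming a recent pandas, comparing the per-element apply with the built-in string method; both produce the same mask:

```python
import pandas as pd

testdf = pd.DataFrame({'A': ['abc', 'efc', 'abz'], 'B': [4, 5, 6]})

# Python-level check: the lambda is called once per element
via_apply = testdf['A'].apply(lambda x: 'ab' in x)

# Built-in string method: same question asked of the whole column at once
# (regex=False because 'ab' is a plain substring, not a pattern)
via_str = testdf['A'].str.contains('ab', regex=False)

print(via_apply.tolist() == via_str.tolist())  # True
```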

What you tried here:

testdf.loc[lambda df: 'ab' in testdf['A']]

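raises a KeyError because the lambda returns the scalar False, which cannot be used to index the whole df. testdf.loc[lambda df: df['A'] == 'abc', :] works because df['A'] == 'abc' returns a boolean mask, which can be used to mask the whole df.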
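A sketch, assuming a recent pandas, that calls the two lambdas directly to show what .loc receives in each case: a single bool versus one bool per row. The traceback in the question shows pandas 0.18 then trying to look up False as an index label.

```python
import pandas as pd

testdf = pd.DataFrame({'A': ['abc', 'efc', 'abz'], 'B': [4, 5, 6]})

# .loc calls a callable with the DataFrame and indexes with whatever it returns
scalar = (lambda df: 'ab' in df['A'])(testdf)           # one bool for the whole column
mask = (lambda df: df['A'].str.contains('ab'))(testdf)  # one bool per row

print(scalar)         # False
print(mask.tolist())  # [True, False, True]

# The per-row mask can select rows; the lone scalar cannot
print(testdf.loc[lambda df: df['A'].str.contains('ab'), :])
```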

You also don't need the lambda inside loc:

testdf.loc[testdf['A'] == 'abc', :]

will work. If you think about it, all the lambda does is receive your df and return the same mask, so it is no different from the above.
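A short sketch, assuming a recent pandas, confirming that the lambda form, the plain mask with loc, and plain bracket indexing select the same rows. The chained example (the column C is hypothetical) shows where a callable is actually handy: the intermediate frame has no variable name to build the mask from.

```python
import pandas as pd

testdf = pd.DataFrame({'A': ['abc', 'efc', 'abz'], 'B': [4, 5, 6]})

with_lambda = testdf.loc[lambda df: df['A'] == 'abc', :]
without_lambda = testdf.loc[testdf['A'] == 'abc', :]
plain_brackets = testdf[testdf['A'] == 'abc']

print(with_lambda.equals(without_lambda), without_lambda.equals(plain_brackets))

# In a method chain the lambda receives the intermediate frame (with column C)
chained = (testdf.assign(C=testdf['B'] * 2)
                 .loc[lambda d: d['A'].str.contains('ab')])
print(chained)
```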
