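I want to extract a set of rows from a DataFrame based on whether the string in each row contains a given substring.
For example, say I have
testdf = pd.DataFrame({'A':['abc','efc','abz'], 'B':[4,5,6]})
I want to get the rows where column 'A' contains the substring 'ab'.
I tried
testdf.loc[lambda df: 'ab' in df['A'], :]
but got the following error: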
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-12-86c436129f94> in <module>()
----> 1 testdf.loc[lambda df: 'a' in df['A'], :]
/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in __getitem__(self, key)
1292
1293 if type(key) is tuple:
-> 1294 return self._getitem_tuple(key)
1295 else:
1296 return self._getitem_axis(key, axis=0)
/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup)
782 def _getitem_tuple(self, tup):
783 try:
--> 784 return self._getitem_lowerdim(tup)
785 except IndexingError:
786 pass
/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_lowerdim(self, tup)
906 for i, key in enumerate(tup):
907 if is_label_like(key) or isinstance(key, tuple):
--> 908 section = self._getitem_axis(key, axis=i)
909
910 # we have yielded a scalar ?
/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
1465 # fall thru to straight lookup
1466 self._has_valid_type(key, axis)
-> 1467 return self._get_label(key, axis=axis)
1468
1469
/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _get_label(self, label, axis)
91 raise IndexingError('no slices here, handle elsewhere')
92
---> 93 return self.obj._xs(label, axis=axis)
94
95 def _get_loc(self, key, axis=0):
/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc in xs(self, key, axis, level, copy, drop_level)
1747 drop_level=drop_level)
1748 else:
-> 1749 loc = self.index.get_loc(key)
1750
1751 if isinstance(loc, np.ndarray):
/Users/justinpounders/anaconda/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_loc(self, key, method, tolerance)
1945 return self._engine.get_loc(key)
1946 except KeyError:
-> 1947 return self._engine.get_loc(self._maybe_cast_indexer(key))
1948
1949 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3977)()
pandas/index.pyx in pandas.index.Int64Engine._check_type (pandas/index.c:7634)()
KeyError: False
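What confuses me is that
testdf.loc[lambda df: df['A'] == 'abc', :]
does not raise an error: it returns the single row containing the value abc. So it seems that something about the boolean produced by 'ab' in df['A'] is not right...
I am using Python 2.7 and pandas 0.18.1 in a Jupyter (4.0.6) notebook.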
Answer 0 (score: 3)
Use str.contains:
In [67]:
testdf[testdf['A'].str.contains('ab')]
Out[67]:
A B
0 abc 4
2 abz 6
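As a self-contained variant of the line above, here is a minimal sketch. The regex=False and na=False arguments are not part of the original answer; they are added on the assumption that 'ab' is meant as a literal substring and that missing values should count as non-matches.

```python
import pandas as pd

testdf = pd.DataFrame({'A': ['abc', 'efc', 'abz'], 'B': [4, 5, 6]})

# regex=False treats 'ab' as a literal substring, not a regular expression.
# na=False makes missing values count as "no match" instead of yielding NaN.
mask = testdf['A'].str.contains('ab', regex=False, na=False)

# Boolean indexing keeps only the rows where the mask is True.
result = testdf[mask]
print(result)
```

Without na=False, a column that contains missing values would produce a mask with missing entries, which cannot be used directly for boolean indexing.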
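What you tried doesn't make sense in the first place, because the in operator applied to a Series checks the index labels, not the values: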
In [70]:
'ab' in testdf['A']
Out[70]:
False
But what you really wanted was to test whether 'ab' is in each element of that column:
In [71]:
testdf['A'].apply(lambda x: 'ab' in x)
Out[71]:
0 True
1 False
2 True
Name: A, dtype: bool
However, apply is not needed here, since a vectorized method (str.contains) exists.
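To illustrate that point, the sketch below builds the mask both ways on the example frame and compares them. The variable names are only illustrative, and regex=False is an added assumption so that both versions perform a plain substring test.

```python
import pandas as pd

testdf = pd.DataFrame({'A': ['abc', 'efc', 'abz'], 'B': [4, 5, 6]})

# Element-wise Python test: calls the lambda once per value.
elementwise = testdf['A'].apply(lambda x: 'ab' in x)

# Vectorized string method: same substring test, no explicit Python lambda.
vectorized = testdf['A'].str.contains('ab', regex=False)

# Compare the two masks value by value.
same = elementwise.tolist() == vectorized.tolist()
print(same)
```

For this data both approaches should give the same mask; the vectorized form is usually preferred because it is shorter and handles missing values through its na argument.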
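What you tried here:
testdf.loc[lambda df: 'ab' in testdf['A']]
raised a KeyError because the lambda returned the scalar False, which cannot be used to index the whole df. By contrast,
testdf.loc[lambda df: df['A'] == 'abc', :]
worked because df['A'] == 'abc' returns a boolean mask, which can be used to mask the whole df.
The lambda is not needed with loc either:
testdf.loc[testdf['A'] == 'abc', :]
would also work. If you think about it, all the lambda does is receive your df and return the mask, which is no different from the line above.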
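The following sketch pulls the explanation together, assuming a reasonably recent pandas. It shows that in on a Series looks at the index labels, and that a callable passed to loc selects the intended rows once it returns a boolean mask. The variable names are hypothetical.

```python
import pandas as pd

testdf = pd.DataFrame({'A': ['abc', 'efc', 'abz'], 'B': [4, 5, 6]})

# `in` on a Series tests the index labels (0, 1, 2 here), not the values.
label_hit = 0 in testdf['A']       # 0 is an index label
value_hit = 'abc' in testdf['A']   # 'abc' is a value, not a label

# A callable given to .loc must return something usable as an indexer,
# such as a boolean mask with one entry per row.
by_callable = testdf.loc[lambda df: df['A'].str.contains('ab'), :]

# Passing the mask directly selects the same rows.
by_mask = testdf.loc[testdf['A'].str.contains('ab'), :]

print(label_hit, value_hit)
print(by_callable)
```

The callable form is mainly useful in method chains, where the intermediate frame has no name; for a frame that is already bound to a variable, passing the mask directly is simpler.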