Currently I have to do the following:
ix = None
for ixi in [res[col].str.contains('string') for col in res.columns]:
    if ix is not None:
        ix = ix | ixi
    else:
        ix = ixi
res[ix]
Here is the notebook:
https://gist.github.com/denfromufa/12379b62ef6eec9252f4c9a77e46e2b1
Code to generate the input DF:
import pandas as pd
from string import ascii_letters as ascl
import numpy as np
res = pd.DataFrame(np.array([''.join(_) for _ in
                             zip(ascl[:9], ascl[9:18], ascl[18:27])]).reshape((3, 3)),
                   columns='ca cb cc'.split(),
                   index='ra rb rc'.split())
Input DF:
     ca   cb   cc
ra  ajs  bkt  clu
rb  dmv  enw  fox
rc  gpy  hqz  irA
Desired (filtered) DF:
     ca   cb   cc
rb  dmv  enw  fox
rc  gpy  hqz  irA
Answer 0 (score: 1)
You can use sum(axis=1), which concatenates the string columns row by row:
In [59]: res[res.sum(axis=1).str.contains('e|A')]
Out[59]:
     ca   cb   cc
rb  dmv  enw  fox
rc  gpy  hqz  irA
Or apply() with .str.contains() and any():
In [51]: res[res.apply(lambda x: x.str.contains('e|A')).any(axis=1)]
Out[51]:
     ca   cb   cc
rb  dmv  enw  fox
rc  gpy  hqz  irA
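Timing (a 300K-row DF is built here, but note that the %timeit statements as written run against the 3-row res rather than df):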
In [95]: df = pd.concat([res] * 10**5)
In [96]: df.shape
Out[96]: (300000, 3)
In [97]: %timeit res[res.sum(axis=1).str.contains('e|A')]
1000 loops, best of 3: 664 µs per loop
In [98]: %timeit res[res.apply(lambda x: x.str.contains('e|A')).any(axis=1)]
1000 loops, best of 3: 1.86 ms per loop
Explanation
sum:
In [57]: res.sum(axis=1)
Out[57]:
ra    ajsbktclu
rb    dmvenwfox
rc    gpyhqzirA
dtype: object
In [58]: res.sum(axis=1).str.contains('e|A')
Out[58]:
ra    False
rb     True
rc     True
dtype: bool
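One caveat with the concatenation approach: because the cells of a row are glued together before searching, a pattern can match across the seam between two columns even though no single cell contains it. The sketch below illustrates this with the pattern 'sb', which I picked only for the demonstration. It rebuilds the same 3x3 frame literally and uses apply("".join, axis=1) as a stand-in for the row-wise concatenation, on the assumption that it produces the same joined strings as sum(axis=1) here.

```python
import pandas as pd

# Same 3x3 frame as in the question, written out literally.
res = pd.DataFrame(
    [["ajs", "bkt", "clu"], ["dmv", "enw", "fox"], ["gpy", "hqz", "irA"]],
    columns=["ca", "cb", "cc"],
    index=["ra", "rb", "rc"],
)

# Row-wise concatenation of all cells (stand-in for res.sum(axis=1)).
joined = res.apply("".join, axis=1)

# 'sb' does not occur inside any single cell, but it does occur at the
# seam between 'ajs' and 'bkt' in row ra once the cells are joined.
concat_mask = joined.str.contains("sb")
per_cell_mask = res.apply(lambda col: col.str.contains("sb")).any(axis=1)

print(concat_mask.tolist())    # [True, False, False]
print(per_cell_mask.tolist())  # [False, False, False]
```

If such cross-column matches would be wrong for your data, the per-cell variant with any(axis=1) is probably the safer choice.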
apply:
In [53]: res.apply(lambda x: x.str.contains('e|A'))
Out[53]:
       ca     cb     cc
ra  False  False  False
rb  False   True  False
rc  False  False   True
In [54]: res.apply(lambda x: x.str.contains('e|A')).any(axis=1)
Out[54]:
ra    False
rb     True
rc     True
dtype: bool
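The per-column masks can also be wrapped in a small reusable function. This is only a sketch: the name rows_matching is hypothetical, it assumes every column holds strings (or missing values), and it passes na=False so that missing cells count as non-matches, which is my assumption about the desired behaviour rather than something stated in the question.

```python
import numpy as np
import pandas as pd


def rows_matching(df, pattern):
    """Return the rows of df in which any column matches the regex pattern.

    Hypothetical helper; assumes all columns contain strings or NaN/None.
    """
    # One boolean mask per column; na=False maps missing values to False.
    masks = [df[col].str.contains(pattern, na=False) for col in df.columns]
    # OR the masks together element-wise, giving one boolean per row.
    any_match = np.logical_or.reduce(masks)
    return df[any_match]


# Same 3x3 frame as in the question, written out literally.
res = pd.DataFrame(
    [["ajs", "bkt", "clu"], ["dmv", "enw", "fox"], ["gpy", "hqz", "irA"]],
    columns=["ca", "cb", "cc"],
    index=["ra", "rb", "rc"],
)

filtered = rows_matching(res, "e|A")
print(filtered.index.tolist())  # ['rb', 'rc']
```

This does the same thing as the explicit loop in the question, just without the None bookkeeping.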