import re
import pandas as pd
df = pd.DataFrame({'a': {0: 'aa', 1: 'dd', 2: 'cc'},
                   'b': {0: 'aa(bb)daa', 1: 'eedd(ed)', 2: 'affaa(f)'}})
    a          b
0  aa  aa(bb)daa
1  dd   eedd(ed)
2  cc   affaa(f)
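I want to extract the characters inside the parentheses, but only when the pattern immediately before the parentheses is the value in df['a'] for that row.
I tried using: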
def searcher(x):
    # x is one row: x['a'] is the prefix, x['b'] is the text to search
    pat_result = re.search(x['a'] + r'\((.*?)\)', x['b'])
    if pat_result:
        return pat_result.group(1)
df[['a','b']].apply(lambda x :searcher(x), axis=1)
0      bb
1      ed
2    None
dtype: object
%%timeit
df[['a','b']].apply(lambda x :searcher(x), axis=1)
1.33 ms ± 3.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
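I'm wondering whether there is a faster way (while still staying in pandas), or a way to use str.extract directly.
Is there a way to make something like this work?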
df['b'].str.extract(df['a'] + '\((.*?)\)', expand=False)
Answer 0 (score: 0)
Here's a solution that uses an explicit loop. I timed it several times and the results varied, from faster than the original solution to slower.
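%%timeit
for i, j in df.iterrows():
    pat_search = re.search(j['a'] + r'\((.*?)\)', j['b'])
    if pat_search:
        # Note: j is typically a copy of the row, so this does not write back to df;
        # use df.at[i, 'c'] = pat_search.group(1) to store the result in the frame.
        j['c'] = pat_search.group(1)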
# First run
264 µs ± 12.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# Second run
1.62 ms ± 78.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
For comparison, here is the timing of your original solution:
%%timeit
df[['a','b']].apply(lambda x :searcher(x), axis=1)
1.34 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
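Another option worth trying is to skip both apply and iterrows and walk the two columns in parallel with zip inside a list comprehension. The following is only a sketch and was not benchmarked here: the helper name extract_after_prefix and the new column 'c' are hypothetical, and wrapping the prefix in re.escape is an added assumption (the original code concatenates the raw value, which would misbehave if df['a'] contained regex metacharacters).

```python
import re

import pandas as pd

df = pd.DataFrame({
    'a': ['aa', 'dd', 'cc'],
    'b': ['aa(bb)daa', 'eedd(ed)', 'affaa(f)'],
})


def extract_after_prefix(prefix, text):
    """Return the contents of the first '(...)' directly preceded by prefix, else None."""
    # re.escape treats the prefix as a literal string (assumption, see above).
    match = re.search(re.escape(prefix) + r'\((.*?)\)', text)
    return match.group(1) if match else None


# zip pairs the values of both columns without building a Series per row,
# which is the per-row overhead that apply(axis=1) and iterrows incur.
result = [extract_after_prefix(a, b) for a, b in zip(df['a'], df['b'])]

# Store the extracted values in a new column (hypothetical name 'c').
df['c'] = result
print(df)
```

Timing it with %%timeit on your real data is the only reliable way to tell whether it helps, since a three-row frame mostly measures fixed overhead.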