Question

What is the best way to do string matching on a column of lists? For example, I have a dataset:

import numpy as np
import pandas as pd
list_items = ['apple', 'grapple', 'tackle', 'satchel', 'snapple']
df = pd.DataFrame({'id': range(3), 'L': [np.random.choice(list_items, 3).tolist() for _ in range(3)]})
df

    L                           id
0   [tackle, apple, grapple]    0
1   [tackle, snapple, satchel]  1
2   [satchel, satchel, tackle]  2

I want to return the rows where any item in L contains a given string. For example, 'grap' should return row 0, and 'sat' should return rows 1 and 2.
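To pin down the expected result, here is a minimal plain-Python sketch. It rebuilds the example frame above as literal data, so it does not depend on the random draw, and collects the index of every row with at least one matching item. The helper name matching_rows is hypothetical, not part of pandas.

```python
import pandas as pd

# The question's example frame, typed in literally so the result is reproducible
df = pd.DataFrame({
    'id': range(3),
    'L': [['tackle', 'apple', 'grapple'],
          ['tackle', 'snapple', 'satchel'],
          ['satchel', 'satchel', 'tackle']],
})

def matching_rows(frame, substring):
    """Hypothetical helper: index labels of rows whose list in 'L' has an item containing substring."""
    return [idx for idx, items in frame['L'].items() if any(substring in s for s in items)]

print(matching_rows(df, 'grap'))  # [0]
print(matching_rows(df, 'sat'))   # [1, 2]
```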

Answer 1

Let's use this reproducible (seeded) version of the data:

np.random.seed(123)
list_items = ['apple', 'grapple', 'tackle', 'satchel', 'snapple']
df = pd.DataFrame({'id':range(3), 'L':[np.random.choice(list_items, 3).tolist() for _ in range(3)]})
df
                             L  id
0    [tackle, snapple, tackle]   0
1   [grapple, satchel, tackle]   1
2  [satchel, grapple, grapple]   2

Use any and apply:

df[df.L.apply(lambda x: any('grap' in s for s in x))]
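The same any/apply idea can be wrapped in a small reusable helper. This is a sketch: contains_any and its case flag are hypothetical names, and the case-insensitive branch (lowercasing both sides) is an added assumption, since the question only shows lowercase data. The frame is the seeded output above, typed in literally.

```python
import pandas as pd

def contains_any(series, substring, case=True):
    """Hypothetical helper: boolean mask, True where at least one string in the row's list contains substring.

    With case=False both the pattern and each item are lowercased before comparing.
    """
    if not case:
        substring = substring.lower()
        return series.apply(lambda items: any(substring in s.lower() for s in items))
    return series.apply(lambda items: any(substring in s for s in items))

# Seeded frame from the answer, typed in literally
df = pd.DataFrame({
    'id': range(3),
    'L': [['tackle', 'snapple', 'tackle'],
          ['grapple', 'satchel', 'tackle'],
          ['satchel', 'grapple', 'grapple']],
})

print(df[contains_any(df['L'], 'grap')])              # rows 1 and 2
print(df[contains_any(df['L'], 'GRAP', case=False)])  # rows 1 and 2
```

Building a boolean mask and indexing with df[mask] keeps the original index labels, so the result lines up with the output shown above.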

Output:

                             L  id
1   [grapple, satchel, tackle]   1
2  [satchel, grapple, grapple]   2

Timings:

%timeit df.L.apply(lambda x: any('grap' in s for s in x))

10000 loops, best of 3: 194 µs per loop

%timeit df.L.apply(lambda i: ','.join(i)).str.contains('grap')

1000 loops, best of 3: 481 µs per loop

%timeit df.L.str.join(', ').str.contains('grap')

1000 loops, best of 3: 529 µs per loop
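A rough way to reproduce this comparison on your own machine is sketched below, assuming the seeded frame from above. It first checks that the three approaches select the same rows, then times each one with the standard-library timeit module. The wrapper function names are hypothetical, and absolute numbers will vary with hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(123)
list_items = ['apple', 'grapple', 'tackle', 'satchel', 'snapple']
df = pd.DataFrame({'id': range(3),
                   'L': [np.random.choice(list_items, 3).tolist() for _ in range(3)]})

# Hypothetical wrappers around the three expressions timed above
def via_any(s):
    return s.apply(lambda x: any('grap' in item for item in x))

def via_apply_join(s):
    return s.apply(lambda i: ','.join(i)).str.contains('grap')

def via_str_join(s):
    return s.str.join(', ').str.contains('grap')

approaches = (via_any, via_apply_join, via_str_join)
masks = [f(df['L']) for f in approaches]

# Sanity check: every approach should select exactly the same rows
assert all(m.tolist() == masks[0].tolist() for m in masks)

for f in approaches:
    # Best of 3 repeats, each running the call 200 times; report time per call
    best = min(timeit.repeat(lambda: f(df['L']), number=200, repeat=3)) / 200
    print(f"{f.__name__}: {best * 1e6:.0f} µs per call")
```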

Answer 2

df[df.L.apply(lambda i: ','.join(i)).str.contains('yourstring')]
