Question

To filter a list of strings by another list of strings in Python, we can use the following code:

result = [x for x in strings1 if x in strings2]
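A minimal runnable version of this exact-match filter follows; the sample values are illustrative. It also shows why this one-liner does not cover the substring case: `in` applied to a list compares whole elements, so `'b' in ['_b_', '_c_', '_d_']` is False.

```python
# Exact-match filtering: keep items of strings1 that are equal to some item of strings2.
strings1 = ['a', 'b', 'c']
strings2 = ['b', 'c', 'd']

result = [x for x in strings1 if x in strings2]
print(result)  # ['b', 'c']

# `in` on a list tests element equality, not substring containment,
# so the same filter finds nothing when the targets are longer strings.
substrings = ['a', 'b', 'c']
strings = ['_b_', '_c_', '_d_']
exact = [x for x in substrings if x in strings]
print(exact)  # []
```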

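But how can we filter a list of substrings by another list of strings, keeping each substring that occurs inside at least one of the strings? For example: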

substrings = ['a', 'b', 'c']
strings = ['_b_', '_c_', '_d_']

The result should be:

result = ['b', 'c']

Answer 1

You can use something like this:

[x for x in substrings if [y for y in strings if x in y]]

In [1]: substrings = ['a', 'b', 'c']

In [2]: strings = ['_b_', '_c_', '_d_']

In [3]: [x for x in substrings if [y for y in strings if x in y]]
Out[3]: ['b', 'c']
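This works because an empty list is falsy: for each x, the inner comprehension collects the strings that contain x, and x is kept only if that list is non-empty. The sketch below prints those intermediate lists for the example data (the name `matches` is only for illustration). Note that the inner comprehension always scans every element of strings, even after the first match has been found.

```python
substrings = ['a', 'b', 'c']
strings = ['_b_', '_c_', '_d_']

for x in substrings:
    # All strings that contain x as a substring; an empty list is falsy.
    matches = [y for y in strings if x in y]
    print(x, matches, bool(matches))
# a [] False
# b ['_b_'] True
# c ['_c_'] True

result = [x for x in substrings if [y for y in strings if x in y]]
print(result)  # ['b', 'c']
```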

Answer 2

An elegant way to achieve this is to use any() together with a list comprehension:

True

any() returns True if any string in substrings occurs as a substring of one of the strings in my_strings. As soon as it finds a match, it short-circuits the iteration (it does not check for further matches) and returns True. Because of this short-circuit property of any(), it avoids iterating needlessly over the whole list, which improves performance.
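A sketch of the any()-based approach applied to the example data. The answer's original expression is not preserved here, so the exact code is an assumption: it uses strings from the question in place of my_strings, and shows both the filtering form and the single boolean check. The generator `counting_pairs` is a hypothetical helper, added only to make the short-circuiting visible.

```python
substrings = ['a', 'b', 'c']
strings = ['_b_', '_c_', '_d_']

# Filter: keep each substring that occurs in at least one string.
# For each x, any() stops at the first string that contains it.
result = [x for x in substrings if any(x in s for s in strings)]
print(result)  # ['b', 'c']

# Boolean check: does any substring occur in any string?
has_match = any(x in s for x in substrings for s in strings)
print(has_match)  # True

# Hypothetical helper: records every (substring, string) pair it tests,
# so we can see how many pairs any() actually consumed.
def counting_pairs(subs, strs, log):
    for x in subs:
        for s in strs:
            log.append((x, s))
            yield x in s

checked = []
found = any(counting_pairs(substrings, strings, checked))
print(found, len(checked))  # True 4  (stops at ('b', '_b_'), not all 9 pairs)
```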
