Question

I have two lists, A and B. They have different lengths and both contain strings. What is the best way to match substrings across the two lists?

list_A = ['hello','there','you','are']
list_B = ['say_hellaa','therefore','foursquare']

I want a list of the matching substrings, called list_C, containing:

list_C = ['hell','there','are']

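I came across this answer, but it requires me to already have a list of the substrings to match. Is there a way to get what I want without manually creating that list?

This also doesn't help, because the second list contains substrings.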

Answer 1

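Since you tagged pandas, here is a solution using str.contains:

import pandas as pd

# Data as used in this answer: list_B already holds the candidate substrings
S_A = pd.Series(['hello', 'there', 'you', 'are'])
S_B = pd.Series(['hell', 'is', 'here'])

# For each candidate in S_B, test whether any string in S_A contains it.
# regex=False treats the candidate as a literal string, not a pattern.
S_B[S_B.apply(lambda x: S_A.str.contains(x, regex=False)).any(axis=1)]
Out[441]: 
0    hell
2    here
dtype: object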

Answer 2

Here is one approach, using a list comprehension.

list_A = ['hello','there','you','are']
list_B = ['hell','is','here']
jVal = "|".join(list_A)        # hello|there|you|are

print([i for i in list_B if i in jVal ])

Output:

['hell', 'here']
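One caveat with the join trick: a candidate that happens to contain the separator could match across the boundary between two words. A minimal sketch of a variant that checks each word separately with any(), assuming the same lists as above (the name tricky is a made-up candidate used only to illustrate the edge case):

```python
list_A = ['hello', 'there', 'you', 'are']
list_B = ['hell', 'is', 'here']

# Keep each candidate from list_B that occurs inside at least one word of list_A
matches = [b for b in list_B if any(b in a for a in list_A)]
print(matches)

# Hypothetical candidate that contains the "|" separator
tricky = 'o|t'
joined_hit = tricky in "|".join(list_A)          # matches across 'hello|there'
per_word_hit = any(tricky in a for a in list_A)  # no single word contains it
print(joined_hit, per_word_hit)
```

For plain alphabetic words like the ones in the question the two versions give the same result; the per-word check only matters when the separator can appear in the data.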

Answer 3

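IIUC: I would use NumPy.

import numpy as np

# np.char.find is the public name for this function; the original import path,
# numpy.core.defchararray, is private and emits a DeprecationWarning in NumPy 2.x
find = np.char.find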

a = np.array(['hello', 'there', 'you', 'are', 'up', 'date'])
b = np.array(['hell', 'is', 'here', 'update'])

bina = b[np.where(find(a[:, None], b) > -1)[1]]
ainb = a[np.where(find(b, a[:, None]) > -1)[0]]

np.append(bina, ainb)

array(['hell', 'here', 'up', 'date'], dtype='<U6')

Answer 4

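list_A = ['hello','there','you','are']
list_B = ['hell','is','here']
List_C = []

# Compare every pair and keep whichever string is contained in the other
for a in list_A:
    for b in list_B:
        print(a,"<->",b)  # debug output: shows each pair being compared
        if a in b:
            List_C.append(a)
        if b in a:
            List_C.append(b)

print(List_C)
# ['hell', 'here']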

Answer 5

For fun, here's an answer that uses regex!

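import re

list_A = ['hello','there','you','are']
list_B = ['hell','is','here']

matches = []
for pat in list_B:
    # re.escape keeps any regex metacharacters in pat literal
    matches.append(re.search(re.escape(pat), ' '.join(list_A)))
matches = [mat.group() for mat in matches if mat]
print(matches)
# ['hell', 'here']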

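This returns a match object for each match that is found; the actual matched string is obtained with match.group(). Note that if no match is found (as is the case for the second element of list_B), you get a None in matches, hence the need for the if mat at the end of the list comprehension.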
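All of the snippets in this thread, including the regex one above, test whether a whole string from one list occurs inside a string from the other. The lists as posted in the question (list_B = ['say_hellaa','therefore','foursquare']) seem to call for something slightly different: the longest fragment shared by a pair of strings. A rough sketch of that reading, using difflib.SequenceMatcher from the standard library. The helper names and the min_len=3 cutoff are my own assumptions; the question does not say why 'you' is left out, and a length threshold is just one rule that reproduces the desired list_C.

```python
from difflib import SequenceMatcher


def longest_common_substring(s, t):
    """Return the longest contiguous block that s and t share."""
    # autojunk=False disables difflib's popularity heuristic for long inputs
    matcher = SequenceMatcher(None, s, t, autojunk=False)
    m = matcher.find_longest_match(0, len(s), 0, len(t))
    return s[m.a:m.a + m.size]


def common_substrings(list_a, list_b, min_len=3):
    """For each string in list_a, keep its longest fragment shared with
    any string in list_b, provided it has at least min_len characters.

    min_len is an assumed cutoff, not something stated in the question.
    """
    result = []
    for s in list_a:
        best = max((longest_common_substring(s, t) for t in list_b),
                   key=len, default="")
        if len(best) >= min_len:
            result.append(best)
    return result


list_A = ['hello', 'there', 'you', 'are']
list_B = ['say_hellaa', 'therefore', 'foursquare']

list_C = common_substrings(list_A, list_B)
print(list_C)
```

With these inputs 'you' is dropped because the most it shares with any string in list_B is the two-character fragment 'ou'. Every pair of strings is compared, so this is only practical for fairly small lists.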

Finding matching substrings in two lists
