Question

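I have two lists: one contains people's surnames and the other contains similar data. I have used any() to match the two lists and output the matches.

Sample data is provided below; the real lists contain thousands of entries.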

matchers = ['Balle', 'Jobson', 'Watts', 'Dallow', 'Watkins']
full_name = ['Balle S & R', 'Donald D & S', 'Watkins LTD', 'Balle R & R', 'Dallow K & C']

matching = [s for s in full_name if any(xs in s for xs in matchers)]

print(matching)

I would like to return the indices of each match. For the example above, the ideal output is:

[0, 0], [4, 2], [0, 3], [3, 4]

I have tried:

print([[i for i in range(len(full_name)) if item1 == full_name[i]] for item1 in matchers])

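But this returns a list of empty lists, because == only matches when the two strings are identical. In practice, my lists contain thousands of entries. Is it possible to find the indices of matches when the matched data is not exactly the same?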

Answer 1

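You can use "matcher in name" instead of "==".

Explanation: enumerate() lets me walk through a list and get an (index, value) pair for each element. So "index1" holds the index of "matcher" in the matchers list. Likewise, "index2" is the index of "name" in full_name.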
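As a quick illustration of what enumerate() yields, here is a minimal sketch using the matchers list from the question (the variable name pairs is only for illustration):

```python
matchers = ['Balle', 'Jobson', 'Watts', 'Dallow', 'Watkins']

# enumerate() yields one (index, value) tuple per list element
pairs = list(enumerate(matchers))
print(pairs[:2])  # [(0, 'Balle'), (1, 'Jobson')]
```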

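Then I check whether "matcher" is a substring of "name". If it is, I add the matcher index and the name index to the final list.

Dry run: suppose index1=0, so matcher="Balle". I then iterate over every value in full_name. Suppose index2=0, so name="Balle S & R". The if check is true, because "Balle" is a substring of "Balle S & R". So I append [index1, index2], that is [0, 0], to the final list. If matcher is not a substring, I skip that pair and move on.

Here is working code using loops.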

matches = []
# Loop through each value in matchers and store (index, value)
for index1, matcher in enumerate(matchers):

    # Loop through each value in full_name and store (index, value)
    for index2, name in enumerate(full_name):

        # Check if matcher is a substring of name
        if matcher in name:

            # If true, add the indices to the list
            matches.append([index1, index2])

Here is a shorter, more Pythonic version:

matches = [[i1, i2] for i1 in range(len(matchers)) for i2 in range(len(full_name)) if matchers[i1] in full_name[i2]]

Output of both: [[0, 0], [0, 3], [3, 4], [4, 2]]
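The output above is ordered by matcher index, whereas the ideal output in the question is ordered by position in full_name. A minimal sketch of one way to get that ordering is to swap the loop order so that full_name is the outer loop. The function name find_matches is hypothetical, chosen only for this example:

```python
def find_matches(matchers, full_name):
    """Return [matcher_index, name_index] pairs, ordered by name_index."""
    return [
        [i1, i2]
        for i2, name in enumerate(full_name)      # outer loop: names
        for i1, matcher in enumerate(matchers)    # inner loop: matchers
        if matcher in name                        # substring test
    ]

matchers = ['Balle', 'Jobson', 'Watts', 'Dallow', 'Watkins']
full_name = ['Balle S & R', 'Donald D & S', 'Watkins LTD', 'Balle R & R', 'Dallow K & C']

result = find_matches(matchers, full_name)
print(result)  # [[0, 0], [4, 2], [0, 3], [3, 4]]
```

Like the loop version, this still compares every matcher against every name, so for lists with thousands of entries the running time grows with the product of the two list lengths.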
