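I have a URL that may or may not contain a misspelled brand name. Assume the misspelling is one of the many permutations of the brand's letters. I want to check whether such a permutation is present. I wrote the code below, but its complexity is very high...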
import re
from itertools import permutations

url = "http://www.amazno.com/"
brands = [...]
# ^ this is a set of 25,000 brand names in lowercase, retrieved from Alexa.
# it has "google" and "amazon" in it, for example.

# wrapped in a function (the name is only illustrative) so that the
# return statements are valid
def contains_permuted_brand(url, brands):
    for brand in brands:
        # get all permutations of this brand
        perms_list = ["".join(p) for p in permutations(brand)]
        # remove duplicates by converting to a set
        perms = set(perms_list)
        for perm in perms:
            # search the URL for the permutation; escape it so that
            # characters such as "." are matched literally
            m = re.search(re.escape(perm), url)
            if m:
                return 1
    return 0
Is there a faster way?
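One direction that might help, as a rough sketch rather than a definitive answer: every permutation of a brand has the same letters, so the permutations never need to be generated. Sorting a brand's letters gives a signature, and a substring of the URL is a permutation of that brand exactly when its sorted letters give the same signature. The function names `build_signature_index` and `find_permuted_brands` below are hypothetical, and the two-element `brands` set is only a stand-in for the real 25,000-name set.

```python
from collections import defaultdict


def build_signature_index(brands):
    """Map each sorted-letter signature to the brands that share it."""
    index = defaultdict(set)
    for brand in brands:
        index["".join(sorted(brand))].add(brand)
    return index


def find_permuted_brands(url, index):
    """Return brands whose letters occur, in any order, as a contiguous run in url."""
    url = url.lower()
    # Only window sizes that match some brand length need to be checked.
    lengths = {len(signature) for signature in index}
    found = set()
    for n in lengths:
        for i in range(len(url) - n + 1):
            signature = "".join(sorted(url[i:i + n]))
            if signature in index:
                found |= index[signature]
    return found


# Assumption: a tiny stand-in for the real brand set.
brands = {"google", "amazon"}
index = build_signature_index(brands)
print(find_permuted_brands("http://www.amazno.com/", index))
```

The index is built once and can be reused for every URL. The work per URL then depends on the URL length and the number of distinct brand lengths, not on the number of permutations of each brand.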