Question

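Suppose there is a keyword (password) "rain". The program must run only if the word supplied by the user contains 75% of the keyword's characters in a row (!):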

Here is my regex code:

import re

key = 'rain'
l_word = int(len(key) * 3 / 4)  # 75% of the key length: 3 for 'rain'
my_regex = r'^[a-z0-9_]*' + '[' + key + ']' + '{' + str(l_word) + ',}' + '[a-z0-9_]*$'
bool(re.match(my_regex, 'air'))

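where l_word is 75% of the keyword length. But my_regex has a problematic part, '[' + key + ']': it matches any combination of the keyword's characters (in my case "rain"), not only consecutive ones. For example, 'air' should not match, but '12Qain' should.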
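A quick check (added for illustration, not part of the original question) makes the problem concrete: the character class [rain]{3,} accepts any three or more letters from the set {r, a, i, n}, in any order and with repeats. It also exposes a second, separate issue: [a-z0-9_] contains no uppercase letters, so as written the pattern rejects '12Qain' because of the Q.

```python
import re

key = 'rain'
l_word = int(len(key) * 3 / 4)  # 3 for a four-letter key

# The question's pattern, which evaluates to ^[a-z0-9_]*[rain]{3,}[a-z0-9_]*$
class_regex = r'^[a-z0-9_]*' + '[' + key + ']' + '{' + str(l_word) + ',}' + '[a-z0-9_]*$'

# Any 3+ characters from {r, a, i, n} satisfy [rain]{3,}, in any order:
print(bool(re.match(class_regex, 'air')))     # True
print(bool(re.match(class_regex, 'nnn')))     # True
# The uppercase 'Q' is outside [a-z0-9_], so this word never matches:
print(bool(re.match(class_regex, '12Qain')))  # False
```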

How can I fix this?

Answer 1

Are you sure you want to use a regex? Something like this computes the percentage of positions at which the two strings match:

>>> a = list('abce')
>>> b = list('abcd')
>>> (100 - (sum(i != j for i, j in zip(a, b)) / float(len(a))) * 100)
75.0

But with b = list('bdce'), the result is only 50%.
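The one-liner can be wrapped in a reusable function. In this sketch, positional_match_percent is a hypothetical name, and it counts matching positions rather than mismatches, so a word shorter than the key loses credit for the missing positions; for equal-length strings it gives the same numbers as the expression above. Note that it compares characters index by index, so it does not look for a consecutive run at an arbitrary offset:

```python
def positional_match_percent(key, word):
    """Percentage of key's positions where word has the same character.

    Characters are compared index by index (zip stops at the shorter
    string); missing positions in a shorter word count as misses.
    """
    matches = sum(k == w for k, w in zip(key, word))
    return matches / len(key) * 100


print(positional_match_percent('abce', 'abcd'))    # 75.0
print(positional_match_percent('abce', 'bdce'))    # 50.0
# 'ain' sits at offset 3 in '12Qain', so no position lines up with 'rain':
print(positional_match_percent('rain', '12Qain'))  # 0.0
```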

Answer 2

You can use this alternation-based approach:

>>> import re
>>> key = 'rain'
>>> l_word = int(len(key) * 3 / 4)

>>> my_regex = re.compile(r'^' + key[0:l_word] + '|' + key[-l_word:] + '$')

>>> print(my_regex.pattern)
^rai|ain$

>>> print(bool(my_regex.search('air')))
False
>>> print(bool(my_regex.search('12Qain')))
True
>>> print(bool(my_regex.search('raisin')))
True

The regex ^rai|ain$ matches when the word either starts with the first 75% of the keyword's characters or ends with the last 75%.
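For reuse, the same pattern can be built by a function. This is a sketch: edge_regex and its ratio parameter are hypothetical names, and both the re.escape call and the minimum length of 1 are additions beyond the answer. As described above, it only inspects the start and the end of the word, so 'xrainx' is rejected even though it contains the whole keyword:

```python
import re


def edge_regex(key, ratio=0.75):
    """Answer 2's alternation as a function: the word must START with the
    first n characters of key or END with its last n characters."""
    # max(1, ...) is an added guard (not in the answer): with n == 0 the
    # bare '^' branch would match any word.
    n = max(1, int(len(key) * ratio))
    # re.escape (also an addition) keeps characters like '.' or '+' literal.
    return re.compile('^' + re.escape(key[:n]) + '|' + re.escape(key[-n:]) + '$')


rx = edge_regex('rain')
print(rx.pattern)                  # ^rai|ain$
print(bool(rx.search('air')))      # False
print(bool(rx.search('12Qain')))   # True
print(bool(rx.search('raisin')))   # True
print(bool(rx.search('xrainx')))   # False: only the edges are checked
```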

Answer 3

This approach uses n-grams to allow a varying ratio and a varying key length while ensuring that the letters are consecutive.

import re
import math

key = 'paint'
n = math.ceil(len(key) * 0.75)  # round up when len(key) * 3 is not a multiple of 4

def ngrams(key, n):
    output = []
    for i in range(len(key) - n + 1):
        output.append(key[i:(i+n)])
    return output

patterns = '|'.join(ngrams(key, n))
# (?:...) groups the alternation so the anchors and padding apply to every n-gram
regex = r'^[a-z0-9_]*(?:' + patterns + ')[a-z0-9_]*$'

print("Allowed matches: {}".format(patterns))
print("Pants matches: {}".format(bool(re.search(regex, 'pants'))))
print("Pains matches: {}".format(bool(re.search(regex, 'pains'))))
print("Taint matches: {}".format(bool(re.search(regex, 'taint'))))

Allowed matches: pain|aint
Pants matches: False
Pains matches: True
Taint matches: True
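Because | has the lowest precedence of any regex operator, a pattern built as prefix + 'a|b|c' + suffix attaches ^[a-z0-9_]* only to the first n-gram and [a-z0-9_]*$ only to the last. With the two n-grams of 'paint' the three results above come out the same either way, but a longer key produces middle alternatives with no anchors at all. The sketch below (the key 'password' and the test words are illustrative choices) compares the ungrouped form with one that wraps the alternation in a non-capturing group (?:...):

```python
import math
import re


def ngrams(key, n):
    """All contiguous substrings of key with length n."""
    return [key[i:i + n] for i in range(len(key) - n + 1)]


key = 'password'
n = math.ceil(len(key) * 0.75)  # 6 -> passwo, asswor, ssword
alternation = '|'.join(ngrams(key, n))

# Without a group, '|' splits the whole pattern: '^[a-z0-9_]*' binds only to
# the first n-gram and '[a-z0-9_]*$' only to the last.
ungrouped = r'^[a-z0-9_]*' + alternation + r'[a-z0-9_]*$'
# A non-capturing group makes the padding and anchors apply to every n-gram.
grouped = r'^[a-z0-9_]*(?:' + alternation + r')[a-z0-9_]*$'

word = 'asswor!'  # '!' is outside [a-z0-9_]
print(bool(re.search(ungrouped, word)))  # True: the middle n-gram is unanchored
print(bool(re.search(grouped, word)))    # False
```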

Keep in mind that Python can already check for a substring with the in operator between two strings. So you can skip the regex entirely and do this:

patterns = ngrams(key, n)
for test in ['pants', 'pains', 'taint']:
    matches = 0
    for pattern in patterns:
        if pattern in test:
            matches += 1
    if matches:
        print(test, 'matches')
    else:
        print(test, 'did not match')

pants did not match
pains matches
taint matches
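The same substring test fits in a small function that takes the ratio as a parameter. In this sketch, contains_key_run is a hypothetical name, and any() replaces the explicit counter, stopping at the first n-gram found. Unlike the [a-z0-9_] regexes, it places no restriction on the word's other characters, so '12Qain' (with its uppercase Q) is accepted for the key 'rain':

```python
import math


def contains_key_run(word, key, ratio=0.75):
    """True if word contains ceil(len(key) * ratio) consecutive characters
    of key, in key's order (the n-gram + `in` idea above)."""
    n = math.ceil(len(key) * ratio)
    return any(key[i:i + n] in word for i in range(len(key) - n + 1))


print(contains_key_run('air', 'rain'))     # False
print(contains_key_run('12Qain', 'rain'))  # True
print(contains_key_run('pants', 'paint'))  # False
print(contains_key_run('taint', 'paint'))  # True
```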

Regex in Python: a word matches if n consecutive characters equal the pattern
