Python, regular expressions: matching a string of repeated characters

Time: 2012-10-04 01:32:31

Tags: python regex

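I am trying to search Apache log files for specific entries related to particular vulnerability scans. I need to match strings from a separate file against the URI content in the web logs. Some of the strings I am looking for contain repeated special characters, such as '?'.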

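For example, I need to match an attack that contains only the string '????????', but I do not want to be alerted on the string '??????????????????', because each attack is associated with a specific attack ID number. Therefore, using: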

if attack_string in log_file_line:
    alert_me()

...doesn't work. So I decided to put the string into a regular expression:

if re.findall(r'\%s' % re.escape(attack_string),log_file_line):
    alert_me()

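...which doesn't work either, because any log file line containing the string '????????' matches, even when the line contains more than eight '?' characters.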
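To make the failure concrete, here is a minimal sketch. The two log lines are made up for illustration, and the helper names `substring_hit` and `escaped_regex_hit` are hypothetical. It uses `re.escape(attack_string)` on its own as the pattern, which behaves like a plain substring search.

```python
import re

attack_string = "????????"  # eight question marks, as in the question

# Hypothetical log lines, invented for this illustration.
exact_line = "GET /index.php" + "?" * 8 + " HTTP/1.1"    # run of exactly 8
longer_line = "GET /index.php" + "?" * 18 + " HTTP/1.1"  # run of 18


def substring_hit(needle, line):
    """Plain substring test, as in the first attempt."""
    return needle in line


def escaped_regex_hit(needle, line):
    """Escaped-literal regex test; equivalent to a substring search."""
    return re.search(re.escape(needle), line) is not None


for line in (exact_line, longer_line):
    # Both tests report a hit on both lines, because a run of 18 '?'
    # contains a run of 8 '?'.
    print(substring_hit(attack_string, line), escaped_regex_hit(attack_string, line))
```

Neither test looks at the characters surrounding the match, which is why the longer run also triggers an alert.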

Then I tried adding boundaries to the regular expression:

if re.findall(r'\\B\%s\\B' % re.escape(attack_string),log_file_line):
    alert_me()

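...which stopped matching in both cases. I need to be able to assign the string I am looking for dynamically, but I do not want to match every line that merely contains that string. How can I do this?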

1 Answer:

Answer 0: (score: 1)

How about:

(?:[^?]|^)\?{8}(?:[^?]|$)

Explanation:

(?-imsx:(?:[^?]|^)\?{8}(?:[^?]|$))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    [^?]                     any character except: '?'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    ^                        the beginning of the string
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
  \?{8}                    '?' (8 times)
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    [^?]                     any character except: '?'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
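As a sketch of how this could be used from Python: the pattern above hard-codes '?' and the count 8, while the question asks for a dynamically assigned string. One possible generalization, which is not part of the answer, wraps the escaped string in a negative lookbehind and lookahead, so the match is rejected when the same character continues on either side. The helper name `build_exact_pattern` and the sample log lines are hypothetical.

```python
import re

# The answer's pattern, unchanged: exactly eight '?' with a non-'?'
# character (or the start/end of the string) on each side.
ANSWER_PATTERN = re.compile(r"(?:[^?]|^)\?{8}(?:[^?]|$)")


def build_exact_pattern(attack_string):
    """Hypothetical helper: match attack_string only where it is not
    preceded by its own first character or followed by its own last one."""
    if not attack_string:
        raise ValueError("attack_string must not be empty")
    first = re.escape(attack_string[0])
    last = re.escape(attack_string[-1])
    return re.compile(
        "(?<!" + first + ")" + re.escape(attack_string) + "(?!" + last + ")"
    )


# Hypothetical log lines, invented for this illustration.
exact_line = "GET /index.php" + "?" * 8 + " HTTP/1.1"
longer_line = "GET /index.php" + "?" * 18 + " HTTP/1.1"

dynamic_pattern = build_exact_pattern("?" * 8)

for line in (exact_line, longer_line):
    # Each pattern hits the line with exactly 8 '?' and skips the line with 18.
    print(bool(ANSWER_PATTERN.search(line)), bool(dynamic_pattern.search(line)))
```

Unlike the answer's pattern, the lookaround version consumes no neighbouring characters, so the matched text is the attack string itself. Whether guarding only on the first and last character is the right rule for attack strings that are not a single repeated character is an assumption that would need checking against the real signature list.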