Ruby: Determine whether a line is part of a regex match

Time: 2018-08-17 00:55:27

Tags: ruby regex

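I have a fairly complex regular expression that matches the parts of a document located between ASCII separators (e.g. ==================). I need to determine whether a given line of the document is one of the lines matched by this regex. My approach so far is to store the MatchData returned by matching the document against the regex, convert it to an array, and iterate over it looking for a match with the given line.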

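    between_separators = lambda { |ln, context|
        body = context[:body]
        # Group 1: opening separator, group 3: enclosed text, \1: identical closing separator.
        rx = /^(([=\*_]){23,}\2{3}(?:\2|[\r\n])+)([\s\S]+?)\1/
        matchdata = body.match(rx)
        return 0 unless matchdata # no separated block in the document
        # Split the full match and every capture group into individual lines.
        matched_lines = matchdata.to_a.map { |m| m.split("\n") }.flatten
        matched_lines.each { |ml| return 1 if ml.match(ln) }
        return 0
    }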

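It does not happen often, but in some cases I get false positives, because the line I am checking is identical to another line that really is part of the first match.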
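To illustrate the problem, here is a small sketch with made-up lines (they are not taken from my real documents, and `text_hit?` is a hypothetical helper that only mirrors the check in the lambda). Because the check looks at text alone, a copy of a line outside the block is indistinguishable from the copy inside it. As far as I can tell, `String#match` also converts a string argument into a Regexp, so the check behaves like a pattern search rather than a line comparison.

```ruby
# Hypothetical helper mirroring the check in the lambda:
# String#match converts `ln` into a Regexp before matching.
def text_hit?(matched_lines, ln)
  matched_lines.any? { |ml| ml.match(ln) }
end

# Made-up stand-in for the lines captured by the first regex match.
matched_lines = ["Regards,", "version 1x2 notes"]

# Truthy for ANY copy of this line, including one outside the block.
text_hit?(matched_lines, "Regards,")    # => true

# Also truthy: "." acts as a wildcard and a partial line is enough.
text_hit?(matched_lines, "version 1.2") # => true

# A literal whole-line comparison rejects the second case,
# but still cannot tell two identical lines apart.
matched_lines.include?("version 1.2")   # => false
```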

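Is there a smarter way to solve this?

Edit:

Let me provide more context.

I am given a plain-text document containing blocks of text enclosed by "separator" lines. I want to check whether a line taken from the document lies between such separators.

Here is a sample of what I am working with: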

    This is some text that should not be matched. As you can see, it is not enclosed
    by separator lines.

    ===========================================================
    This part should be matched as it is between two separator lines. Note that the
    opening and closing separators are composed of the exact same number of the same
    character.
    ===========================================================
    This block should not be matched as it is not enclosed by its own separators,
    but rather the closing separator of the previous block and the opening
    separator of the next block.
    ===========================================================
    It is tricky to distinguish between enclosed and non-enclosed blocks, because
    sometimes a matching pair of separators appears to be legal, while it is really
    the closing separator of the previous block and the opening separator of the
    next one (e.g. the block above this one).
    ===========================================================
    ==================================
    =====
    This block is enclosed by multiline separators.
    ==================================
    =====
    Some more text that should not be matched by the regex.
    ***************************************



    A separator can use one of the following characters: '=' or '*' or '_'.


    ***************************************
    ***************************************
    *******************
    Another example of a multiline separated block.
    ***************************************
    *******************

    >Even more text not to be matched by the regex. This time, preceded by a
    >variable number of '>'.
    >>__________________________________________
    >>And another type of separator. The block is now also a part of a reply section
    >>of the email.
    >>__________________________________________

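My goal is to be able to call between_separators["This block is enclosed by multiline separators.", context] and get 1 as the result. Although the method I provided succeeds in most cases, it is not reliable, and I would like to improve it so that it does not produce false positives.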
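One direction I am considering, as a rough sketch only: identify the line by its position instead of its text. This assumes the caller can supply the line's 0-based index in the document, which my current interface does not do. The names `SEPARATED_BLOCK`, `enclosed_line_indexes` and `between_separators_at` are hypothetical. The sketch walks through every match of the regex (not just the first), converts the offsets of capture group 3 into line indexes, and resumes searching after each closing separator so that it cannot be reused as an opening one.

```ruby
# Separator regex from the question; group 3 holds the enclosed text.
SEPARATED_BLOCK = /^(([=\*_]){23,}\2{3}(?:\2|[\r\n])+)([\s\S]+?)\1/

# Hypothetical helper: 0-based indexes of all lines enclosed by separators.
def enclosed_line_indexes(body, rx = SEPARATED_BLOCK)
  indexes = []
  pos = 0
  while (m = rx.match(body, pos))
    # Line index of the first and of the last character of the enclosed text.
    first = body[0...m.begin(3)].count("\n")
    last  = body[0...(m.end(3) - 1)].count("\n")
    indexes.concat((first..last).to_a)
    pos = m.end(0) # continue after the closing separator
  end
  indexes
end

# Assumption: the caller passes the line's index rather than its text.
between_separators_at = lambda { |line_index, context|
  enclosed_line_indexes(context[:body]).include?(line_index) ? 1 : 0
}

# Made-up document: "inside" appears twice, only the first copy is enclosed.
sep = "=" * 30
body = ["outside", sep, "inside", sep, "inside", sep, "also inside", sep, ""].join("\n")
context = { body: body }

enclosed_line_indexes(body)       # => [2, 6]
between_separators_at[2, context] # => 1
between_separators_at[4, context] # => 0
```

The obvious cost is that the caller has to keep track of line positions. If only the text of the line is available, I do not see how two identical lines could be told apart.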

0 answers:

No answers yet