Regex to match blocks delimited by separators of variable type

Time: 2018-08-01 13:21:39

Tags: ruby regex

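I need to clean some emails of automatically appended blocks of text. Each of these blocks is enclosed by a pair of separators (single-line or multiline). I need a regex that matches everything between such separators, including the separators themselves, so that I can remove it.

Here is some text that illustrates the problem and shows all the odd cases that need to be handled: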

This is some text that should not be matched. As you can see, it is not enclosed
by separator lines.

===========================================================
This part should be matched as it is between two separator lines. Note that the
opening and closing separators are composed of the exact same number of the same
character.
===========================================================
This block should not be matched as it is not enclosed by its own separators,
but rather the closing separator of the previous block and the opening 
separator of the next block.
===========================================================
It is tricky to distinguish between enclosed and non-enclosed blocks, because
sometimes a matching pair of separators appears to be legal, while it is really
the closing separator of the previous block and the opening separator of the
next one (e.g. the block above this one).
===========================================================
==================================
=====
This block is enclosed by multiline separators.
==================================
=====
Some more text that should not be matched by the regex.
***************************************



A separator can be a different character, for example the asterisk.


***************************************
***************************************
*******************
Another example of a multiline separated block.
***************************************
*******************

>Even more text not to be matched by the regex. This time, preceded by a
>variable number of '>'.
>>__________________________________________
>>And another type of separator. The block is now also a part of a reply section
>>of the email.
>>__________________________________________

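Note that there is no recursion to handle here: a block is never nested inside another block. I have been trying for a while, but I am not very experienced with regular expressions. I don't know how to make the expression "remember" what the opening separator was.

Right now, my solution produces an incorrect match for a block like this: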

=========================
text text
text
*************************

I would really appreciate your help. I am using Ruby, but a different regex syntax can be used if needed.

2 answers:

Answer 0: (score: 0)

Try this regex: ((.)(?:\2)+)(?:\n(\2+))?\n.+?\n\1(?:(?:\n\3))?

Demo

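Note that I added 2 restrictions around multiline separators:

  1. A separator has at most 2 lines

  2. The second line uses the same separator character as the first line

Let me know if these restrictions are not needed.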
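
As a rough illustration of how the same idea could look in Ruby, here is a minimal sketch that keeps both restrictions but works line by line. It is not the pattern from this answer: in Ruby, `.` does not match a newline unless the `/m` flag is given, so the sketch matches whole lines instead. The names SEPARATED_BLOCK and strip_separated_blocks are made up for the example. It assumes that a separator line consists of one non-alphanumeric character repeated at least twice (optionally preceded by `>` reply markers), that separator lines have no trailing whitespace, and that every line ends with a newline.

```ruby
# Hypothetical names: SEPARATED_BLOCK and strip_separated_blocks are
# illustrative only and do not come from the answer.
SEPARATED_BLOCK = /
  ^(?<open>
    (?<quote>>*)                       # optional reply markers in front of the separator
    (?<ch>[^a-zA-Z0-9\s>])\k<ch>+\n    # first separator line: one symbol repeated
    (?:\k<quote>\k<ch>+\n)?            # optional second line made of the same symbol
  )
  (?:.*\n)+?                           # block body: as few lines as possible
  \k<open>                             # closing separator repeats the opening one exactly
/x

# Assumes every line, including the last one, ends with a newline.
def strip_separated_blocks(text)
  text.gsub(SEPARATED_BLOCK, "")
end

sample = <<~TEXT
  keep 1
  =====
  drop 1
  =====
  keep 2
  =====
  drop 2
  =====
  *****
  ***
  drop 3
  *****
  ***
  >keep 3
  >>_____
  >>drop 4
  >>_____
TEXT

puts strip_separated_blocks(sample)
```

Because the whole opening separator (one or two lines, including any reply markers) is captured in a single named group, the closing separator is simply a backreference to it. Since gsub never produces overlapping matches, the text between the closing separator of one block and the opening separator of the next ("keep 2" above) is left alone.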

Answer 1: (score: 0)

It looks like a backreference (backward capture) should do it.

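
To make the backreference idea concrete, here is a minimal Ruby sketch. It is an assumption about what such a pattern could look like, not code from this answer, and the names SIMPLE_BLOCK and remove_simple_blocks are hypothetical. It only handles single-line separators made of one non-alphanumeric character repeated at least three times. The group `\1` is what lets the expression "remember" the opening separator, so a block that opens with `=` and ends with `*` is not matched.

```ruby
# Hypothetical names, chosen for this example only.
# Group 2 is the separator character, group 1 is the whole separator line.
SIMPLE_BLOCK = /^(([^a-zA-Z0-9\s])\2{2,})\n(?:.*\n)*?\1(?:\n|\z)/

def remove_simple_blocks(text)
  text.gsub(SIMPLE_BLOCK, "")
end

matched    = "intro\n=====\nsecret\n=====\noutro\n"
mismatched = "=====\ntext text\ntext\n*****\n"

puts remove_simple_blocks(matched)                  # prints the lines "intro" and "outro"
puts remove_simple_blocks(mismatched) == mismatched # prints "true": nothing was removed
```

The body is matched lazily, one whole line at a time, so the first line that is identical to the opening separator closes the block.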