
Question

I am using Java and want to build two regular expressions for two different scenarios:

1:

STARTText blah, blah
\    next line with more text, but the leading backslash
\    next line with more text, but the leading backslash
\    next line with more text, but the leading backslash

This continues until the first line that no longer starts with a backslash.

2:

Now you will see the following links for the items:
1111 leading 4 digits and then some text
2565 leading 4 digits and then some text
8978 leading 4 digits and then some text

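This block ends with an additional empty line after 8978. In addition, I know that the block with the leading digits repeats 10 times and then ends.

Filtering single lines is possible somehow, but how do I handle several line breaks in between? Even for the first block, I don't know when or how it ends, and I also need to search for the backslash. So my approach is to use one closed expression, just a single one, which I could also use with replaceAll().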

Answer 1

First regex:

Pattern regex = Pattern.compile(
    "^          # Start of line\n" +
    "STARTText  # Match this text\n" +
    ".*\\r?\\n  # Match whatever follows on the line plus (CR)LF\n" +
    "(?:        # Match...\n" +
    " ^\\\\     # Start of line, then a backslash\n" +
    " .*\\r?\\n # Match whatever follows on the line plus (CR)LF\n" +
    ")*         # Repeat as needed", 
    Pattern.MULTILINE | Pattern.COMMENTS);
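A minimal, self-contained sketch of how the first pattern might be used from Java, both with find() and with replaceAll() (which the question mentions). The class and method names are hypothetical. Note that a backslash line is only included if it ends with a line break, because every line in the pattern ends with \r?\n.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackslashBlockDemo {
    // Answer 1's first pattern, unchanged. COMMENTS mode ignores the
    // whitespace and the # comments inside the string.
    static final Pattern BLOCK = Pattern.compile(
        "^          # Start of line\n" +
        "STARTText  # Match this text\n" +
        ".*\\r?\\n  # Match whatever follows on the line plus (CR)LF\n" +
        "(?:        # Match...\n" +
        " ^\\\\     # Start of line, then a backslash\n" +
        " .*\\r?\\n # Match whatever follows on the line plus (CR)LF\n" +
        ")*         # Repeat as needed",
        Pattern.MULTILINE | Pattern.COMMENTS);

    // Returns the first STARTText block, or null if there is none.
    static String findBlock(String text) {
        Matcher m = BLOCK.matcher(text);
        return m.find() ? m.group() : null;
    }

    // Removes every STARTText block (hypothetical helper for the replaceAll() use case).
    static String removeBlocks(String text) {
        return BLOCK.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "before\n"
            + "STARTText blah, blah\n"
            + "\\    next line with more text\n"
            + "\\    next line with more text\n"
            + "after\n";
        System.out.print(findBlock(text));
        // Only the "before" and "after" lines remain.
        System.out.print(removeBlocks(text));
    }
}
```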

Second regex:

Pattern regex = Pattern.compile(
    "(?:        # Match...\n" +
    " ^         # Start of line\n" +
    " \\d{4}\\b # Match exactly four digits\n" +
    " .*\\r?\\n # Match whatever follows on the line plus (CR)LF\n" +
    ")+         # Repeat as needed (at least once)", 
    Pattern.MULTILINE | Pattern.COMMENTS);
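A similar sketch for the second pattern, collecting every run of four-digit lines with find(). The names are hypothetical. The match stops before the empty line, so that line is not part of the result, and \b rejects lines that start with five or more digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitBlockDemo {
    // Answer 1's second pattern: one or more consecutive lines starting with exactly 4 digits.
    static final Pattern DIGIT_LINES = Pattern.compile(
        "(?:        # Match...\n" +
        " ^         # Start of line\n" +
        " \\d{4}\\b # Match exactly four digits\n" +
        " .*\\r?\\n # Match whatever follows on the line plus (CR)LF\n" +
        ")+         # Repeat as needed (at least once)",
        Pattern.MULTILINE | Pattern.COMMENTS);

    // Collects every run of four-digit lines found in the text.
    static List<String> findRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = DIGIT_LINES.matcher(text);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        String text = "Now you will see the following links for the items:\n"
            + "1111 leading 4 digits and then some text\n"
            + "2565 leading 4 digits and then some text\n"
            + "8978 leading 4 digits and then some text\n"
            + "\n"
            + "Some other paragraph\n";
        List<String> runs = findRuns(text);
        System.out.println(runs.size()); // 1: the three digit lines form one run
        System.out.print(runs.get(0));
    }
}
```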

Answer 2

Regex 1 (Ruby syntax, as tested on Rubular):

/^STARTText.*?(\r?\n)(?:^\\.*?\1)+/m

Live demo: http://www.rubular.com/r/G35kIn3hQ4

Regex 2 (Ruby syntax, as tested on Rubular):

/^.*?(\r?\n)(?:^\d{4}\s.*?\1)+/m

Live demo: http://www.rubular.com/r/TxFbBP1jLJ

Edit:

Java demo 1: http://ideone.com/BPNrm6

Regex 1 in Java (as a string literal, with doubled backslashes):

(?m)^STARTText.*?(\\r?\\n)(?:^\\\\.*?\\1)+
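A self-contained sketch of Java regex 1, independent of the linked demo (the class and method names are hypothetical). The back-reference \1 makes every backslash line reuse the line ending captured after STARTText, and the + requires at least one backslash line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Answer2Regex1Demo {
    // Regex 1 from Answer 2, written as a Java string literal.
    // (?m) makes ^ match at the start of every line.
    static final Pattern BLOCK =
        Pattern.compile("(?m)^STARTText.*?(\\r?\\n)(?:^\\\\.*?\\1)+");

    // Returns the first STARTText block, or null if there is none.
    static String firstBlock(String text) {
        Matcher m = BLOCK.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "STARTText blah, blah\n"
            + "\\    next line with more text, but the leading backslash\n"
            + "\\    next line with more text, but the leading backslash\n"
            + "a line without a backslash\n";
        System.out.print(firstBlock(text));
    }
}
```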

Java demo 2: http://ideone.com/TQB8Gs

Regex 2 in Java (as a string literal, with doubled backslashes):

(?m)^.*?(\\r?\\n)(?:^\\d{4}\\s.*?\\1)+
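The same kind of sketch for Java regex 2 (hypothetical names). The leading ^.*? matches whichever line directly precedes the digit lines, so unrelated lines before it are skipped by find().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Answer2Regex2Demo {
    // Regex 2 from Answer 2 as a Java string literal: any line, then at least
    // one line starting with 4 digits and a whitespace character, all using
    // the line ending captured in group 1.
    static final Pattern BLOCK =
        Pattern.compile("(?m)^.*?(\\r?\\n)(?:^\\d{4}\\s.*?\\1)+");

    // Returns the header line plus its digit lines, or null if there is none.
    static String firstBlock(String text) {
        Matcher m = BLOCK.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "Now you will see the following links for the items:\n"
            + "1111 leading 4 digits and then some text\n"
            + "2565 leading 4 digits and then some text\n"
            + "8978 leading 4 digits and then some text\n"
            + "\n";
        System.out.print(firstBlock(text));
    }
}
```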

Answer 3

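In both cases I use a zero-width lookahead such as (?=^[^\\]) so that the match stops right before the next line that no longer has what I am looking for.

(?= starts a zero-width lookahead assertion, which requires the text to be present but does not consume it
^[^\\] matches the start of a line followed by any character other than \
) closes the assertion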
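To see that the lookahead does not consume anything, here is a small sketch (hypothetical class and method names) that runs the same pattern with and without the (?= ... ) wrapper on one input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Returns the first match of `regex` in `input` (MULTILINE | DOTALL), or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE | Pattern.DOTALL).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "first\n\\ second\nthird\n";
        // With the lookahead: the match ends just before "third".
        System.out.println(firstMatch("^first.*?(?=^[^\\\\])", input));
        // Without it: the "t" of "third" is consumed as part of the match.
        System.out.println(firstMatch("^first.*?^[^\\\\]", input));
    }
}
```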

Part 1

This matches all of the Part 1 text: capture group 1 holds the first line followed by any number of lines that start with \.

^([^\\].*?)(?=^[^\\])


    Java code example:

    import java.util.regex.Pattern;
    import java.util.regex.Matcher;

    class Module1 {
      public static void main(String[] asd) {
        // A normal Java string literal cannot span lines, so the sample text is
        // built from "\n"-terminated pieces; "\\" is a single backslash.
        String sourcestring = "STARTFirstText blah, blah\n"
            + "\\    1next line with more text, but the leading backslash\n"
            + "\\    2next line with more text, but the leading backslash\n"
            + "\\    3next line with more text, but the leading backslash\n"
            + "STARTsecondText blah, blah\n"
            + "\\    4next line with more text, but the leading backslash\n"
            + "\\    5next line with more text, but the leading backslash\n"
            + "\\    6next line with more text, but the leading backslash\n"
            + "foo";
        // MULTILINE: ^ matches at every line start; DOTALL: .*? may cross line breaks.
        Pattern re = Pattern.compile("^([^\\\\].*?)(?=^[^\\\\])",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);
        Matcher m = re.matcher(sourcestring);
        int mIdx = 0;
        while (m.find()) {
          // Print the whole match (group 0) and capture group 1 of each match.
          for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
            System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
          }
          mIdx++;
        }
      }
    }

    Matches found ([0] = whole matches, [1] = capture group 1):
    (
        [0] => Array
            (
                [0] => STARTFirstText blah, blah
    \    1next line with more text, but the leading backslash
    \    2next line with more text, but the leading backslash
    \    3next line with more text, but the leading backslash

                [1] => STARTsecondText blah, blah
    \    4next line with more text, but the leading backslash
    \    5next line with more text, but the leading backslash
    \    6next line with more text, but the leading backslash

            )

        [1] => Array
            (
                [0] => STARTFirstText blah, blah
    \    1next line with more text, but the leading backslash
    \    2next line with more text, but the leading backslash
    \    3next line with more text, but the leading backslash

                [1] => STARTsecondText blah, blah
    \    4next line with more text, but the leading backslash
    \    5next line with more text, but the leading backslash
    \    6next line with more text, but the leading backslash

            )

    )

Part 2

This matches the first line, followed by the lines that begin with digits.

^([^\d].*?)(?=^[^\d])


Example:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

class Module1 {
  public static void main(String[] asd) {
    // Build the multi-line sample text from "\n"-terminated pieces.
    String sourcestring = "First you will see the following links for the items:\n"
        + "1111 leading 4 digits and then some text\n"
        + "2565 leading 4 digits and then some text\n"
        + "8978 leading 4 digits and then some text\n"
        + "\n"
        + "Second you will see the following links for the items:\n"
        + "2222 leading 4 digits and then some text\n"
        + "3333 leading 4 digits and then some text\n"
        + "4444 leading 4 digits and then some text";
    // A line not starting with a digit, then everything up to the next such line.
    Pattern re = Pattern.compile("^([^\\d].*?)(?=^[^\\d])",
        Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);
    Matcher m = re.matcher(sourcestring);
    int mIdx = 0;
    while (m.find()) {
      // Print the whole match (group 0) and capture group 1 of each match.
      for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
        System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
      }
      mIdx++;
    }
  }
}

Matches found ([0] = whole matches, [1] = capture group 1):
(
    [0] => Array
        (
            [0] => First you will see the following links for the items:
1111 leading 4 digits and then some text
2565 leading 4 digits and then some text
8978 leading 4 digits and then some text

            [1] => 

        )

    [1] => Array
        (
            [0] => First you will see the following links for the items:
1111 leading 4 digits and then some text
2565 leading 4 digits and then some text
8978 leading 4 digits and then some text

            [1] => 

        )

)
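In the output above, the second match is only the empty line, and the "Second you will see ..." block is not matched at all, because the lookahead (?=^[^\d]) needs another non-digit line after the block. A possible variant (an adaptation of this answer, not part of it) excludes line breaks from the first character and also accepts the end of input via \z; the same |\z alternative could be added to the Part 1 pattern. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeaderBlocksDemo {
    // Variant of the Part 2 pattern:
    //  - [^\d\r\n] keeps an empty line from starting a match of its own;
    //  - |\z lets the last block end at the end of the input.
    static final Pattern BLOCK = Pattern.compile(
        "^([^\\d\\r\\n].*?)(?=^[^\\d]|\\z)", Pattern.MULTILINE | Pattern.DOTALL);

    // Returns capture group 1 of every match: a header line plus its digit lines.
    static List<String> blocks(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "First you will see the following links for the items:\n"
            + "1111 leading 4 digits and then some text\n"
            + "2565 leading 4 digits and then some text\n"
            + "\n"
            + "Second you will see the following links for the items:\n"
            + "2222 leading 4 digits and then some text\n"
            + "3333 leading 4 digits and then some text";
        List<String> found = blocks(text);
        for (int i = 0; i < found.size(); i++) {
            System.out.println("[" + i + "] = " + found.get(i));
        }
    }
}
```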

Answer 4

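Use '\\' for the backslash, '\r?\n' for a line break and '\d{4}' for the 4 digits:

.*(\r?\n)

(your first "blah, blah" line)

\\.*(\r?\n)

(your backslash line)

((\d{4}.*(\r?\n))+(\r?\n))+

(your 4-digit blocks, each ending with an empty line; the whole thing is repeated with +)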
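A small sketch that compiles these fragments as Java string literals and checks them against the sample text with matches(), which requires the whole string to match. It assumes the line break is written as \r?\n and that the backslash fragment is repeated with * for several lines; the class and field names are hypothetical.

```java
import java.util.regex.Pattern;

class Answer4Demo {
    // Block 1: a first line, then any number of lines starting with a backslash.
    static final Pattern BACKSLASH_BLOCK =
        Pattern.compile(".*(\\r?\\n)(\\\\.*(\\r?\\n))*");

    // Block 2: runs of 4-digit lines, each run followed by an empty line,
    // with the whole thing repeated via +.
    static final Pattern DIGIT_BLOCKS =
        Pattern.compile("((\\d{4}.*(\\r?\\n))+(\\r?\\n))+");

    public static void main(String[] args) {
        String block1 = "STARTText blah, blah\n"
            + "\\    next line with more text, but the leading backslash\n"
            + "\\    next line with more text, but the leading backslash\n";
        String block2 = "1111 leading 4 digits and then some text\n"
            + "2565 leading 4 digits and then some text\n"
            + "8978 leading 4 digits and then some text\n"
            + "\n";
        System.out.println(BACKSLASH_BLOCK.matcher(block1).matches()); // true
        System.out.println(DIGIT_BLOCKS.matcher(block2).matches());    // true
    }
}
```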

Regex for strings with multiple lines and special structure
