Regular expression to match blocks of text that start with a similar first line

Time: 2014-07-23 13:53:14

Tags: python regex

I have a string that looks like this:

1.1 Title: title1 
line1
line2
line3
1.2 Title: Title2
line1
line2
line3

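Is there a regular expression that matches each block starting with a 1.x title line? All my attempts gave me either only the first line or the whole file.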

Thanks for your help.

Edit: the output should be a list of strings, in this case:

 s1 = '1.1 Title: title1 
     line1
     line2
     line3'

s2 = '1.2 Title: Title2
    line1
    line2
    line3'

The number of lines in each block is unknown, and so is the number of blocks.

2 answers:

Answer 0: (score: 1)

If your lines are always formatted consistently, you can use the following:

matches = re.findall(r'(?s)(1\.\d+\s+Title:(?:(?!\n1\.\d).)+)', s)
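
For reference, here is a minimal runnable sketch of the findall approach applied to the sample from the question. The variable names `s` and `pattern` are only illustrative, and the string is assumed to look exactly like the sample.

```python
import re

# Sample text from the question (assumed layout: a "1.x Title:" header
# line followed by an unknown number of body lines).
s = (
    "1.1 Title: title1 \n"
    "line1\n"
    "line2\n"
    "line3\n"
    "1.2 Title: Title2\n"
    "line1\n"
    "line2\n"
    "line3"
)

# (?s) lets "." match newlines. The negative lookahead (?!\n1\.\d) stops
# the match just before a newline that is followed by the next "1.x" header.
pattern = r'(?s)(1\.\d+\s+Title:(?:(?!\n1\.\d).)+)'
matches = re.findall(pattern, s)

for block in matches:
    print(repr(block))
```

One caveat with this sketch: a body line that itself starts with `1.` followed by a digit would look like a header to the lookahead and end the block early.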

Or you can split the string just before each header line:

matches = re.split(r'(?m)\s+(?=^1\.\d)', s)
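
A small runnable sketch of the split approach on the same sample, again assuming the input looks like the text in the question (the names `s` and `blocks` are illustrative):

```python
import re

# Sample text from the question.
s = (
    "1.1 Title: title1 \n"
    "line1\n"
    "line2\n"
    "line3\n"
    "1.2 Title: Title2\n"
    "line1\n"
    "line2\n"
    "line3"
)

# (?m) makes "^" match at the start of every line. The split consumes the
# whitespace (the newline) that sits right before a line starting with
# "1." and a digit; the lookahead keeps the header itself in the next block.
blocks = re.split(r'(?m)\s+(?=^1\.\d)', s)

for block in blocks:
    print(repr(block))
```

If the input could begin with blank lines, the first element would be an empty string, so filtering with something like `[b for b in blocks if b]` may be needed.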

Answer 1: (score: 0)

"(^\d.\d[^\n]+\d(?:\D+\d)+?(?=\n\d.\d))|(^\d.\d[^\n]+\d(?:\D+\d)+$)"gms is what I came up with. It captures each block separately, but it is not very pretty.

Explanation from Regex101.com:

"(^\d.\d[^\n]+\d(?:\D+\d)+?(?=\n\d.\d))|(^\d.\d[^\n]+\d(?:\D+\d)+$)"gms
  1st Alternative: (^\d.\d[^\n]+\d(?:\D+\d)+?(?=\n\d.\d))
    1st Capturing group (^\d.\d[^\n]+\d(?:\D+\d)+?(?=\n\d.\d))
      ^ assert position at start of a line
      \d match a digit [0-9]
      . matches any character
      \d match a digit [0-9]
      [^\n]+ match a single character not present in the list below
        Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
        \n matches a line-feed (newline) character (ASCII 10)
      \d match a digit [0-9]
      (?:\D+\d)+? Non-capturing group
        Quantifier: Between one and unlimited times, as few times as possible, expanding as needed [lazy]
        \D+ match any character that is not a digit [^0-9]
          Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
        \d match a digit [0-9]
      (?=\n\d.\d) Positive Lookahead - Assert that the regex below can be matched
        \n matches a line-feed (newline) character (ASCII 10)
        \d match a digit [0-9]
        . matches any character
        \d match a digit [0-9]
  2nd Alternative: (^\d.\d[^\n]+\d(?:\D+\d)+$)
    2nd Capturing group (^\d.\d[^\n]+\d(?:\D+\d)+$)
      ^ assert position at start of a line
      \d match a digit [0-9]
      . matches any character
      \d match a digit [0-9]
      [^\n]+ match a single character not present in the list below
        Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
        \n matches a line-feed (newline) character (ASCII 10)
      \d match a digit [0-9]
      (?:\D+\d)+ Non-capturing group
        Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
        \D+ match any character that is not a digit [^0-9]
          Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
        \d match a digit [0-9]
      $ assert position at end of a line
  g modifier: global. All matches (do not return on first match)
  m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
  s modifier: single line. Dot matches newline characters
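
As a rough sketch of how this pattern might be used from Python: the Regex101 `g` modifier corresponds to iterating over all matches (here with `re.finditer`), while `m` and `s` map to `re.MULTILINE` and `re.DOTALL`. The names `s`, `pattern` and `blocks` are illustrative, and the input is assumed to be the sample from the question.

```python
import re

# Sample text from the question.
s = (
    "1.1 Title: title1 \n"
    "line1\n"
    "line2\n"
    "line3\n"
    "1.2 Title: Title2\n"
    "line1\n"
    "line2\n"
    "line3"
)

# First alternative: a block that is followed by another "d.d" header.
# Second alternative: a block that runs up to the end of a line
# (for this sample, the last block, ending at the end of the string).
pattern = (
    r'(^\d.\d[^\n]+\d(?:\D+\d)+?(?=\n\d.\d))'
    r'|(^\d.\d[^\n]+\d(?:\D+\d)+$)'
)

# group(0) is the whole match, whichever alternative produced it.
blocks = [m.group(0) for m in re.finditer(pattern, s, re.MULTILINE | re.DOTALL)]

for block in blocks:
    print(repr(block))
```

Note that the pattern depends on the digits in the sample text: each `\D+\d` step has to end on a digit, so it may behave differently on blocks whose lines do not end in digits. The unescaped `.` in `\d.\d` also matches any character, not only a literal dot.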