Question

I have the following test data, formatted like this:

<td scope="row" align="left">
      My Class: TEST DATA<br>
      Test Section: <br>
      MY SECTION<br>
      MY SECTION 2<br>
    </td>

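I am trying to get the text between "Test Section:" and the end of the MY SECTION lines.

I have tried several different RegEx patterns, but I am not getting anywhere.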

If I do this:

(?<=Test)(.*?)(?=<br)

then I get the expected result:

' Section: '

But if I do this:

(?<=Test)(.*?)(?=</td>)

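I get no results. The result should be "MY SECTION MY SECTION 2".

I have also tried the RegEx multiline flag, with no result either.

Any help would be much appreciated.

In case it matters, I am coding in Python 2.7.

If anything is unclear, or you need more information, please let me know.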

Answer 1

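Use the re.S (or re.DOTALL) flag, or prepend (?s) to the regular expression, so that . matches every character, including newlines.

Without the flag, . does not match a newline.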

(?s)(?<=Test)(.*?)(?=</td>)

Example:

>>> s = '''<td scope="row" align="left">
...       My Class: TEST DATA<br>
...       Test Section: <br>
...       MY SECTION<br>
...       MY SECTION 2<br>
...     </td>'''
>>>
>>> import re
>>> re.findall('(?<=Test)(.*?)(?=</td>)', s)  # without flags
[]
>>> re.findall('(?<=Test)(.*?)(?=</td>)', s, flags=re.S)
[' Section: <br>\n      MY SECTION<br>\n      MY SECTION 2<br>\n    ']
>>> re.findall('(?s)(?<=Test)(.*?)(?=</td>)', s)
[' Section: <br>\n      MY SECTION<br>\n      MY SECTION 2<br>\n    ']
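The question mentions that the multiline flag did not help. A minimal sketch of why, using the same sample markup: re.MULTILINE only changes where ^ and $ match, while re.DOTALL changes what . matches. The variable names here are illustrative only.

```python
import re

html = """<td scope="row" align="left">
      My Class: TEST DATA<br>
      Test Section: <br>
      MY SECTION<br>
      MY SECTION 2<br>
    </td>"""

pattern = r"(?<=Test)(.*?)(?=</td>)"

# re.MULTILINE only affects the ^ and $ anchors; '.' still stops at '\n',
# so the match cannot reach '</td>' on a later line.
multiline_result = re.findall(pattern, html, flags=re.MULTILINE)

# re.DOTALL (alias re.S) lets '.' match '\n', so the match can span lines.
dotall_result = re.findall(pattern, html, flags=re.DOTALL)

print(multiline_result)  # []
print(dotall_result)
```

The two flags can be combined with | when a pattern needs both behaviours.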

Answer 2

Get the matched group at index 1:

Test Section:([\S\s]*)</td>

Note: change the last part of the pattern to suit your needs.

Sample code:

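import re
# The ur'' prefix is Python 2 only; on Python 3 use r'' instead.
# re.MULTILINE only changes ^ and $, so it has no effect on this pattern.
p = re.compile(ur'Test Section:([\S\s]*)</td>', re.MULTILINE)
test_str = u"..."

re.findall(p, test_str)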
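The captured group still contains the <br> tags and the indentation. As a rough sketch of one way to reach the "MY SECTION MY SECTION 2" result the question asks for, the group can be split on <br> and the pieces trimmed. The cleanup step is an assumption about the desired output, not part of the answer above, and it is written for Python 3.

```python
import re

test_str = """<td scope="row" align="left">
      My Class: TEST DATA<br>
      Test Section: <br>
      MY SECTION<br>
      MY SECTION 2<br>
    </td>"""

# Group 1 holds everything between 'Test Section:' and the last '</td>'.
match = re.search(r"Test Section:([\S\s]*)</td>", test_str)
captured = match.group(1) if match else ""

# Split on the <br> tags, trim whitespace, and drop the empty pieces.
parts = [piece.strip() for piece in captured.split("<br>")]
result = " ".join(piece for piece in parts if piece)

print(result)  # MY SECTION MY SECTION 2
```

For anything beyond a small, predictable fragment like this one, an HTML parser is usually a safer choice than a regular expression.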

Pattern explanation:

  Test Section:            'Test Section:'
  (                        group and capture to \1:
    [\S\s]*                  any character of: non-whitespace (all
                             but \n, \r, \t, \f, and " "), whitespace
                             (\n, \r, \t, \f, and " ") (0 or more
                             times (matching the most amount
                             possible))
  )                        end of \1
  </td>                    '</td>'
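Because [\S\s]* is greedy, the pattern runs to the last </td> in the input, which matters once the text contains more than one cell. The sketch below compares it with the lazy form [\S\s]*? on a made-up two-cell string; the input is hypothetical and only serves to show the difference.

```python
import re

# Hypothetical input with two cells, to show where each pattern stops.
two_cells = "<td>Test Section:A</td><td>other</td>"

# Greedy: consumes as much as possible, so it ends at the LAST '</td>'.
greedy = re.findall(r"Test Section:([\S\s]*)</td>", two_cells)

# Lazy: consumes as little as possible, so it ends at the FIRST '</td>'.
lazy = re.findall(r"Test Section:([\S\s]*?)</td>", two_cells)

print(greedy)  # ['A</td><td>other']
print(lazy)    # ['A']
```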
