I have an extractor that pulls out a string which is sometimes spread across 2 lines.
Regex: (?s)<h1 itemprop="name">(.+[\w\n\t])</h1>
Examples:
1) On 2 lines →
<h1 itemprop="name">Hello-, World1234
</h1>
Result:
Hello-, World1234
Blank Line -- I want to remove/trim this line
2) On 1 line →
<h1 itemprop="name">Hello-, World1234</h1>
Result:
Hello-, World1234 -- This result is correct
Answer 0 (score: 0)
You can use the following regex:
<h1 itemprop="name">\s*(([^<>\s\h]+\s*[^<>\h\s]+\h*)+)\s*</h1>
with a back reference to the first capturing group: \1
I have tested it on the following examples and it works fine:
<h1 itemprop="name">
Hello-,
World1234
</h1>
<h1 itemprop="name">Hello-, World1234
</h1>
<h1 itemprop="name">
Hello-,
World1234
</h1>
It produces the following output:
1)
Hello-,
World1234
2)
Hello-, World1234
3)
Hello-,
World1234
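If the extractor runs in Python, note that the \h class used above is not available in the standard re module, so the pattern cannot be pasted in unchanged. Below is a minimal sketch of the same trimming idea using a simpler, swapped-in pattern: \s* on both sides of a lazy capture group absorbs the leading and trailing whitespace, so the stray blank line never ends up in group 1. The names H1_NAME and extract_name are hypothetical, chosen only for this illustration.

```python
import re

# Hypothetical helper, not the regex from the answer above:
# \s* on each side of the lazy group swallows surrounding whitespace,
# and re.DOTALL lets "." cross line breaks inside the heading.
H1_NAME = re.compile(r'<h1 itemprop="name">\s*(.+?)\s*</h1>', re.DOTALL)


def extract_name(html):
    """Return the trimmed heading text, or None if no heading is found."""
    match = H1_NAME.search(html)
    return match.group(1) if match else None


two_lines = '<h1 itemprop="name">Hello-, World1234\n</h1>'
one_line = '<h1 itemprop="name">Hello-, World1234</h1>'

print(extract_name(two_lines))  # Hello-, World1234
print(extract_name(one_line))   # Hello-, World1234
```

Line breaks inside the text itself (as in the first and third test inputs above) are kept; only the whitespace directly after the opening tag and directly before the closing tag is dropped.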