Question

I have an extractor that pulls out a string which is sometimes spread across two lines.

Regex: (?s)<h1 itemprop="name">(.+[\w\n\t])</h1>

Examples:

1) On two lines →

<h1 itemprop="name">Hello-, World1234
</h1>

Result:

Hello-, World1234
Blank Line   -- I want to remove/trim this line

2) On one line →

<h1 itemprop="name">Hello-, World1234</h1>

Result:

Hello-, World1234   -- This result is correct
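
The behaviour can be reproduced with a minimal sketch, assuming Python's re module stands in for the extractor's regex engine (the actual engine may differ). The helper name extract is made up for this illustration. The character class [\w\n\t] accepts a newline as the last captured character, which is where the extra blank line comes from.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r'(?s)<h1 itemprop="name">(.+[\w\n\t])</h1>')


def extract(html):
    """Return capture group 1, or None when the pattern does not match."""
    match = PATTERN.search(html)
    return match.group(1) if match else None


two_lines = '<h1 itemprop="name">Hello-, World1234\n</h1>'
one_line = '<h1 itemprop="name">Hello-, World1234</h1>'

print(repr(extract(two_lines)))  # 'Hello-, World1234\n' (trailing newline captured)
print(repr(extract(one_line)))   # 'Hello-, World1234'
```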

Answer 1

You can use the following regex:

<h1 itemprop="name">\s*(([^<>\s\h]+\s*[^<>\h\s]+\h*)+)\s*</h1>

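with a back reference to your first capturing group: \1. The \s* on either side of that group consumes leading and trailing whitespace, including line breaks, so it stays out of the captured text.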

I have tested it on the following examples and it works fine:

<h1 itemprop="name">
Hello-, 
World1234

</h1>

<h1 itemprop="name">Hello-, World1234

</h1>

<h1 itemprop="name">
Hello-, 


World1234

</h1>

It produces the following output:

1)

Hello-, 
World1234

2)

Hello-, World1234

3)

Hello-, 


World1234
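
The three cases above can be checked with a short script. This is only a sketch under a few assumptions: it uses Python's built-in re module, which does not recognize \h, so [ \t] is substituted for \h and [^<>\s\h] is shortened to [^<>\s] (\s already covers horizontal whitespace). The helper name extract_title is hypothetical.

```python
import re

# Adapted from the regex in this answer: [ \t] replaces \h, and
# [^<>\s\h] becomes [^<>\s] because \s already includes spaces and tabs.
PATTERN = re.compile(
    r'<h1 itemprop="name">\s*(([^<>\s]+\s*[^<>\s]+[ \t]*)+)\s*</h1>'
)


def extract_title(html):
    """Return capture group 1 (the trimmed title), or None if nothing matches."""
    match = PATTERN.search(html)
    return match.group(1) if match else None


samples = [
    '<h1 itemprop="name">\nHello-, \nWorld1234\n\n</h1>',
    '<h1 itemprop="name">Hello-, World1234\n\n</h1>',
    '<h1 itemprop="name">\nHello-, \n\n\nWorld1234\n\n</h1>',
]

results = [extract_title(sample) for sample in samples]
for number, title in enumerate(results, start=1):
    print(number, repr(title))
```

The \s* outside the group absorbs the leading and trailing line breaks. Line breaks between the words stay in the capture, as in outputs 1 and 3.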