Question

I'm Vimal, and I need help with a pattern that matches the following case.

I have text like this in an HTML file:

F&#x00FC;r Clemens, <br/>Gotthard und Hermine</p>
F&#x00FC;r Clemens, <br/>Gotthard und Hermine </s>
F&#x00FC;r Clemens, <br/>Gotthard und Hermine
</p>

my $string = "Gotthard und Hermine";

I want to match "Gotthard und Hermine", and I did that with ($string)[\s]*</[a-zA-Z]+>

But if there are any tags inside the text to be matched, I can't match it, for example: Für Clemens, <br/>Gotthard <b>und</b> Hermine </s>

I need your help, friends. Please help me solve this problem.

Thanks in advance.

Answer 1

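If you just want to test whether some plain text is present in an HTML page, you can take the brute-force route and strip all the tags with HTML::Strip or an equivalent module.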

use strict;
use warnings;

use HTML::Strip;

my $hs = HTML::Strip->new();

my $clean_text = $hs->parse( q{F&#x00FC;r Clemens, <br/>Gotthard <b>und</b> Hermine </s>} );

if ($clean_text =~ /Gotthard\s+und\s+Hermine/) {
    print "found\n";
}

Output:

found
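If stripping the whole document is not an option, one alternative is to build the pattern from the plain phrase so that whitespace and simple tags are allowed between the words. The following is only a minimal sketch using core Perl: the helper name tag_tolerant_regex is hypothetical, and it assumes tags contain no ">" inside attribute values. It keeps the trailing \s*</[a-zA-Z]+> part from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a plain phrase into a regex that tolerates
# whitespace and simple tags (<b>, </b>, <br/>) between its words.
sub tag_tolerant_regex {
    my ($phrase) = @_;
    # One or more whitespace characters or tags between two words.
    my $gap  = qr{(?:\s|<[^>]*>)+};
    # Escape each word so regex metacharacters in the phrase stay literal.
    my $body = join $gap, map { quotemeta($_) } split ' ', $phrase;
    # As in the question: optional whitespace, then a closing tag.
    return qr{($body)\s*</[a-zA-Z]+>};
}

my $re = tag_tolerant_regex('Gotthard und Hermine');

for my $html (
    'F&#x00FC;r Clemens, <br/>Gotthard und Hermine</p>',
    'F&#x00FC;r Clemens, <br/>Gotthard <b>und</b> Hermine </s>',
) {
    print "matched: $1\n" if $html =~ $re;
}
```

The captured text still contains any tags that sit between the words, so strip them afterwards if only the plain phrase is needed.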

Answer 2

You may need this:

(Gotthard.*und.*Hermine)

This will also match any HTML tags in between. For example: Gotthard und Hermine

Demo: http://regex101.com/r/wF0bH3

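Now suppose Hermine or Gotthard is itself wrapped in HTML tags. In that case you may need this regex, which also takes the closing tags into account without including them: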

[>](.*Gotthard.*und.*Hermine.*)[<]

For example: Gotthard und Hermine

Demo: http://regex101.com/r/vM7pA5
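One caveat with the .* patterns above: .* is greedy, and without the /s modifier . does not match a newline. A small sketch of the difference, using a made-up input string (not from the question) that contains the phrase twice:

```perl
use strict;
use warnings;

# Made-up sample input with two occurrences of the phrase.
my $html = '<p>Gotthard <b>und</b> Hermine</p><p>Gotthard und Hermine</p>';

# Greedy: .* runs as far as it can, so the capture spans both paragraphs.
my ($greedy) = $html =~ /(Gotthard.*und.*Hermine)/s;

# Lazy: .*? stops at the first possible "und" and "Hermine".
my ($lazy) = $html =~ /(Gotthard.*?und.*?Hermine)/s;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

The lazy form keeps the match inside the first occurrence, but it can still skip over arbitrary text between the words, so it is looser than a pattern that only allows whitespace and tags there.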

Perl regular expression for the following scenario
