Question

This is my input:

<div class="entry-content">
    <p> Hey ! </p>
    <h2> How Are You ?! </h2>
</div><!-- .entry-content -->

This is my regex:

"<div class=\"entry-content\">(.*?)</div><!-- .entry-content -->"

This works when there are no line breaks between the <div> tags, like this:

<div class="entry-content"> Hey ! </div><!-- .entry-content -->

But I actually need everything between the tags, including newlines, other HTML tags, and so on.

Answer 1

You should use an XML parsing framework such as DOM to parse XML documents (including HTML), but if you really need to use a regular expression (assuming PCRE), use the s PCRE modifier:

s (PCRE_DOTALL)

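If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.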

So you can write:

$matches = array();
preg_match_all("~<div class=\"entry-content\">(.*?)</div><!-- \\.entry-content -->~s",
    $text, $matches);
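
To make the effect of the s modifier concrete, here is a minimal sketch that runs the same pattern against the multi-line input from the question, first without and then with the modifier. The variable names ($open, $close, $withoutS, and so on) are made up for illustration. The third pattern uses the [\s\S] character class, a common alternative that matches any character regardless of modifiers.

```php
<?php
// Multi-line input copied from the question.
$text = <<<HTML
<div class="entry-content">
    <p> Hey ! </p>
    <h2> How Are You ?! </h2>
</div><!-- .entry-content -->
HTML;

// Literal opening and closing markers; the dot in the comment is escaped.
$open  = '<div class="entry-content">';
$close = '</div><!-- \.entry-content -->';

// Without s, "." cannot cross the newline after the opening tag: no match.
$withoutS = preg_match('~' . $open . '(.*?)' . $close . '~', $text, $m1);

// With s (PCRE_DOTALL), "." also matches newlines: the whole body is captured.
$withS = preg_match('~' . $open . '(.*?)' . $close . '~s', $text, $m2);

// [\s\S] matches any character, so it works even without the s modifier.
$charClass = preg_match('~' . $open . '([\s\S]*?)' . $close . '~', $text, $m3);

var_dump($withoutS, $withS, $charClass);
echo $m2[1];
```

The captured group keeps the inner tags and line breaks as they appear in the input, which is what the question asks for.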

BTW: Here's an example of how to use DOM to get elements by class name.
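
As a rough illustration of selecting by class name with DOM, one widely used XPath idiom pads the class attribute with spaces and looks for the class as a whole token. This is a sketch, not necessarily what the linked example does. The input string below is hypothetical: it carries a second class ("post") to show a case where an exact @class comparison would fail.

```php
<?php
// Hypothetical input: two classes, so @class="entry-content" would not match.
$html = '<div class="post entry-content"><p>Hey !</p></div>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

// Pad the normalized class list with spaces, then search for " entry-content ".
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " entry-content ")]';
$nodes = $xpath->query($query);

echo $nodes->length, "\n";
echo $nodes->item(0)->textContent, "\n";
```

The padding avoids false positives on longer class names that merely contain the substring, such as "entry-content-wide".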

Answer 2

Use the right tool for the job instead of trying to parse it with a regular expression.

$html = <<<DATA
<div class="entry-content">
    Hey !
    How Are You ?!
</div><!-- .entry-content -->
DATA;

$dom = new DOMDocument;
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$node  = $xpath->query('//div[@class="entry-content"]');

echo $node->item(0)->nodeValue;

Output

    Hey !
    How Are You ?!
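
Note that nodeValue returns only the text, so any tags inside the div are dropped. Since the question asks for the inner HTML tags as well, a possible variation is to serialize each child node with DOMDocument::saveHTML(). This is a sketch using the input from the question; the $div and $inner variable names are chosen here for illustration.

```php
<?php
// Input from the question, with nested tags inside the div.
$html = <<<DATA
<div class="entry-content">
    <p> Hey ! </p>
    <h2> How Are You ?! </h2>
</div><!-- .entry-content -->
DATA;

$dom = new DOMDocument;
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$div   = $xpath->query('//div[@class="entry-content"]')->item(0);

// Serialize every child node so the markup inside the div is preserved.
$inner = '';
foreach ($div->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}

echo $inner;
```

Passing a node to saveHTML() requires PHP 5.3.6 or later.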

PHP regex to match everything, even newlines
