<div id="plugin-description">
<p itemprop="description" class="shortdesc">
BuddyPress helps you build any type of community website using WordPress, with member profiles, activity streams, user groups, messaging, and more. </p>
<div class="description-right">
<p class="button">
<a itemprop="downloadUrl" href="https://downloads.wordpress.org/plugin/buddypress.2.6.1.1.zip">Download Version 2.6.1.1</a>
I need to extract the description using a pattern like this:
<p itemprop="description" class="shortdesc">[a-z]</p>
I also need the download link:
<a itemprop="downloadUrl" href="[A-Z]"></a>
Answer 0 (score: 0)
<?php
$data = <<<DATA
<div id="plugin-description">
<p itemprop="description" class="shortdesc">
BuddyPress helps you build any type of community website using WordPress.
</p>
<div class="description-right">
<p class="button">
<a itemprop="downloadUrl" href=".zip">Download Version 2.6.1.1</a>
</p>
</div>
</div>
DATA;
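// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($data);
$xpath = new DOMXPath($dom);

// Find the plugin container, then run the inner queries relative to it.
$containers = $xpath->query("//div[@id='plugin-description']");
foreach ($containers as $container) {
    $descNode = $xpath->query(".//p[@itemprop='description']", $container)->item(0);
    $linkNode = $xpath->query(".//a[@itemprop='downloadUrl']/@href", $container)->item(0);
    // item(0) returns null when nothing matched, so guard before reading nodeValue.
    $description = $descNode ? trim($descNode->nodeValue) : '';
    $link = $linkNode ? $linkNode->nodeValue : '';
    echo $description . PHP_EOL . $link . PHP_EOL;
}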
?>
Answer 1 (score: 0)
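There are better tools for parsing HTML than regular expressions. That said, regular expressions can sometimes parse HTML safely and consistently, so don't let anyone bully you out of using them. Those cases are usually limited to a small, known set of HTML markup.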
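For a small, known snippet like the one in the question, the regex route could be sketched in PHP roughly as follows. This is only an illustration under stated assumptions: the patterns expect the attributes to appear exactly as in the sample markup, and the names `$markup`, `$description`, and `$downloadUrl` are made up for this example.

```php
<?php
// Sample markup copied from the question, trimmed to the relevant tags.
$markup = <<<MARKUP
<p itemprop="description" class="shortdesc">
BuddyPress helps you build any type of community website using WordPress, with member profiles, activity streams, user groups, messaging, and more. </p>
<p class="button">
<a itemprop="downloadUrl" href="https://downloads.wordpress.org/plugin/buddypress.2.6.1.1.zip">Download Version 2.6.1.1</a>
</p>
MARKUP;

$description = null;
$downloadUrl = null;

// Capture the text inside the description paragraph.
// The "s" flag lets "." match newlines; ".*?" stops at the first </p>.
if (preg_match('~<p itemprop="description" class="shortdesc">(.*?)</p>~s', $markup, $m)) {
    $description = trim($m[1]);
}

// Capture the href value of the download anchor (assumes double-quoted attributes).
if (preg_match('~<a itemprop="downloadUrl" href="([^"]*)"~', $markup, $m)) {
    $downloadUrl = $m[1];
}

echo $description, PHP_EOL, $downloadUrl, PHP_EOL;
```

If the attribute order, quoting, or whitespace in the real page differs from this sample, both patterns silently fail to match, which is the main reason a parser tends to be the safer choice.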
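For this particular case, an HTML parser looks like it would work well and give you cleaner code. To illustrate, I'll use a command-line tool such as pup, which makes retrieving your content very simple. Let's pretend the markup is stored on your computer at /tmp/input.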
Grab the downloadUrl
...
pup < /tmp/input 'a[itemprop="downloadUrl"] attr{href}'
Grab the description
...
pup < /tmp/input 'p[itemprop="description"] text{}'
I think this shows how simple and beneficial it is to use an HTML parser to get the content you're after.