Question

<div id="plugin-description">
    <p itemprop="description" class="shortdesc">
        BuddyPress helps you build any type of community website using WordPress, with member profiles, activity streams, user groups, messaging, and more. </p>
    <div class="description-right">
                <p class="button">
            <a itemprop="downloadUrl" href="https://downloads.wordpress.org/plugin/buddypress.2.6.1.1.zip">Download Version 2.6.1.1</a>

I need to get the description with a pattern like this:

<p itemprop="description" class="shortdesc">[a-z]</p>

and I need the download link:

<a itemprop="downloadUrl" href="[A-Z]"></a>
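Note that the placeholders [a-z] and [A-Z] each match only a single character, so neither pattern can capture a whole description or URL. Below is a minimal sketch (not taken from either answer) of what a working regex version could look like with preg_match. It assumes the attributes always appear in exactly the order shown above, which is what makes it fragile.

```php
<?php
// Sketch only: assumes attributes appear in exactly the order shown in the question.
$html = <<<HTML
<p itemprop="description" class="shortdesc">
    BuddyPress helps you build any type of community website using WordPress. </p>
<a itemprop="downloadUrl" href="https://downloads.wordpress.org/plugin/buddypress.2.6.1.1.zip">Download Version 2.6.1.1</a>
HTML;

// (.*?) lazily captures everything up to the first </p>;
// the s modifier lets . also match newlines inside the paragraph.
$descPattern = '~<p itemprop="description" class="shortdesc">(.*?)</p>~s';

// [^"]* captures everything between the href quotes.
$linkPattern = '~<a itemprop="downloadUrl" href="([^"]*)"~';

// preg_match returns 1 on a match and fills $m[1] with the first capture group.
$description = preg_match($descPattern, $html, $m) ? trim($m[1]) : null;
$link = preg_match($linkPattern, $html, $m) ? $m[1] : null;

echo $description, "\n", $link, "\n";
```

Using ~ as the delimiter avoids having to escape the / in </p>.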

Answer 1

Once again, with DOMDocument and DOMXPath:

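<?php

// Sample markup from the question (abridged).
$data = <<<DATA
<div id="plugin-description">
    <p itemprop="description" class="shortdesc">
        BuddyPress helps you build any type of community website using WordPress.
    </p>
    <div class="description-right">
        <p class="button">
            <a itemprop="downloadUrl" href=".zip">Download Version 2.6.1.1</a>
        </p>
    </div>
</div>
DATA;

// Parse the markup into a DOM tree instead of matching it with a regex.
$dom = new DOMDocument();
$dom->loadHTML($data);

// Find every plugin-description container.
$xpath = new DOMXPath($dom);
$containers = $xpath->query("//div[@id='plugin-description']");

foreach ($containers as $container) {
    // The leading "." makes each query relative to the current $container.
    $description = trim($xpath->query(".//p[@itemprop='description']", $container)->item(0)->nodeValue);
    // Select the href attribute node directly; its nodeValue is the URL.
    $link = $xpath->query(".//a[@itemprop='downloadUrl']/@href", $container)->item(0)->nodeValue;
    // Print each value on its own line instead of running them together.
    echo $description . "\n" . $link . "\n";
}

?>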

See a demo on ideone.com.
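When loading a real page rather than a small sample, DOMDocument::loadHTML often emits libxml warnings (for example on HTML5 tags), and item(0) returns null when a node is missing, so chaining ->nodeValue onto it fails. A minimal sketch that handles both cases, reusing the XPath expressions from the code above; the function name extractPluginInfo is hypothetical:

```php
<?php
// Hypothetical helper: returns ['description' => ?string, 'link' => ?string].
function extractPluginInfo(string $html): array
{
    $dom = new DOMDocument();

    // Collect libxml parse errors internally instead of printing warnings,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // item(0) returns null when the query matches nothing.
    $descNode = $xpath->query("//div[@id='plugin-description']//p[@itemprop='description']")->item(0);
    $hrefNode = $xpath->query("//div[@id='plugin-description']//a[@itemprop='downloadUrl']/@href")->item(0);

    return [
        'description' => $descNode ? trim($descNode->textContent) : null,
        'link'        => $hrefNode ? $hrefNode->nodeValue : null,
    ];
}

// Usage: the <section> tag may trigger a libxml warning, which is suppressed;
// there is no download link, so 'link' comes back as null.
$info = extractPluginInfo('<section><div id="plugin-description"><p itemprop="description"> Hello </p></div></section>');
var_dump($info);
```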

Answer 2

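There are better tools for parsing HTML than regular expressions. That said, parsing HTML with regular expressions sometimes works safely and consistently, so don't let anyone bully you out of it. Those cases usually involve a small, known set of HTML markup.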
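To make the "small, known set of markup" caveat concrete, here is an illustrative sketch (added here, not part of the original answer): an order-sensitive regex misses the same element once its attributes are reordered, while an XPath query on the parsed DOM still finds it. The helper name xpathDescription is hypothetical.

```php
<?php
// Order-sensitive pattern: only matches itemprop before class.
$pattern = '~<p itemprop="description" class="shortdesc">(.*?)</p>~s';

$asShown   = '<p itemprop="description" class="shortdesc">Text</p>';
$reordered = '<p class="shortdesc" itemprop="description">Text</p>';

// preg_match returns 1 for a match, 0 for no match.
$regexHits = [
    'asShown'   => preg_match($pattern, $asShown),
    'reordered' => preg_match($pattern, $reordered),
];

// Hypothetical helper: attribute order does not matter to XPath.
function xpathDescription(string $html): ?string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $node = (new DOMXPath($dom))->query("//p[@itemprop='description']")->item(0);
    return $node ? $node->textContent : null;
}

$domHits = [
    'asShown'   => xpathDescription($asShown),
    'reordered' => xpathDescription($reordered),
];

var_dump($regexHits, $domHits);
```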

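For this particular case, an HTML parser should work well and give you cleaner code. To illustrate, I'll use a command-line tool such as pup, which makes retrieving your content very simple. Let's pretend the markup is stored on your machine at /tmp/input.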

Grab the downloadUrl:

pup < /tmp/input 'a[itemprop="downloadUrl"] attr{href}'

Grab the description:

pup < /tmp/input 'p[itemprop="description"] text{}'

I think this shows how simple it is, and how much you gain, by using an HTML parser to get the content you're after.

How to create a regex for this string
