How do I retrieve the value inside a child element using PHP DOMDocument()?

Time: 2019-06-17 18:20:33

Tags: php html-parsing domdocument

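I have a $body variable that I retrieve from a post. The user may or may not have posted images.

When an image is posted, I need to retrieve some information about it, and sometimes the user also writes a caption for the image.

This is the HTML without a caption: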

<figure class="image"><img src="/storage/5/articles/pictures/asdf87.jpeg"></figure>

This is an example with a caption:

<figure class="image"><img src="/storage/5/articles/pictures/asdf87.jpeg"><figcaption>test_caption</figcaption></figure>

This is my code so far:

$body = '<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse at dictum lectus. Ut volutpat pulvinar dui, quis elementum est bibendum sit amet. Curabitur a tempor augue. Nulla bibendum porttitor lacinia. Pellentesque tempor sem sed condimentum lobortis. Duis vulputate ante vel enim auctor luctus.</p><figure class="image"><img src="/storage/5/articles/pictures/1560793567749_d20caec3b48a1eef164cb4ca81ba2587.jpeg"><figcaption>tudo de ensaio</figcaption></figure><p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse at dictum lectus. Ut volutpat pulvinar dui, quis elementum est bibendum sit amet. Curabitur a tempor augue. Nulla bibendum porttitor lacinia. Pellentesque tempor sem sed condimentum lobortis. Duis vulputate ante vel enim auctor luctus.</p><figure class="image"><img src="/storage/5/articles/pictures/1560793584944_4c614360da93c0a041b22e537de151eb.jpeg"><figcaption>tb ensaio gota</figcaption></figure><p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse at dictum lectus. Ut volutpat pulvinar dui, quis elementum est bibendum sit amet. Curabitur a tempor augue. Nulla bibendum porttitor lacinia. Pellentesque tempor sem sed condimentum lobortis. Duis vulputate ante vel enim auctor luctus.</p><figure class="image"><img src="/storage/5/articles/pictures/1560793600192_21ae1a72068eff5f1c6e0238501b06a6.jpeg"><figcaption>tb ens colors</figcaption></figure><p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse at dictum lectus. Ut volutpat pulvinar dui, quis elementum est bibendum sit amet. Curabitur a tempor augue. Nulla bibendum porttitor lacinia. Pellentesque tempor sem sed condimentum lobortis. Duis vulputate ante vel enim auctor luctus.</p>' ;

        $dom_err = libxml_use_internal_errors(true);
        $dom = new \DOMDocument();
        $dom->loadHtml($body, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
        $xpath = new \DOMXPath($dom);
        $imgs = [];
        foreach ($xpath->query("//figure/img") as $img) {
            $src = $img->getAttribute('src');
            if (preg_match('#/storage/(.*)/articles/pictures/(.*)#', $src, $result)) {
                $imgs[] = [
                    'id'      => $result[1],
                    'name'    => $result[2],
                    'caption' => $img->item(0)->textContent,
                ];
            }
        }
        libxml_clear_errors();
        libxml_use_internal_errors($dom_err);

I am trying to retrieve the caption in this part of the code, 'caption' => $img->item(0)->textContent, but it does not work.

What am I missing?

1 Answer:

Answer 0 (score: 1)

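What you can do is look at the node that follows the <img> tag (using nextSibling). If it is a <figcaption> element, use its text content as the caption; otherwise set the caption to an empty string: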

if (preg_match('#/storage/(.*)/articles/pictures/(.*)#', $src, $result)) {
    // nextSibling is null when <img> is the last child of <figure>,
    // so check for that before reading localName
    $caption = $img->nextSibling;
    if ($caption !== null && $caption->localName === "figcaption") {
        $captionText = $caption->textContent;
    } else {
        $captionText = "";
    }
    $imgs[] = [
        'id'      => $result[1],
        'name'    => $result[2],
        'caption' => $captionText,
    ];
}
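
For reference, here is a self-contained sketch of the whole extraction that can be run on its own. It is an alternative, not the approach in the answer above: it asks XPath for the first <figcaption> that follows each <img>, which should also cope with whitespace between the two tags (in that case nextSibling would be a text node rather than the <figcaption>). The function name extractFigureImages is made up for illustration, and the sample $body is just the two snippets from the question joined together. It also assumes plain loadHTML() without the LIBXML_HTML_NOIMPLIED flags is acceptable, since only the query results are used.

```php
<?php
// Hypothetical helper, not part of the original post.
function extractFigureImages(string $body): array
{
    if (trim($body) === '') {
        return []; // nothing to parse; loadHTML() does not accept empty input
    }

    // Collect parser warnings (e.g. about <figure>) internally instead of emitting them
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $imgs = [];
    foreach ($xpath->query('//figure/img') as $img) {
        $src = $img->getAttribute('src');
        if (preg_match('#/storage/(.*)/articles/pictures/(.*)#', $src, $result)) {
            $imgs[] = [
                'id'      => $result[1],
                'name'    => $result[2],
                // string() evaluates to '' when no <figcaption> follows the <img>
                'caption' => $xpath->evaluate('string(following-sibling::figcaption[1])', $img),
            ];
        }
    }
    return $imgs;
}

// Sample input: the two <figure> snippets from the question
$body = '<figure class="image"><img src="/storage/5/articles/pictures/asdf87.jpeg"></figure>'
      . '<figure class="image"><img src="/storage/5/articles/pictures/asdf87.jpeg">'
      . '<figcaption>test_caption</figcaption></figure>';

print_r(extractFigureImages($body));
```

Because the caption is always a string, the calling code does not need a separate null check for images without a <figcaption>.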