Question

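While learning to use Goutte to scrape descriptions from a website, I found that it retrieves the text but strips all the markup (e.g. <br>, <b>). Is there a way to retrieve everything inside a div, including the HTML tags? Or is there an easier alternative that gives me this ability?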

    <?php
    require_once "vendor/autoload.php";
    use Goutte\Client;

    // Initialize a new client
    $client = new Client();
    $crawler = $client->request('GET', "examplesite.com/example");

    // Crawl the response
    $description = $crawler->filter('element.class')->extract('_text');
    ?>

Answer 1

You can use the html() method:

http://api.symfony.com/4.0/Symfony/Component/DomCrawler/Crawler.html#method_html

For example:

    $descriptions = $crawler->filter('element.class')->each(function ($node) {
        // html() returns the node's inner HTML, tags included
        return $node->html();
    });
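
To see why an inner-HTML call keeps the markup while a text-only extraction drops it, here is a minimal sketch that uses PHP's built-in DOM extension instead of Goutte. It is not Goutte's or Symfony's implementation, only the same idea without dependencies. The sample markup, the `description` class, and the variable names are assumptions made for illustration.

```php
<?php
// Hypothetical markup standing in for the fetched page.
$page = '<html><body><div class="description">First line<br><b>Bold</b> text</div></body></html>';

$doc = new DOMDocument();
// Suppress the warnings libxml emits for imperfect real-world HTML.
@$doc->loadHTML($page);

// Select the div by its (assumed) class name.
$xpath = new DOMXPath($doc);
$div = $xpath->query('//div[@class="description"]')->item(0);

// Text only: the tags are dropped, as with '_text' in the question.
$textOnly = $div->textContent;

// Inner HTML: serialize each child node and join the pieces.
$innerHtml = '';
foreach ($div->childNodes as $child) {
    $innerHtml .= $doc->saveHTML($child);
}

echo $textOnly, PHP_EOL;
echo $innerHtml, PHP_EOL;
```

If Goutte is already in use, calling html() on the filtered node as shown above is the shorter route; this sketch is only useful when no crawler library is available.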

You can then use PHP's strip_tags function to clean it up:

http://php.net/manual/fr/function.strip-tags.php
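
As a minimal sketch of that cleanup step: strip_tags accepts a second argument listing the tags to keep, so the markup can be reduced to just <br> and <b>. The `$html` string below is a hypothetical stand-in for whatever html() returned.

```php
<?php
// Hypothetical fragment standing in for the string returned by $node->html().
$html = 'First line<br><b>Bold</b> and <a href="#">a link</a> in a <span>span</span>';

// The second argument lists the tags to keep; every other tag is removed,
// while the text inside the removed tags is preserved.
$clean = strip_tags($html, '<br><b>');

echo $clean, PHP_EOL;
```

Note that strip_tags leaves the attributes of allowed tags untouched, so it is a formatting aid here, not a sanitizer for untrusted HTML.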
