How can I scrape values such as url, src and title using dweidner/laravel-goutte?

Asked: 2017-07-23 16:34:03

Tags: php laravel web-scraping goutte


$crawler = Goutte::request('GET', 'https://examplesite.com/');

$crawler->filter('.blog')->each(function ($node) {
    $uri = $node->html(); // the value of $uri is shown below
    dump($uri);
});

This is the value of $uri:

$uri = """<div class="blog" >
    <a class="url" href="/blog/url">
        <div class="blog-screenshot">
            <img src="/blog/img/img.png" alt="">
        </div>

        <span class="details">More Info</span>
        <div class="author">By <span class="author">John Doe</span></div>
        <h3 class="blog-title">BLOG TITLE</h3>
    </a>
    <div class="blog-actions">
        <a class="blog-preview" href="/blog/preview/url">Preview</a>
    </div>
</div>"""

Now, how can I extract the url href, the img src, the title and the action link from $uri?

1 Answer:

Answer 0: (score: 0)

Try this:

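$crawler->filter('.blog')->each(function ($node) {
    // filter() takes a CSS selector; attr() and text() read from the first matched node
    $blogUrl       = $node->filter('.url')->attr('href');
    $screenshotSrc = $node->filter('.blog-screenshot > img')->attr('src');
    $title         = $node->filter('.blog-title')->text();
    $previewUrl    = $node->filter('.blog-preview')->attr('href');

    dump(compact('blogUrl', 'screenshotSrc', 'title', 'previewUrl'));
});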
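
To try the selectors without hitting a live site, here is a minimal sketch that runs Symfony's DomCrawler (the crawler class Goutte returns) directly on the sample markup from the question. It relies on each() returning an array of the closure's return values, so the extracted fields can be collected rather than only dumped. The variable name $posts, the array keys, and the vendor/autoload.php path are assumptions for illustration; it also assumes symfony/dom-crawler and symfony/css-selector are installed, which Goutte normally pulls in.

```php
<?php
// Assumption: Composer dependencies (symfony/dom-crawler, symfony/css-selector)
// are installed next to this script. Inside Laravel the autoloader is already loaded.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Sample markup copied from the question. In the real code the crawler
// comes from Goutte::request('GET', ...) instead of a string.
$html = <<<'HTML'
<div class="blog">
    <a class="url" href="/blog/url">
        <div class="blog-screenshot">
            <img src="/blog/img/img.png" alt="">
        </div>
        <span class="details">More Info</span>
        <div class="author">By <span class="author">John Doe</span></div>
        <h3 class="blog-title">BLOG TITLE</h3>
    </a>
    <div class="blog-actions">
        <a class="blog-preview" href="/blog/preview/url">Preview</a>
    </div>
</div>
HTML;

$crawler = new Crawler($html);

// each() returns an array with one entry per matched .blog node.
$posts = $crawler->filter('.blog')->each(function (Crawler $node) {
    return [
        'url'     => $node->filter('a.url')->attr('href'),
        'image'   => $node->filter('.blog-screenshot img')->attr('src'),
        'title'   => trim($node->filter('.blog-title')->text()),
        'preview' => $node->filter('.blog-preview')->attr('href'),
    ];
});

print_r($posts);
```

Note that attr() and text() throw an InvalidArgumentException when the selector matches nothing, so on pages where a block may lack an image or preview link it can help to check count() on the filtered crawler first.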