$crawler = Goutte::request('GET', 'https://examplesite.com/');
$crawler->filter('.blog')->each(function ($node) {
    $uri = $node->html(); // the value of $uri is shown below
    dump($uri);
});
This is the value of $uri:
$uri = """<div class="blog" >
<a class="url" href="/blog/url">
<div class="blog-screenshot">
<img src="/blog/img/img.png" alt="">
</div>
<span class="details">More Info</span>
<div class="author">By <span class="author">John Doe</span></div>
<h3 class="blog-title">BLOG TITLE</h3>
</a>
<div class="blog-actions">
<a class="blog-preview" href="/blog/preview/url">Preview</a>
</div>
</div>"""
Now, how can I extract the URL href, the img src, the title, and the action link from $uri?
Answer 0 (score: 0)
Try this:
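$crawler->filter('.blog')->each(function ($node) {
    // $node is a Symfony DomCrawler Crawler, which has no find() method
    // (find() comes from Simple HTML DOM); use filter() with a CSS selector instead.
    $blogUrl = $node->filter('.url')->attr('href');
    $screenshotSrc = $node->filter('.blog-screenshot > img')->attr('src');
    $title = $node->filter('.blog-title')->text();
    $previewUrl = $node->filter('.blog-preview')->attr('href');
    // attr() and text() throw an InvalidArgumentException if the selector matches nothing.
});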
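If you want to check the selectors without making a request, here is a minimal sketch that runs against the sample markup from the question. It swaps in PHP's built-in DOMDocument and DOMXPath rather than Goutte, so it is an alternative technique, not what the crawler does internally. The `$hasClass` helper is hypothetical and only exists for this illustration.

```php
<?php
// Sample markup copied from the question.
$html = <<<'HTML'
<div class="blog">
  <a class="url" href="/blog/url">
    <div class="blog-screenshot">
      <img src="/blog/img/img.png" alt="">
    </div>
    <span class="details">More Info</span>
    <div class="author">By <span class="author">John Doe</span></div>
    <h3 class="blog-title">BLOG TITLE</h3>
  </a>
  <div class="blog-actions">
    <a class="blog-preview" href="/blog/preview/url">Preview</a>
  </div>
</div>
HTML;

// Collect libxml parse warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Hypothetical helper: builds an XPath predicate that matches one class name
// within a space-separated class attribute.
$hasClass = function (string $class): string {
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
};

// string(...) makes evaluate() return the first match as a plain string.
$blogUrl = $xpath->evaluate('string(//a[' . $hasClass('url') . ']/@href)');
$screenshotSrc = $xpath->evaluate('string(//div[' . $hasClass('blog-screenshot') . ']/img/@src)');
$title = trim($xpath->evaluate('string(//h3[' . $hasClass('blog-title') . '])'));
$previewUrl = $xpath->evaluate('string(//a[' . $hasClass('blog-preview') . ']/@href)');

echo $blogUrl, PHP_EOL;
echo $screenshotSrc, PHP_EOL;
echo $title, PHP_EOL;
echo $previewUrl, PHP_EOL;
```

Unlike `attr()` on an empty Crawler, `string(...)` yields an empty string when nothing matches, so check for `''` if a field may be missing.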