Laravel Goutte cannot get meta tags

Date: 2016-05-03 02:28:41

Tags: php laravel web-scraping laravel-5.2 goutte

I'm using Goutte\Client in Laravel 5.2 and can't seem to get the content of meta tags, although I can get the title, links, and so on.

This returns an empty string:

$parse = $htmlParser->request('GET', 'http://www.sample.com');
$parse->filter('meta');

Output:

  private 'nodes' => 
    array (size=20)
      0 => 
        object(DOMElement)[289]
          public 'tagName' => string 'meta' (length=4)
          public 'schemaTypeInfo' => null
          public 'nodeName' => string 'meta' (length=4)
          public 'nodeValue' => string '' (length=0)
          public 'nodeType' => int 1
          public 'parentNode' => string '(object value omitted)' (length=22)
          public 'childNodes' => string '(object value omitted)' (length=22)
          public 'firstChild' => null
          public 'lastChild' => null
          public 'previousSibling' => null
          public 'nextSibling' => string '(object value omitted)' (length=22)
          public 'attributes' => string '(object value omitted)' (length=22)
          public 'ownerDocument' => string '(object value omitted)' (length=22)
          public 'namespaceURI' => null
          public 'prefix' => string '' (length=0)
          public 'localName' => string 'meta' (length=4)
          public 'baseURI' => null
          public 'textContent' => string '' (length=0)

This returns the title:

$parse = $htmlParser->request('GET', 'http://www.sample.com');
$parse->filter('title');

Output:

  private 'nodes' => 
    array (size=1)
      0 => 
        object(DOMElement)[289]
          public 'tagName' => string 'title' (length=5)
          public 'schemaTypeInfo' => null
          public 'nodeName' => string 'title' (length=5)
          public 'nodeValue' => string 'Test title' (length=36)
          public 'nodeType' => int 1
          public 'parentNode' => string '(object value omitted)' (length=22)
          public 'childNodes' => string '(object value omitted)' (length=22)
          public 'firstChild' => string '(object value omitted)' (length=22)
          public 'lastChild' => string '(object value omitted)' (length=22)
          public 'previousSibling' => string '(object value omitted)' (length=22)
          public 'nextSibling' => string '(object value omitted)' (length=22)
          public 'attributes' => string '(object value omitted)' (length=22)
          public 'ownerDocument' => string '(object value omitted)' (length=22)
          public 'namespaceURI' => null
          public 'prefix' => string '' (length=0)
          public 'localName' => string 'title' (length=5)
          public 'baseURI' => null
          public 'textContent' => string 'Test title' (length=36)

1 answer:

Answer 0 (score: 0)

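@moisesgallego on this post was able to answer my question, but I also found another approach. Basically, it loops over all the meta tags and returns each tag's name and content as an array.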

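use Goutte\Client;

$client = new Client();
$crawler = $client->request('GET', 'https://stackoverflow.com/');

// Read each <meta> tag's attributes rather than its (empty) text content.
$meta = $crawler->filter('meta')->each(function ($node) {
    return [
        'name' => $node->attr('name'),
        'content' => $node->attr('content'),
    ];
});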
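
The reason nodeValue and textContent come back empty in the question's dump is that a <meta> element has no text content; its data is held in attributes such as name and content. Here is a minimal sketch of that behaviour, assuming PHP's built-in DOM extension instead of Goutte so it runs without installing anything. The HTML string is made-up sample markup, not the real page, and the 'text' key is only there for illustration:

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$html = <<<'HTML'
<!DOCTYPE html>
<html>
<head>
    <title>Test title</title>
    <meta name="description" content="A sample page">
    <meta property="og:title" content="Sample">
</head>
<body></body>
</html>
HTML;

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$meta = [];
foreach ($doc->getElementsByTagName('meta') as $node) {
    $meta[] = [
        'text'    => $node->textContent,            // empty: <meta> has no child text
        'name'    => $node->getAttribute('name'),   // '' when the attribute is absent
        'content' => $node->getAttribute('content'),
    ];
}

print_r($meta);
```

With Goutte, a single tag can probably be read the same way with an attribute selector, for example $crawler->filter('meta[name="description"]')->attr('content'), assuming the page actually has such a tag; as far as I know, calling attr() on an empty result throws an InvalidArgumentException, so it is worth checking count() first.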