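I am using Goutte\Client in Laravel 5.2 and can't seem to get the content of meta tags, although I can get the title, links, and so on.
This returns an empty string: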
$parse = $htmlParser->request('GET', 'http://www.sample.com');
$parse->filter('meta');
Output:
private 'nodes' =>
array (size=20)
0 =>
object(DOMElement)[289]
public 'tagName' => string 'meta' (length=4)
public 'schemaTypeInfo' => null
public 'nodeName' => string 'meta' (length=4)
public 'nodeValue' => string '' (length=0)
public 'nodeType' => int 1
public 'parentNode' => string '(object value omitted)' (length=22)
public 'childNodes' => string '(object value omitted)' (length=22)
public 'firstChild' => null
public 'lastChild' => null
public 'previousSibling' => null
public 'nextSibling' => string '(object value omitted)' (length=22)
public 'attributes' => string '(object value omitted)' (length=22)
public 'ownerDocument' => string '(object value omitted)' (length=22)
public 'namespaceURI' => null
public 'prefix' => string '' (length=0)
public 'localName' => string 'meta' (length=4)
public 'baseURI' => null
public 'textContent' => string '' (length=0)
This returns the title:
$parse = $htmlParser->request('GET', 'http://www.sample.com');
$parse->filter('title');
Output:
private 'nodes' =>
array (size=1)
0 =>
object(DOMElement)[289]
public 'tagName' => string 'title' (length=5)
public 'schemaTypeInfo' => null
public 'nodeName' => string 'title' (length=5)
public 'nodeValue' => string 'Test title' (length=36)
public 'nodeType' => int 1
public 'parentNode' => string '(object value omitted)' (length=22)
public 'childNodes' => string '(object value omitted)' (length=22)
public 'firstChild' => string '(object value omitted)' (length=22)
public 'lastChild' => string '(object value omitted)' (length=22)
public 'previousSibling' => string '(object value omitted)' (length=22)
public 'nextSibling' => string '(object value omitted)' (length=22)
public 'attributes' => string '(object value omitted)' (length=22)
public 'ownerDocument' => string '(object value omitted)' (length=22)
public 'namespaceURI' => null
public 'prefix' => string '' (length=0)
public 'localName' => string 'title' (length=5)
public 'baseURI' => null
public 'textContent' => string 'Test title' (length=36)
Answer 0 (score: 0)
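use Goutte\Client;

$client = new Client();
$crawler = $client->request('GET', 'https://stackoverflow.com/');

// <meta> elements have no text node; their data lives in attributes.
$meta = $crawler->filter('meta')->each(function ($node) {
    return [
        'name' => $node->attr('name'),
        'content' => $node->attr('content'),
    ];
});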
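To see why the dump in the question shows an empty nodeValue for meta but not for title, here is a minimal sketch using PHP's built-in DOMDocument instead of Goutte, so it runs without any packages. The markup, the attribute values, and the $meta array layout are made up for illustration; the real page from the question is not shown.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = <<<HTML
<html><head>
<title>Test title</title>
<meta name="description" content="Sample description">
<meta property="og:title" content="Sample OG title">
</head><body></body></html>
HTML;

$doc = new DOMDocument();
// Suppress warnings that libxml may raise for imperfect HTML.
@$doc->loadHTML($html);

$meta = [];
foreach ($doc->getElementsByTagName('meta') as $node) {
    // Some meta tags use "property" (e.g. Open Graph) instead of "name".
    $key = $node->getAttribute('name') ?: $node->getAttribute('property');
    if ($key !== '') {
        // The value is stored in the "content" attribute, not in a text node.
        $meta[$key] = $node->getAttribute('content');
    }
}

// <title> has a child text node, so nodeValue is populated.
$title = $doc->getElementsByTagName('title')->item(0)->nodeValue;
// <meta> has no children, so nodeValue is an empty string.
$firstMetaText = $doc->getElementsByTagName('meta')->item(0)->nodeValue;
```

With Goutte, the same idea applies: read the attributes with attr() on each node rather than the text of the node.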