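I'm trying to scrape some content from a website. I eventually found out that it requires cookies, so I solved that with the Guzzle cookie plugin. What's odd is that I can't get the content with var_dump, but the page does show up if I echo the response, which makes me think some dynamic data call is fetching the data. I'm used to Guzzle, but I'm not sure how I should handle this. Thanks.
If I use DomCrawler, I get an error.
Code: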
use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Symfony\Component\DomCrawler\Crawler;
use Guzzle\Http\Client;
use Guzzle\Plugin\Cookie\CookiePlugin;
use Guzzle\Plugin\Cookie\CookieJar\ArrayCookieJar;
$cookiePlugin = new CookiePlugin(new ArrayCookieJar());
$url = 'http://www.myurl.com';
// Add the cookie plugin to a client
$client = new Client();
$client->get();
$client->addSubscriber($cookiePlugin);
// Send the request with no cookies and parse the returned cookies
$client->get($url)->send();
// Send the request again, noticing that cookies are being sent
$request = $client->get($url);
$response = $request->send();
var_dump($response);
$crawler = new Crawler($response);
foreach ($crawler as $domElement) {
print $domElement->filter('a')->links();
}
Error:
Expecting a DOMNodeList or DOMNode instance, an array, a string, or null, but got "Guzzle\Http\Message\Response"
Answer 0 (score: 4)
Try this:
For Guzzle 5:
$crawler = new Crawler($response->getBody()->getContents());
http://docs.guzzlephp.org/en/latest/http-messages.html#id2 http://docs.guzzlephp.org/en/latest/streams.html#creating-streams
For Guzzle 3:
// getBody() alone returns an EntityBody object; passing true returns the body as a string
$crawler = new Crawler($response->getBody(true));
http://guzzle3.readthedocs.org/http-client/response.html#response-body
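Once the body reaches the Crawler as a string (in either Guzzle version), the link loop from the question also needs a change: iterating the Crawler yields plain DOM elements, which have no filter() method. A minimal sketch of one way to collect the links, assuming symfony/dom-crawler 2.3 or later is installed via Composer. The helper name extractHrefs and the sample HTML are made up for illustration and stand in for the real response body:

```php
<?php
// Assumption: symfony/dom-crawler (2.3+) is installed via Composer.
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

/**
 * Collect the href attribute of every <a> element in an HTML string.
 * extractHrefs() is a hypothetical helper, not part of any library.
 */
function extractHrefs($html)
{
    // A string is one of the node types the Crawler constructor accepts.
    $crawler = new Crawler($html);

    // filterXPath() works without extra packages; filter('a') would
    // additionally require symfony/css-selector.
    return $crawler->filterXPath('//a[@href]')->each(function (Crawler $node) {
        return $node->attr('href');
    });
}

// Stand-in for the string returned by the response body.
$html = '<html><body>'
      . '<a href="/questions">Questions</a>'
      . '<a href="http://example.com/tags">Tags</a>'
      . '</body></html>';

foreach (extractHrefs($html) as $href) {
    echo $href, PHP_EOL;
}
```

Reading the href attribute directly avoids links(), which needs the page URI passed to the Crawler constructor in order to resolve relative URLs.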
Update:
Basic usage of Guzzle 5 with the getContents method:
include 'vendor/autoload.php';
use GuzzleHttp\Client;
$client = new Client();
echo $client->get('http://stackoverflow.com')->getBody()->getContents();
The rest is in the docs (including cookies).
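For the cookie part, here is a rough sketch of how the two-request flow from the question might look in Guzzle 5, assuming guzzlehttp/guzzle 5 is installed via Composer. It passes a CookieJar through the 'cookies' request option; fetchWithCookies is a hypothetical helper name, and the URL in the example call is the placeholder from the question:

```php
<?php
// Assumption: guzzlehttp/guzzle 5 is installed via Composer.
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Cookie\CookieJar;

/**
 * Request a URL twice with a shared cookie jar and return the second body.
 * fetchWithCookies() is a hypothetical helper, not part of Guzzle.
 */
function fetchWithCookies(Client $client, $url)
{
    $jar = new CookieJar();

    // First request: cookies set by the server are stored in the jar.
    $client->get($url, ['cookies' => $jar]);

    // Second request: the stored cookies are sent back to the server.
    $response = $client->get($url, ['cookies' => $jar]);

    // Cast the body stream to a string so it can be handed to a Crawler.
    return (string) $response->getBody();
}

// Example call (performs real HTTP requests):
// echo fetchWithCookies(new Client(), 'http://www.myurl.com');
```

The string it returns can be passed straight to new Crawler(...).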