
PHP simple_html_dom load_file and user agent

Time: 2014-07-31 04:08:26

Tags: php user-agent simple-html-dom

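I need to scrape a website with simple_html_dom->load_file(), and I need to send a user agent with the request. My code is below, but I don't know whether it is correct or whether there is a better way to do this. Thanks in advance.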

$option = array(
    'http' => array(
        'method' => 'GET',
        'header' => 'User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    )
);
$context = stream_context_create($option);
$simple_html_dom = new simple_html_dom();
$simple_html_dom->load_file(CRAWLER_URL, false, $context);

1 answer:

Answer 0: (score: 0)

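I have tested your approach and code, and I can confirm that it works as expected: the User-Agent in the HTTP headers that are sent is correctly changed to the one you set in your code. :-)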

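As for your uncertainty: I usually use the curl functions to fetch the HTML string (http://php.net/manual/en/ref.curl.php). That gives me more control over the HTTP request, and then (once everything works) I call simple_html_dom's str_get_html() function on the HTML string I got from curl. This makes me more flexible with error handling and with redirects, and I have also implemented some caching...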
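A minimal sketch of that curl-based approach might look like the following. The helper name fetch_html(), the timeout, the redirect limit, and the example user agent string are all assumptions for illustration, not something taken from the answer; only standard curl_* functions are used.

```php
<?php
// Hypothetical helper (not from the original answer): fetch a page with cURL
// using a custom User-Agent, and throw on transport or HTTP errors.
function fetch_html($url, $userAgent, $timeoutSeconds = 10)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_USERAGENT      => $userAgent,      // sets the User-Agent request header
        CURLOPT_FOLLOWLOCATION => true,            // follow redirects...
        CURLOPT_MAXREDIRS      => 5,               // ...but only up to an assumed limit
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ));

    $body = curl_exec($ch);
    if ($body === false) {
        // Transport-level failure: DNS, connection refused, timeout, etc.
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }

    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($status >= 400) {
        throw new RuntimeException('HTTP error: ' . $status);
    }

    return $body;
}

// Usage, assuming simple_html_dom.php is available and CRAWLER_URL is defined:
// require_once 'simple_html_dom.php';
// $html = fetch_html(CRAWLER_URL, 'Mozilla/5.0 (compatible; MyCrawler/1.0)');
// $dom  = str_get_html($html);
```

Because the request and the parsing are separate steps here, a failed download can be caught and retried (or served from a cache) before simple_html_dom is ever involved.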

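To check your own code, simply fetch a URL such as http://www.whatsmyuseragent.com/ and look in the result for the user agent string that was used in the request, to see whether it works as expected...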
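That check could be sketched roughly as below. The two helper names are hypothetical, and the sketch assumes that the echo page prints the received user agent string somewhere in its body (possibly HTML-escaped); the actual markup of that page is not known here.

```php
<?php
// Hypothetical helper: build the same kind of stream context as in the question.
function build_user_agent_context($userAgent)
{
    return stream_context_create(array(
        'http' => array(
            'method' => 'GET',
            'header' => 'User-Agent: ' . $userAgent,
        ),
    ));
}

// Hypothetical helper: true if the response body contains the user agent string.
// Assumes the page echoes the string back, possibly with HTML entities.
function user_agent_echoed($responseBody, $userAgent)
{
    return strpos(html_entity_decode($responseBody), $userAgent) !== false;
}

// Usage (needs network access, so it is left commented out):
// $ua      = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
// $context = build_user_agent_context($ua);
// $body    = file_get_contents('http://www.whatsmyuseragent.com/', false, $context);
// var_dump($body !== false && user_agent_echoed($body, $ua));
```

The same context can then be passed unchanged to load_file() as the third argument, as in the question.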