Question

I am using the simple_html_dom library, but there is one URL for which I cannot get the HTML content; I get a 503 error instead. Please check the code below.

$base = 'http://www.amazon.com/gp/offer-listing/B001F0M4K8/ref=dp_olp_all_mbc/183-8463780-9861412?ie=UTF8&condition=new';

echo $html = file_get_html($base);

Error: Warning: file_get_contents(http://www.amazon.com/gp/offer-listing/B001F0M4K8/ref=dp_olp_all_mbc/183-8463780-9861412?ie=UTF8&condition=new) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 503 Service Unavailable in D:\xampp\htdocs\webcrawler-amazon\webcrawler-amazon\simple_html_dom.php on line 76

I am stuck here, so please help me.

Answer 1

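I think the server is simply blocking your request, and you will not be able to fetch data from it with a plain HTTP request.

You can try using curl, proxies, or both (there are ready-to-use solutions for this, for example AngryCurl or RollingCurl).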

Answer 2

I suggest you use cURL for this: http://php.net/manual/en/book.curl.php

You can use it from PHP or from the command line. There are plenty of examples online.
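For reference, a minimal sketch of what a cURL-based fetch could look like. The helper names build_curl_options() and fetch_page() are made up for this example and are not part of simple_html_dom or cURL, and the User-Agent string is only a placeholder. Sending browser-like headers is no guarantee that the server stops answering with 503.

```php
<?php
// Sketch only: build_curl_options() and fetch_page() are hypothetical helpers.

// Collect the cURL options in one place so they are easy to inspect and adjust.
function build_curl_options($userAgent)
{
    return array(
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_ENCODING       => '',    // accept every encoding cURL supports
        CURLOPT_TIMEOUT        => 30,    // give up after 30 seconds
    );
}

// Fetch a URL and return both the HTTP status code and the body.
// 'body' is false when the transfer itself failed.
function fetch_page($url, $userAgent)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, build_curl_options($userAgent));
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return array('status' => $status, 'body' => $body);
}

// Possible usage with the question's $base (assumes simple_html_dom.php is loaded):
// $result = fetch_page($base, 'Mozilla/5.0 (placeholder user agent)');
// if ($result['status'] === 200) {
//     $html = str_get_html($result['body']);
// }
```

Checking the status code before parsing makes it obvious when the request was blocked, rather than handing an error page to the parser.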

Answer 3

This is Amazon's anti-bot defense.

The returned page begins with the following HTML comment:

<!--
        To discuss automated access to Amazon data please contact api-services-support@amazon.com.
        For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.com/ref=rm_5_sv, or our Product Advertising API at https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html/ref=rm_5_ac for advertising use cases.
-->

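You need to either imitate a real customer's browser behavior closely, or ask them whether there is an approved way to fetch data from their systems automatically. In any case, using the API is better (and easier) than scraping web pages.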
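If you want to look at that returned page yourself, note that PHP's http stream wrapper discards the body of a 4xx/5xx response unless the ignore_errors context option is set. The following is a rough sketch under that assumption; last_status_code() and fetch_even_on_error() are names invented for this example.

```php
<?php
// Sketch only: both function names are hypothetical.

// Return the status code of the last "HTTP/..." status line in a header list,
// or null if there is none. With redirects the list holds several status lines.
function last_status_code(array $headers)
{
    $code = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Fetch a URL through the http stream wrapper and keep the body even when
// the server answers with an error status such as 503.
function fetch_even_on_error($url)
{
    $context = stream_context_create(array(
        'http' => array(
            'ignore_errors' => true, // do not fail on 4xx/5xx, return the body
            'timeout'       => 30,
        ),
    ));
    $fp = @fopen($url, 'r', false, $context);
    if ($fp === false) {
        return array('status' => null, 'body' => false);
    }
    $meta = stream_get_meta_data($fp); // 'wrapper_data' holds the response headers
    $body = stream_get_contents($fp);
    fclose($fp);
    $headers = isset($meta['wrapper_data']) ? $meta['wrapper_data'] : array();
    return array('status' => last_status_code($headers), 'body' => $body);
}

// Possible usage with the question's $base:
// $result = fetch_even_on_error($base);
// echo $result['status'], "\n", substr($result['body'], 0, 500);
```

This only helps with diagnosing the block; it does not get around it.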

Answer 4

I am doing the same thing, and they are sending you the following message. Sometimes you can get past it.

Enter the characters you see below
Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies.
Type the characters you see in this image:
 

Try different image 
Continue shopping 

Conditions of Use Privacy Policy 
© 1996-2014, Amazon.com, Inc. or its affiliates
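A crawler can at least notice when it has been served this page instead of real content. Here is a small sketch that looks for phrases quoted in this thread; looks_like_robot_check() is a hypothetical helper, and it assumes the phrases appear verbatim in the response (the real markup may encode the apostrophe or change its wording over time).

```php
<?php
// Sketch only: looks_like_robot_check() is a hypothetical helper, and the
// marker strings are simply the phrases quoted in this thread.

function looks_like_robot_check($html)
{
    $markers = array(
        'api-services-support@amazon.com',
        "make sure you're not a robot",
        'Enter the characters you see below',
    );
    foreach ($markers as $marker) {
        // Case-insensitive substring search
        if (stripos($html, $marker) !== false) {
            return true;
        }
    }
    return false;
}

// Possible usage: back off or stop instead of parsing the page.
// if (looks_like_robot_check($body)) {
//     // wait before retrying, or switch to the official API
// }
```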

PHP file_get_html not working
