Question

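Description

I am crawling the website bjx.com, and all of my code runs fine locally. When I deployed the code to an Amazon server and ran it there, it failed.

What I have tried

I suspected that the website might be blocking the server, so I tried the following: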

1) curl http://guangfu.bjx.com.cn/xtgc/List.aspx?classid=583

2) wget http://guangfu.bjx.com.cn/xtgc/List.aspx?classid=583

The error message is as follows:

Resolving news.bjx.com.cn (news.bjx.com.cn)... 114.113.145.103
Connecting to news.bjx.com.cn (news.bjx.com.cn)|114.113.145.103|:80... failed: Connection timed out.
Retrying.

--2019-04-23 05:45:00--  (try: 2)  http://news.bjx.com.cn/list
Connecting to news.bjx.com.cn (news.bjx.com.cn)|114.113.145.103|:80...

Some references:

https://serverfault.com/questions/124952/testing-a-website-from-linux-command-line

My question:

How can I confirm whether the website has blocked me? If it has, how can I work around the block and still crawl the site? Thanks.

Answer 1

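How about making the program fail after a specific timeout?

For example, make curl fail if it does not get a response within 10 seconds (-m is the short form of --max-time):

curl -m 10

Also, to work around the block, you can try running the spider through a VPN or a web proxy.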
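
To help tell a block apart from other kinds of failure, here is a minimal Python sketch that applies the same timeout idea at the TCP level. The function name probe and its return labels are made up for illustration and are not part of any library. Run it both locally and on the Amazon server: if the local run reports "connected" while the server reports "timeout" for the same host, that suggests (but does not prove) that traffic from the server's IP range is being dropped somewhere along the way.

```python
import socket


def probe(host, port=80, timeout=10.0):
    """Classify one TCP connection attempt to host:port.

    Hypothetical helper for illustration; the labels are arbitrary strings.
    """
    try:
        # Resolve the name first so DNS problems are reported separately.
        addr = socket.gethostbyname(host)
    except socket.gaierror:
        return "dns_failure"
    try:
        # Give up after `timeout` seconds, similar in spirit to `curl -m`.
        with socket.create_connection((addr, port), timeout=timeout):
            return "connected"
    except (socket.timeout, TimeoutError):
        # No reply at all: packets are probably being dropped silently.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing accepts connections on this port.
        return "refused"
    except OSError:
        # Any other network-level error (for example, no route to host).
        return "unreachable"


# Example call, using the host name from the wget log above:
# probe("news.bjx.com.cn", 80)
```

A "timeout" result only shows that the connection attempt got no reply. It does not identify who is dropping the packets, so comparing results from several networks is still needed.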
