Question

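I'm trying Scrapy for the first time. (Yes, I saw the other post and didn't get an answer from it.) I figured it would be pretty simple to at least get it running.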

Here is my spider code:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

class Spider(BaseSpider):
    name = "craigs"
    allowed_domains = ["craigslist.org"]  # Scrapy expects the plural attribute name
    start_urls = ["http://sfbay.craigslist.org/sfc/npo/"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        paragraphs = hxs.select("//p")
        for paragraph in paragraphs:
            # Each listing's title and URL come from the <a> inside the <p>
            title = paragraph.select("a/text()").extract()
            link = paragraph.select("a/@href").extract()
            print title, link

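I get this error: "TCP connection timed out: 10060: A connection attempt failed because the connected party did not properly respond after a period of time......"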

I tried URLs from other sites, but I still get the same result.

If the cause is a blocked port, which ports should I open (without leaving my computer vulnerable to attack)? Thanks.

Answer 1

Are you behind a proxy? If so, set the http_proxy environment variable or use Scrapy's proxy middleware.
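A minimal sketch of the environment-variable route, using only the standard library. The proxy address below is a made-up placeholder, and the helper name configure_proxy is just for illustration. As far as I know, Scrapy's HttpProxyMiddleware picks up proxies through the same getproxies() lookup when the crawl starts, so the variables need to be set before the spider runs.

```python
import os
from urllib.request import getproxies

# Hypothetical proxy address (an assumption); replace with your real host and port.
PROXY_URL = "http://proxy.example.com:8080"


def configure_proxy(proxy_url):
    """Export the proxy for this process and return what the stdlib now detects."""
    # Lowercase names are the conventional ones and take priority in getproxies().
    os.environ["http_proxy"] = proxy_url
    os.environ["https_proxy"] = proxy_url
    return getproxies()


proxies = configure_proxy(PROXY_URL)
print(proxies.get("http"))
```

If you would rather not touch the environment, Scrapy also lets you set a proxy per request through request.meta["proxy"].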

Scrapy TCP connection timed out
