Why does my Scrapy always tell me "TCP connection timed out"?

Date: 2013-08-23 11:24:01

Tags: scrapy

DEBUG: Retrying (failed 2 times): TCP connection timed out: 110: Connection timed out.

PS: The system is Ubuntu, and the following works successfully:

wget http://www.dmoz.org/Computers/Programming/Languages/Python/Book/

Spider code:

#!/usr/bin/python
# Scrapy 0.x-era spider: BaseSpider and HtmlXPathSelector were the
# current APIs at the time of this question (2013).

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

class DmozSpider(BaseSpider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = ["http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//ul/li')  # each directory entry is a list item
        for site in sites:
            title = site.select('a/text()').extract()
            link = site.select('a/@href').extract()
            desc = site.select('text()').extract()
            print title, link, desc  # Python 2 print statement
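As an aside, the extraction logic in `parse` can be exercised offline, with no network at all, to confirm the selection itself is sound. A sketch using the standard library's ElementTree in place of Scrapy's selector (the markup here is hand-written and illustrative, not the real dmoz page):

```python
# Offline check of the same //ul/li -> a/text(), a/@href, text() extraction,
# using the stdlib instead of Scrapy (illustrative markup).
import xml.etree.ElementTree as ET

SNIPPET = (
    "<ul>"
    "<li><a href='http://example.com/a'>Title A</a> desc A</li>"
    "<li><a href='http://example.com/b'>Title B</a> desc B</li>"
    "</ul>"
)

root = ET.fromstring(SNIPPET)
rows = []
for li in root.findall("li"):
    a = li.find("a")
    # a.text is the link text, a.get("href") the URL, a.tail the trailing text
    rows.append((a.text, a.get("href"), a.tail.strip()))

print(rows)
```

If this runs and prints the expected tuples, the failure is in the network layer, not in the spider's parsing.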

2 Answers:

Answer 0 (score: 3)

There is a problem with your network, or the port is blocked.

Also check whether your settings are misconfigured.
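The "check your settings" advice maps to a few knobs in a Scrapy project's `settings.py`. A minimal sketch with illustrative values (these values are assumptions for diagnosis, not the asker's actual configuration):

```python
# settings.py -- illustrative values for diagnosing connection timeouts
DOWNLOAD_TIMEOUT = 30   # seconds before Scrapy gives up on a request
RETRY_ENABLED = True    # the retry middleware is on by default
RETRY_TIMES = 2         # matches the "failed 2 times" seen in the log
DOWNLOAD_DELAY = 1      # slow down in case the server throttles bursts
```

One common asymmetry: wget may pick up an HTTP proxy from the environment while Scrapy does not, which can make wget succeed where the spider times out.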

Answer 1 (score: 0)

Your syntax error is an extra invisible character used in:

start_urls=["http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"‌​]
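The extra character this answer points at is typically a zero-width Unicode character picked up when copying code from a web page. A small sketch that finds and strips such characters (the helper name and sample string are hypothetical):

```python
# Detect and remove zero-width characters that break copied code
# (zero-width space / non-joiner / joiner, and the BOM).
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def strip_invisible(s):
    """Return (cleaned string, list of invisible characters found)."""
    found = [c for c in s if c in INVISIBLE]
    cleaned = "".join(c for c in s if c not in INVISIBLE)
    return cleaned, found

# Sample URL with two hidden characters appended, as in the question
url = "http://www.dmoz.org/Books/\u200c\u200b"
cleaned, found = strip_invisible(url)
print(len(found))  # -> 2
```

Pasting the suspect `start_urls` line through a check like this (or simply retyping it by hand) removes the offending characters.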