Scrapy crawler will not scrape any web pages

Asked: 2016-04-07 08:21:10

Tags: python web-scraping scrapy web-crawler

I have been trying for about a day to get this crawler working and keep getting errors. Can anyone suggest a way to get it running? The main spider code is:

import scrapy
from scrapy.spiders import Spider
from scrapy.selector import Selector


class gameSpider(scrapy.Spider):
    name = "game_spider.py"
    allowed_domains = ["*"]
    start_urls = [
        "http://www.game.co.uk/en/grand-theft-auto-v-with-gta-online-3-500-000-1085837?categoryIdentifier=706209&catGroupId="
    ]

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//ul[@class="directory-url"]/li')
        items = []

        for site in sites:
            item = Website()
            item['name'] = site.xpath('//*[@id="details301149"]/div/div/h2/text()').extract()
            """item['link'] = site.xpath('//a/@href').extract()
            item['description'] = site.xpath('//*[@id="overview"]/div[3]()').re('-\s[^\n]*\\r')"""
            items.append(item)

        print items
        return items

The item code is:

import scrapy


class GameItem(Item):
    name = Field()
    pass

Thanks in advance, James

1 Answer:

Answer 0 (score: 0):

Your start_urls link returns error 500. There are no items.

In [7]: sites = response.xpath('//ul[@class="directory-url"]/li')

In [8]: sites
Out[8]: []
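
A quick way to reproduce this check is scrapy shell; the snippet below is only a sketch using the URL and XPath taken from the question. Both conditions have to hold before parse() can produce anything: the download must succeed (status 200) and the XPath must match elements in the page that actually comes back.

$ scrapy shell "http://www.game.co.uk/en/grand-theft-auto-v-with-gta-online-3-500-000-1085837?categoryIdentifier=706209&catGroupId="

In [1]: response.status                                    # this URL comes back as 500, not 200
In [2]: response.xpath('//ul[@class="directory-url"]/li')  # empty list: the XPath matches nothing here

Until both checks pass, the for loop in parse() never runs, so the spider cannot return items no matter how the item class is written. Separately, the item module as posted imports scrapy but refers to Item and Field by bare name, and parse() builds Website() objects that are never defined or imported; a minimal sketch of what the code seems to be aiming for, keeping the GameItem name from the question, would be:

from scrapy import Item, Field


class GameItem(Item):
    # Single field populated by the spider's parse() method.
    name = Field()

with parse() instantiating GameItem() instead of Website() once the page actually loads.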