Scrapy crawler returns empty results

Asked: 2014-04-30 22:46:38

Tags: python xpath scrapy

The crawler used to work and did its job well, but it suddenly stopped working correctly. It still follows the pages, but it no longer extracts any items from here.

Here is the spider:

from scrapy.item import Item, Field
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector

class MobiItem(Item):
    brand = Field()
    title = Field()
    price = Field()

class MobiSpider(CrawlSpider):
    name = "mobi2"
    allowed_domains = ["mobi.ge"]
    start_urls = [
        "http://mobi.ge/?page=products&category=60"
    ]

    rules = (
        Rule(SgmlLinkExtractor(allow=(r"\?page=products&category=60&m_page=\d*",)),
             callback="parse_items", follow=True),
    )

    def parse_items(self, response):
        sel = Selector(response)
        blocks = sel.xpath('//table[@class="m_product_previews"]/tr/td/a')
        for block in blocks:
            item = MobiItem()
            try:
                item["brand"] = block.xpath(".//div[@class='m_product_title_div']/span/text()").extract()[0].strip()
                item["model"] = block.xpath(".//div[@class='m_product_title_div']/span/following-sibling::text()").extract()[0].strip()
                item["price"] = block.xpath(".//div[@id='m_product_price_div']/text()").extract()[0].strip()
                yield item
            except:
                continue

Checking the XPath expressions did not turn up anything suspicious. Any help would be appreciated.

1 Answer:

Answer 0 (score: 2)

Analyze the logs. Instead of:

        try:
            item["brand"] = block.xpath(".//div[@class='m_product_title_div']/span/text()").extract()[0].strip()
            item["model"] = block.xpath(".//div[@class='m_product_title_div']/span/following-sibling::text()").extract()[0].strip()
            item["price"] = block.xpath(".//div[@id='m_product_price_div']/text()").extract()[0].strip()
            yield item
        except:
            continue

do:

        try:
            item["brand"] = block.xpath(".//div[@class='m_product_title_div']/span/text()").extract()[0].strip()
            item["model"] = block.xpath(".//div[@class='m_product_title_div']/span/following-sibling::text()").extract()[0].strip()
            item["price"] = block.xpath(".//div[@id='m_product_price_div']/text()").extract()[0].strip()
            yield item
        except Exception as exc:
            self.log('item filling exception: %s' % exc)
            continue

I think you may be hitting an IndexError exception there.
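If the log does confirm an IndexError, the usual fix is to stop indexing into a possibly empty list at all. A minimal sketch in plain Python (`first` is a hypothetical helper, not part of Scrapy's API) showing why `extract()[0]` fails and how a default sidesteps it:

```python
def first(values, default=""):
    """Return the first element of a list (stripped), or a default when empty.

    Mirrors block.xpath(...).extract()[0].strip(), but without raising
    IndexError when the XPath expression matched nothing.
    """
    return values[0].strip() if values else default

# extract() returns a list of matched strings; an empty list means no match.
print(first(["  Nokia  "]))  # prints: Nokia
print(first([]))             # prints an empty string instead of raising IndexError
```

With such a helper the try/except can be dropped entirely and items with missing fields skipped explicitly instead of silently; newer Scrapy versions offer `extract_first(default=...)` (and later `.get()`) on selectors for exactly this purpose. Note, too, that `MobiItem` declares `title` while the loop assigns `item["model"]`; Scrapy items raise `KeyError` for undeclared fields, so the bare `except` may have been swallowing that as well, which the suggested logging would reveal.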