The crawler was working fine, but it suddenly stopped working correctly: it still follows the pages, but it no longer extracts any items from them.
Here is the spider:
from scrapy.item import Item, Field
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector


class MobiItem(Item):
    brand = Field()
    title = Field()
    price = Field()


class MobiSpider(CrawlSpider):
    name = "mobi2"
    allowed_domains = ["mobi.ge"]
    start_urls = [
        "http://mobi.ge/?page=products&category=60"
    ]

    rules = (Rule(SgmlLinkExtractor(allow=("\?page=products&category=60&m_page=\d*", )),
                  callback="parse_items", follow=True),)

    def parse_items(self, response):
        sel = Selector(response)
        blocks = sel.xpath('//table[@class="m_product_previews"]/tr/td/a')
        for block in blocks:
            item = MobiItem()
            try:
                item["brand"] = block.xpath(".//div[@class='m_product_title_div']/span/text()").extract()[0].strip()
                item["model"] = block.xpath(".//div[@class='m_product_title_div']/span/following-sibling::text()").extract()[0].strip()
                item["price"] = block.xpath(".//div[@id='m_product_price_div']/text()").extract()[0].strip()
                yield item
            except:
                continue
Checking the XPath expressions doesn't reveal anything suspicious. Any help would be appreciated.
Answer (score: 2)
Analyze the logs. Instead of:
try:
    item["brand"] = block.xpath(".//div[@class='m_product_title_div']/span/text()").extract()[0].strip()
    item["model"] = block.xpath(".//div[@class='m_product_title_div']/span/following-sibling::text()").extract()[0].strip()
    item["price"] = block.xpath(".//div[@id='m_product_price_div']/text()").extract()[0].strip()
    yield item
except:
    continue
do:
try:
    item["brand"] = block.xpath(".//div[@class='m_product_title_div']/span/text()").extract()[0].strip()
    item["model"] = block.xpath(".//div[@class='m_product_title_div']/span/following-sibling::text()").extract()[0].strip()
    item["price"] = block.xpath(".//div[@id='m_product_price_div']/text()").extract()[0].strip()
    yield item
except Exception as exc:
    self.log('item filling exception: %s' % exc)
    continue
I suspect you are hitting an IndexError exception there.
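Once the log confirms an IndexError (i.e. an XPath expression that matched nothing, so `.extract()` returned an empty list and `[0]` failed), a small defensive helper avoids the crash without needing a bare `except` at all. This is a plain-Python sketch; `first_text` is a hypothetical helper, not part of Scrapy's API:

```python
def first_text(extracted):
    """Return the first extracted string, stripped, or None if the list is empty."""
    return extracted[0].strip() if extracted else None

# Simulated .extract() results: an empty match and a normal match.
print(first_text([]))             # None instead of an IndexError
print(first_text(["  Nokia  "]))  # "Nokia"
```

In the spider, each `item[...] = ...extract()[0].strip()` line could then pass its `.extract()` result through such a helper and skip the item explicitly when a required field comes back as None, which makes missing data visible in the logs rather than silently swallowed.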