How to scrape <ul> <li> <a>

Posted: 2018-08-16 07:48:42

Tags: python scrapy

I'm new to Scrapy. I want to scrape the links on this website, harga-hp, as shown in the image I shared.

[image]

When I click Xiaomi, it links to the Xiaomi page, where I then want to scrape the price and name. Can someone help me fix this code?

My spider (it imports HandsetItem from items.py):

import scrapy
from handset.items import HandsetItem
class HandsetpriceSpider(scrapy.Spider):
    name = 'handsetprice'
    start_urls = ['http://id.priceprice.com/harga-hp/']

    def parse(self, response):
        urls = response.css('ul.maker > a::attr(href)').extract()
        for url in urls:
            url = response.urljoin(url)
            yield scrapy.Request(url=url, callback=self.parse_details)

        next_page_url = response.css('li.last > a::attr(href)').extract_first()
        if next_page_url:
            next_page_url = response.urljoin(next_page_url)
            yield scrapy.Request(url=next_page_url, callback=self.parse)

    def parse_details(self, response):
        yield {
            'Name' : response.css('li.name a::text').extract_first(),
            'Price' : response.css('.newPice::text').extract_first(),         
        }

1 Answer:

Answer 0 (score: 1)

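The CSS selector for your URLs needs to follow the path ul > li > a, just like in your question title, i.e. ul.maker > li > a::attr(href) instead of ul.maker > a::attr(href).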

You also misspelled "newPrice" in parse_details() (it reads .newPice); that error will show up once you fix the URL selector.