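I'm new to Scrapy. I want to scrape the links on this website, harga-hp, as shown in the picture I shared.
When I click Xiaomi, it links to the Xiaomi page, where I then want to scrape the price and name. Can someone help me fix this code?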
Application and items.py:
import scrapy
from handset.items import HandsetItem


class HandsetpriceSpider(scrapy.Spider):
    name = 'handsetprice'
    start_urls = ['http://id.priceprice.com/harga-hp/']

    def parse(self, response):
        urls = response.css('ul.maker > a::attr(href)').extract()
        for url in urls:
            url = response.urljoin(url)
            yield scrapy.Request(url=url, callback=self.parse_details)

        next_page_url = response.css('li.last > a::attr(href)').extract_first()
        if next_page_url:
            next_page_url = response.urljoin(next_page_url)
            yield scrapy.Request(url=next_page_url, callback=self.parse)

    def parse_details(self, response):
        yield {
            'Name' : response.css('li.name a::text').extract_first(),
            'Price' : response.css('.newPice::text').extract_first(),
        }
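Answer 0 (score: 1)
The CSS selector for your URLs needs to follow the path "ul > li > a", as in your question's title. With "ul.maker > a" the child combinator only matches links that sit directly inside the ul, so it finds nothing when the links are wrapped in li elements.
You also misspelled "newPrice" in parse_details() (it is written as "newPice"); that error will surface once you fix the URL selector.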