Scrapy: how to scrape <ul> <li>

Date: 2017-09-28 15:39:14

Tags: html web-scraping scrapy

I am learning how to use the Scrapy API.

I want to scrape the text of the link inside <h2 class><a href>, but it does not work (see the attached screenshot).

[Image: html page]

I tried to extract the text from the <a> tag:

import scrapy

class PriceSpider(scrapy.Spider):
    name = "annonce"  #name of spider

    def start_requests(self):
        urls = [
            'https://www.leboncoin.fr/ventes_immobilieres/offres/ile_de_france/?th=1',

        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for annonce in response.css('section.tabsContent li').extract():
            yield{
                'title':annonce.css('a ::title').extract_first(),
                }

1 Answer:

Answer 0 (score: 0)

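Give this a try. Your CSS selectors are seriously flawed: calling .extract() in the for loop turns each match into a plain string, so .css() can no longer be called on it, and ::title is not a pseudo-element that Scrapy supports (it provides ::text and ::attr(name)).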

import scrapy

class PriceSpider(scrapy.Spider):
    name = "annonce"  #name of spider

    def start_requests(self):
        urls = [
            'https://www.leboncoin.fr/ventes_immobilieres/offres/ile_de_france/?th=1',

        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

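    def parse(self, response):
        # Iterate over selector objects (no .extract() here) so .css() can be chained
        for annonce in response.css('.list_item'):
            yield {
                'link': annonce.css('::attr(href)').extract_first(),
                # default='' avoids an AttributeError when an item has no title
                'title': annonce.css('.item_title::text').extract_first(default='').strip(),
            }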

One more thing: open your settings.py file and set:

ROBOTSTXT_OBEY = False