
I have the following code in spiders/spidername.py:

# python 3
import scrapy
from urllib.parse import urljoin


class PycoderSpider(scrapy.Spider):
    name = "auru"
    start_urls = [
        'http://example.com',
    ]

    def parse(self, response):
        for post_link in response.xpath(
                '//div[@class="post mb-2"]/h2/a/@href').extract():
            url = urljoin(response.url, post_link)
            print(url)

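I'm not familiar with Python. How do I need to change this so that it fetches the content of div.className from pages on site.com whose URLs match the mask site.com/id, where id is a number from 101 to 100101, and only when the page exists?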

Scrapy: how do I parse a specific div.class from URLs that match a mask?

0 answers: