Pagination problem with the Web Scraper Google Chrome extension

Time: 2019-12-02 15:35:02

Tags: web-scraping pagination

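I am trying to scrape product details from a grocery store's online shop. There are 797 pages in total, but I can't figure out how to set the start URL for this kind of pagination. I am using Web Scraper, the Google Chrome extension.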

URL: https://www.colruyt.be/nl/producten

The second page I want to scrape is https://www.colruyt.be/nl/producten?page=2

The last page is https://www.colruyt.be/nl/producten?page=797

Can someone tell me how to compose the start URL so that I can scrape all of the pages?

Sitemap

{
  "_id": "colruyt",
  "startUrl": [
    "https://www.colruyt.be/nl/producten"
  ],
  "selectors": [
    {
      "id": "Assortiment",
      "type": "SelectorElement",
      "parentSelectors": [
        "_root"
      ],
      "selector": "a.card",
      "multiple": true,
      "delay": 0
    },
    {
      "id": "Image",
      "type": "SelectorImage",
      "parentSelectors": [
        "Assortiment"
      ],
      "selector": ".card__image img",
      "multiple": false,
      "delay": 0
    },
    {
      "id": "Product",
      "type": "SelectorText",
      "parentSelectors": [
        "Assortiment"
      ],
      "selector": "p.card__text",
      "multiple": false,
      "regex": "",
      "delay": 0
    },
    {
      "id": "Hoeveelheid",
      "type": "SelectorText",
      "parentSelectors": [
        "Assortiment"
      ],
      "selector": "p.card__quantity",
      "multiple": false,
      "regex": "",
      "delay": 0
    }
  ]
}
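
The page URLs listed in the question follow a simple pattern: a `page` query parameter that runs up to 797. As a minimal sketch, not part of the original sitemap, the full list of start URLs could be generated in Python and pasted into the `startUrl` array. The names `BASE_URL`, `LAST_PAGE`, and `build_start_urls` are illustrative, and the sketch assumes the first page is also reachable as `?page=1`, which has not been verified against the site. The Web Scraper documentation also describes a range form for start URLs (for example `https://www.colruyt.be/nl/producten?page=[1-797]`); whether that form behaves as expected on this site is likewise an assumption.

```python
import json
from typing import List

# Values taken from the question; the names themselves are illustrative.
BASE_URL = "https://www.colruyt.be/nl/producten"
LAST_PAGE = 797


def build_start_urls(base_url: str, last_page: int) -> List[str]:
    """Return one URL per page, from ?page=1 up to ?page=<last_page>.

    Assumption: the unnumbered first page is equivalent to ?page=1.
    """
    return [f"{base_url}?page={n}" for n in range(1, last_page + 1)]


start_urls = build_start_urls(BASE_URL, LAST_PAGE)

# Only the part of the sitemap that changes; the "selectors" array stays as is.
sitemap_fragment = {"_id": "colruyt", "startUrl": start_urls}

if __name__ == "__main__":
    print(json.dumps(sitemap_fragment, indent=2))
```

Listing every URL explicitly keeps the sitemap independent of any range syntax, at the cost of a long `startUrl` array.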

0 answers:

No answers