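I'm trying to scrape product details from a grocery store's online shop. There are 797 pages in total, but I can't work out how to set the start URL for this kind of pagination. I'm using Web Scraper, the Google Chrome extension.
URL: https://www.colruyt.be/nl/producten
The second page I want to scrape is https://www.colruyt.be/nl/producten?page=2
The last page is https://www.colruyt.be/nl/producten?page=797
Can someone tell me how to compose the start URL so that I can scrape all of the pages?
Sitemap: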
{
  "_id": "colruyt",
  "startUrl": [
    "https://www.colruyt.be/nl/producten"
  ],
  "selectors": [
    {
      "id": "Assortiment",
      "type": "SelectorElement",
      "parentSelectors": [
        "_root"
      ],
      "selector": "a.card",
      "multiple": true,
      "delay": 0
    },
    {
      "id": "Image",
      "type": "SelectorImage",
      "parentSelectors": [
        "Assortiment"
      ],
      "selector": ".card__image img",
      "multiple": false,
      "delay": 0
    },
    {
      "id": "Product",
      "type": "SelectorText",
      "parentSelectors": [
        "Assortiment"
      ],
      "selector": "p.card__text",
      "multiple": false,
      "regex": "",
      "delay": 0
    },
    {
      "id": "Hoeveelheid",
      "type": "SelectorText",
      "parentSelectors": [
        "Assortiment"
      ],
      "selector": "p.card__quantity",
      "multiple": false,
      "regex": "",
      "delay": 0
    }
  ]
}
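
For reference, the page URLs quoted in the question follow a simple pattern: the first page has no query string, and pages 2 through 797 append ?page=N. The Python sketch below just spells that pattern out as an explicit list that could be pasted into the sitemap's "startUrl" array. The names build_start_urls, BASE_URL and LAST_PAGE are made up for illustration, and the sketch assumes the site keeps numbering pages consecutively up to 797; it is not part of Web Scraper itself.

```python
import json

# Values taken from the question; the constant names are hypothetical.
BASE_URL = "https://www.colruyt.be/nl/producten"
LAST_PAGE = 797


def build_start_urls(base_url: str, last_page: int) -> list[str]:
    """Return the first page plus every ?page=N URL from 2 to last_page."""
    urls = [base_url]  # page 1 is the plain URL, as given in the question
    urls.extend(f"{base_url}?page={n}" for n in range(2, last_page + 1))
    return urls


start_urls = build_start_urls(BASE_URL, LAST_PAGE)

# Show only the first few entries; the full list has one URL per page.
print(json.dumps({"startUrl": start_urls[:3]}, indent=2))
```

Web Scraper's documentation also describes a range form for numbered start URLs, which here would look like https://www.colruyt.be/nl/producten?page=[1-797]. Whether ?page=1 resolves to the same content as the plain URL is an assumption worth checking in the browser first.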