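I'm trying to run the following script: go to Flipkart, crawl all the product links, and extract the product name, price, and description. However, it only scrapes a single page. I want it to repeat the crawl across all result pages, i.e. page 1, 2, 3, and so on.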
```
GOTO flipkart.com/search?q=laptops&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off
CRAWL //div[2]/div[2]/div[1]/div//div[1]/a[@class="_2cLu-l"][1]
EXTRACT {
"product": "//span[@class=\"_35KyD6\"][1]",
"price": "//div[@class=\"_1vC4OE _3qQ9m1\"][1]",
"description": "//div[@class=\"_3u-uqB\"][1]"
}
```
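For readers unfamiliar with the `EXTRACT` step, here is a minimal Python sketch of what it does on each product page: take the first element matching each class selector and keep its text. It uses only the standard-library `xml.etree.ElementTree`, which supports a small XPath subset, and assumes a hypothetical, well-formed product page with placeholder values (`PRODUCT_PAGE`, `FIELDS`, and `extract` are illustrative names, not part of the scraping tool).

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical, simplified product page with placeholder values.
# Real Flipkart HTML is not well-formed XML; this only models the EXTRACT step.
PRODUCT_PAGE = """<html><body>
  <span class="_35KyD6">Sample Laptop</span>
  <div class="_1vC4OE _3qQ9m1">12,345</div>
  <div class="_3u-uqB">Sample description</div>
</body></html>"""

# Field name -> ElementTree path, mirroring the class selectors in EXTRACT.
FIELDS = {
    "product": ".//span[@class='_35KyD6']",
    "price": ".//div[@class='_1vC4OE _3qQ9m1']",
    "description": ".//div[@class='_3u-uqB']",
}


def extract(html: str, fields: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return the text of the first element matching each path, or None."""
    root = ET.fromstring(html)
    record: Dict[str, Optional[str]] = {}
    for name, path in fields.items():
        el = root.find(path)  # first match in document order
        record[name] = el.text.strip() if el is not None and el.text else None
    return record


print(extract(PRODUCT_PAGE, FIELDS))
# {'product': 'Sample Laptop', 'price': '12,345', 'description': 'Sample description'}
```

Note that the class match is an exact string comparison, so the two-class price selector only matches when the attribute value is exactly `_1vC4OE _3qQ9m1`.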
Answer:
For pagination, you need to add `[[xpath_for_nextpage_element]]`. In this case the XPath for the "next page" link is `//nav/a[11]/span`. Wrap it in `[[` and `]]` and put it right after the `CRAWL` keyword, before the product-link XPath. That gives us `[[//nav/a[11]/span]]`:
```
GOTO flipkart.com/search?q=laptops&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off
CRAWL [[//nav/a[11]/span]] //div[2]/div[2]/div[1]/div//div[1]/a[@class="_2cLu-l"][1]
EXTRACT {
"product": "//span[@class=\"_35KyD6\"][1]",
"price": "//div[@class=\"_1vC4OE _3qQ9m1\"][1]",
"description": "//div[@class=\"_3u-uqB\"][1]"
}
```
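With this change, the scraper follows the "next page" link on each results page and extracts the product information from every page, not just the first one.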
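Conceptually, the `[[...]]` selector turns a single-page crawl into a loop: collect product links on the current page, then follow the next-page link until there is none. The following is a minimal Python sketch of that control flow, not the tool's actual implementation. It assumes hypothetical in-memory pages (`PAGES`, made-up URLs `page1`/`page2`) and a hypothetical `fetch` function, and uses the standard-library `xml.etree.ElementTree`. Because the loop needs the link's `href`, it selects the `<a>` element itself rather than the inner `<span>` that the original XPath points at.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional

# Hypothetical, simplified result pages keyed by made-up URLs. Real Flipkart
# HTML is not well-formed XML; this only models the pagination control flow.
PAGES = {
    "page1": """<html><body>
  <div><a class="_2cLu-l" href="/p/laptop-a">A</a>
       <a class="_2cLu-l" href="/p/laptop-b">B</a></div>
  <nav><a href="page2">Next</a></nav>
</body></html>""",
    "page2": """<html><body>
  <div><a class="_2cLu-l" href="/p/laptop-c">C</a></div>
  <nav></nav>
</body></html>""",
}


def fetch(url: str) -> ET.Element:
    """Hypothetical fetcher: parses a page from the in-memory PAGES dict."""
    return ET.fromstring(PAGES[url])


def crawl_all_pages(
    start_url: str,
    fetch_page: Callable[[str], ET.Element],
    product_path: str,
    next_path: str,
    max_pages: int = 50,
) -> List[str]:
    """Collect product links from every results page by following 'next' links."""
    links: List[str] = []
    seen = set()
    url: Optional[str] = start_url
    # Stop when there is no next link, on a repeated URL, or at max_pages.
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        root = fetch_page(url)
        # Per-page step: gather product links (the CRAWL selector's job).
        links.extend(a.get("href") for a in root.findall(product_path))
        # Pagination step: look up the next-page link (the [[...]] selector's job).
        nxt = root.find(next_path)
        url = nxt.get("href") if nxt is not None else None
    return links


print(crawl_all_pages("page1", fetch, ".//a[@class='_2cLu-l']", ".//nav/a"))
# ['/p/laptop-a', '/p/laptop-b', '/p/laptop-c']
```

The `seen` set and `max_pages` cap are defensive additions in this sketch: a positional selector such as `a[11]` can match a different link on the last page, and without a guard the loop could revisit pages forever.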