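I am trying to work out the best way to click the next-page button in the hotel listing on www.booking.com and keep the spider running.
Inspecting the button shows: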
<li class="nextpage">
  <a href="/bigcity/offset=15" class="gotopage_2"></a>
</li>
Working code for a single page:
import scrapy
from ..items import BookItem


class BookSpiderSpider(scrapy.Spider):
    name = "book_spider"
    start_urls = (
        'https://www.booking.com/smallcity/offset=10',
    )

    def parse(self, response):
        items = BookItem()
        title_name = response.css('span.sr-hotel__name::text').extract()
        items['title_name'] = title_name
        yield items
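Every time the button is clicked, both the href and the class change.
So my guess is that the Python code should find the button, take its new href, substitute it into the current URL, and go to that page.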
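The guess above amounts to resolving the button's href against the URL of the page currently being parsed. A minimal sketch of that one step, using only the standard library; the two URL values are copied from the question, and the variable names are illustrative assumptions:

```python
from urllib.parse import urljoin

# URL of the page being parsed (from start_urls in the question)
current_url = "https://www.booking.com/smallcity/offset=10"

# href read from the next-page button (from the inspected markup)
next_href = "/bigcity/offset=15"

# A href starting with "/" replaces the whole path of the current URL,
# so no manual string substitution is needed.
next_page_url = urljoin(current_url, next_href)
print(next_page_url)
```

Because the href starts with a slash, the join keeps only the scheme and host of the current URL and takes the path from the href.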
Answer 0 (score: 0)
Hello, please use this snippet in your application
[{name:1,6:'',7:'',8:'',9:''},{name:2,6:'',7:'',8:'',9:''},{name:3,6:'',7:'',8:'',9:''},{name:4,6:'',7:'',8:'',9:''},{name:5,6:'',7:'',8:'',9:''}]
Answer 1 (score: 0)
Use response.urljoin to avoid any URL pattern problems:
next_page_url = response.urljoin(next_href)