Question

I have this markup on the target page:

<nav>
<ul class="pagination pagination-lg">
<li class="active" itemprop="pageStart">
<a href="moto.html">1</a></li>
<li itemprop="pageEnd">
<a href="moto-2.html">2</a></li>
<li>
<a href="moto-2.html" aria-label="Next" class="xh-highlight">
    <span aria-hidden="true">»</span></a>
</li>
</ul>
</nav>

But I can't select the next-page link. I tried:

    next_page_url = response.xpath('./div/div/div[1]/nav/ul/li[3]/a').extract_first()

and also:

    response.css('[class="xh-highlight"]').extract()

In the shell, both only give me [] as the result.

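One more point: I set the user agent to Google Chrome, because I read here that other users had problems with accented characters in the markup, but that did not solve my problem.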

Answer 1

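I should warn you that Scrapy on its own does not execute JavaScript, so it cannot scrape content that is rendered with JavaScript. If the page is rendered that way, consider using a web driver such as Selenium.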

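I suggest you open the scrapy shell and type view(response), which opens the response Scrapy actually downloaded in your browser. If you see a blank page, the page is rendered with JavaScript.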
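The same check can be done on the raw HTML instead of by eye. The sketch below uses only the standard library and looks for a link with aria-label="Next" in a downloaded body. The two bodies and the names NextLinkFinder and find_next_links are made up for illustration. In the shell you would pass response.text instead.

```python
from html.parser import HTMLParser


class NextLinkFinder(HTMLParser):
    """Collect the href of every <a aria-label="Next"> in a raw HTML string."""

    def __init__(self):
        super().__init__()
        self.next_hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("aria-label") == "Next":
            self.next_hrefs.append(attributes.get("href"))


def find_next_links(raw_html):
    """Return the Next-link hrefs present in the HTML before any JavaScript runs."""
    finder = NextLinkFinder()
    finder.feed(raw_html)
    return finder.next_hrefs


# Hypothetical response bodies, for illustration only.
static_body = (
    '<nav><ul class="pagination">'
    '<li><a href="moto-2.html" aria-label="Next">»</a></li>'
    "</ul></nav>"
)
js_body = '<div id="app"></div><script src="bundle.js"></script>'

static_links = find_next_links(static_body)  # the link is in the downloaded HTML
js_links = find_next_links(js_body)          # the link would only exist after JavaScript runs
```

If the list comes back empty for the real response, the pagination is probably being added by JavaScript, and no selector will find it in Scrapy alone.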

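This is how you would get the URL with XPath, although I doubt it will make a difference if the element is not in the response at all:

    next_page_url = response.xpath('//nav/ul/li[3]/a/@href').extract_first()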
