Question

I managed to get the href links from the http://quotes.toscrape.com/ example by running the following:

response.css('div.quote > span > a::attr(href)').extract()

This returns a list of the partial (relative) links found in each href:

['/author/Albert-Einstein', '/author/J-K-Rowling', '/author/Albert-Einstein', '/author/Jane-Austen', '/author/Marilyn-Monroe', '/author/Albert-Einstein', '/author/Andre-Gide', '/author/Thomas-A-Edison', '/author/Eleanor-Roosevelt', '/author/Steve-Martin']

In the example above, each tag has the following format:

<a href="/author/Albert-Einstein">(about)</a>

So I tried to do the same on this site (http://www.thegoodscentscompany.com/allproc-1.html). The problem here is that the tags are structured a bit differently:

<a href="#" onclick="openMainWindow('http://www.thegoodscentscompany.com/data/rw1247381.html');return false;">formaldehyde</a>

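As you can see, the href is just "#", so an approach like the one above cannot get the link from it. I want to get the link from this tag (http://www.thegoodscentscompany.com/data/rw1247381.html), but I can't. How can I get this link?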

Answer 1

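The URL is in the onclick attribute rather than in href, so select onclick and pull the URL out with a regular expression. Try this:

response.css('a::attr(onclick)').re(r"Window\('(.*?)'\)")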
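To see what that pattern captures without running a spider, here is a minimal sketch using only Python's standard `re` module on the onclick value quoted in the question. The helper name `extract_url` is made up for illustration; in Scrapy itself, the selector's `.re()` method applies the pattern to each extracted attribute value and returns the captured groups as a list.

```python
import re

# The onclick value copied from the <a> tag in the question.
onclick = (
    "openMainWindow('http://www.thegoodscentscompany.com/data/rw1247381.html');"
    "return false;"
)

# Same pattern as in the answer: capture the text between Window(' and ').
PATTERN = re.compile(r"Window\('(.*?)'\)")


def extract_url(onclick_value):
    """Return the URL passed to openMainWindow(...), or None if there is none.

    Hypothetical helper, not part of Scrapy.
    """
    match = PATTERN.search(onclick_value)
    return match.group(1) if match else None


print(extract_url(onclick))
```

If only the first match on a page is needed, the selector's `.re_first()` can be used in place of `.re()`; it returns a single string instead of a list.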
