Question

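I've started using Scrapy for a small project, but I can't extract links. Every time the class is found, I only get "[]" instead of the URL. Am I missing something obvious?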

sel = Selector(response)
for entry in sel.xpath("//div[@class='recipe-description']"):
    print entry.xpath('href').extract()

Example markup from the website:

<div class="recipe-description">
    <a href="http://www.url.com/">
        <h2 class="rows-2"><span>SomeText</span></h2>
    </a>
</div>

Answer 1

Your XPath query is wrong.

for entry in sel.xpath("//div[@class='recipe-description']"):

On this line you are actually iterating over the div elements, which don't have an href attribute.

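To fix it, select the anchor (a) elements inside the div, and use @href so that XPath looks for the attribute rather than a child element named href: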

for entry in sel.xpath("//div[@class='recipe-description']/a"):
    print(entry.xpath('@href').extract())

The best solution is to extract the href attribute directly in the for loop:

for href in sel.xpath("//div[@class='recipe-description']/a/@href").extract():
    print(href)

For simplicity, you can also use CSS selectors:

for href in sel.css("div.recipe-description a::attr(href)").extract():
    print(href)