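How to select all href attributes of HTML tags that share a common class, in Scrapy

Time: 2018-11-27 01:34:10

Tags: scrapy-spider scrapy-shell

I want to select all the href values contained in the tags ... Here is my HTML code: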


The tag I am targeting is <a href="/gp/product/0545935172 ...." class="aok-block aok-nowrap" title="Dog Man: Lord of the Fleas: From the Creator of Captain Underpants (Dog Man #5)">, but my selector returned an empty list: []

2 answers:

Answer 0: (score: 0)

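I suggest using an XPath expression. Note that attributes are written with a leading @ in XPath, and that Scrapy selectors return values through extract() or extract_first() rather than get_attribute(). For example: response.xpath("//a[@class='aok-block aok-nowrap']/@href").extract()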

Answer 1: (score: 0)

To add to johnnydoe's answer, the individual attributes can be read like this:

    response.xpath('*//a/@href').extract_first()
    response.xpath('*//a/@class').extract_first()
    response.xpath('*//a/@title').extract_first()

If you only want one particular href, you have to anchor the expression on the tag above it, for example:

    <li>
    <a id="nav-questions" href="/questions"></a>
    </li>

Then it would be:

    response.xpath('...some uniq selector.../li/a/@href').extract_first()