Question

Is there a way to get the full href attribute (https://studyacer.com/question/audit-and-assurance-services-444592) rather than a partial href (https://studyacer.com/question/audit-and-) from this markup?

<td class="word-break">
    <span class="label label-success">Due in 5 days</span>
    <a href="https://studyacer.com/question/hey-greg-here-is-my-hrm522-discussion-444593">
        <strong>hey Greg here is my HRM522 discussion</strong></a>
    <small>&quot;Auditing of Organizational Ethics and Compliance Programs&quot;  Please respond to the following:...
    </small>
    <br />
    <strong>Business > Management</strong>
</td>

The XPath expression I have is '//td[@class="word-break"]/a/@href', and it only gives me a partial URL. The site uses absolute URLs, if that helps.

Edit: I am using Scrapy to implement a basic scraper. When I run

response.xpath('//td[@class="word-break"]/a/@href')

I get a partial URL.

Answer 1

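For anyone with a similar problem: it turns out that running

response.xpath('xpath_expression')

shows only a partial URL in Scrapy, especially when the URL is long. The call returns selector objects, and what gets printed is their shortened representation rather than the matched string itself. To get the full value, append extract() at the end, like this: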

response.xpath('xpath_expression').extract()
