Question

The HTML structure is as follows:

<td class='hey'> 
<a href="https://example.com">First one</a>
</td>

This is my selector:

m_URL = sel.css("td.hey a:nth-child(1)[href] ").extract()

My selector currently outputs <a href="https://example.com">First one</a>, but I only want the link itself: https://example.com.

How can I do that?

Answer 1

Get the attribute value from the a tag with the ::attr(name) pseudo-element, here ::attr(href).

Demo (using the Scrapy shell):

$ scrapy shell index.html
>>> response.css('td.hey a:nth-child(1)::attr(href)').extract()
[u'https://example.com']

where index.html contains:

<table>
    <tr>
        <td class='hey'>
            <a href="https://example.com">First one</a>
        </td>
    </tr>
</table>

Answer 2

You can try this:

m_URL = sel.css("td.hey a:nth-child(1)").xpath('@href').extract()