HTML:
<td>
<p>China's Changing Trade Structure and its Implications
<br>
Kevin Chow, Xiao Hong, John Fu and Sylvia Li
</p>
<p>25 August 2017
<br>
<a href="/media/eng/publication-and-research/research/research-memorandums/2017/RM13-2017.pdf" target="_blank">Full Paper</a>
(PDF File, 465KB)
</p>
</td>
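I have already selected the "a" element shown in the picture, and I am trying to get the title, "China's Changing Trade Structure and its Implications", and the date, "25 August 2017", each through an XPath relative to "a". But I cannot get them. Here is the code: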
for a in response.xpath('//div[@class="prContent"]//a[@href]'):
    url = response.urljoin(a.xpath('@href').extract_first())
    title = extract_text(a.xpath('../../p[1]/text()[1]'))
Answer 0 (score: 1)
You can try the following expressions to get the output you want:
To get "China's Changing Trade Structure and its Implications":
../../p[1]/text()[1]
To get "25 August 2017":
../../p[2]/text()[1]
P.S. This only works if you have defined the link (a) correctly.
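To see why these two expressions land on the right text, here is a minimal offline sketch that walks the same route (a, up to its p, up to the td, then down to the first or second p). It is not Scrapy code: it uses the standard library's xml.etree.ElementTree as a stand-in so that it runs without a crawl. The wrapping div with class "prContent" and the table/tr elements are assumptions taken from the selector in the question, and the br tags are written as self-closing so that the snippet parses as XML. ElementTree has no text() step, so the element's .text attribute (the text before the first child element) plays the role of text()[1].

```python
import xml.etree.ElementTree as ET

# Assumption: the <td> from the question, placed inside the div.prContent
# that the question's selector expects; <br> is written as <br/> for XML.
SNIPPET = """
<div class="prContent"><table><tr><td>
<p>China's Changing Trade Structure and its Implications
<br/>
Kevin Chow, Xiao Hong, John Fu and Sylvia Li
</p>
<p>25 August 2017
<br/>
<a href="/media/eng/publication-and-research/research/research-memorandums/2017/RM13-2017.pdf" target="_blank">Full Paper</a>
(PDF File, 465KB)
</p>
</td></tr></table></div>
"""

root = ET.fromstring(SNIPPET)

# ElementTree elements do not know their parent, so build a lookup table.
parent_of = {child: parent for parent in root.iter() for child in parent}

results = []
for a in root.iter("a"):
    if "href" not in a.attrib:
        continue                          # mirrors the [@href] predicate
    td = parent_of[parent_of[a]]          # "../..": a -> p -> td
    paragraphs = td.findall("p")          # direct <p> children of the td
    title = paragraphs[0].text.strip()    # "p[1]/text()[1]"
    date = paragraphs[1].text.strip()     # "p[2]/text()[1]"
    results.append((a.get("href"), title, date))

for href, title, date in results:
    print(title, "|", date, "|", href)
```

Both text nodes end with a newline before the br element, which is why the sketch strips them. In a spider the same trimming would have to be applied to whatever the two XPath expressions return.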