Question

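When I use XPath to create a selector from another selector, the new selector still seems to contain the full content of the original selector. For example: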

import scrapy

original_document = """
    <a>
    <b>
        <c>hello_1</c>
    </b>
    <c>hello_2</c>
    </a>
"""
document_sel = scrapy.Selector(text = original_document)
second_sel = document_sel.xpath('//b')

second_sel is correctly extracted from the original document as a subset:

print second_sel.extract()
[u'<b>\n        <c>hello_1</c>\n    </b>']

But when I try to extract from second_sel:

print second_sel.xpath('//c').extract()
[u'<c>hello_1</c>', u'<c>hello_2</c>']

Why is "hello_2" extracted?

Answer 1

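According to the "Working with relative XPaths" section of the Scrapy documentation (https://doc.scrapy.org/en/latest/topics/selectors.html), an XPath that starts with "/" is absolute to the document, not relative to the selector you call it from. To search only inside second_sel, prefix the expression with a dot, i.e. use ".//":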

print second_sel.xpath('.//c').extract()
[u'<c>hello_1</c>']
