Slicing an attribute value with XPath?

Date: 2016-03-02 19:36:52

Tags: python xml xpath scrapy

Suppose we have the following HTML fragment:

...
<section>
    <a href="https://example.com" data-utag="{"sku":"12340", "abc":"Lorem ipsum"}">sometext</a>
</section>
...

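Using XPath, how can I extract only the SKU value (12340) from data-utag?

1 answer:

Answer 0: (score: 1)

Delimiting the HTML attribute values with single quotes instead of double quotes keeps the double quotes inside the JSON intact, and the parsel / scrapy script then works: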

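from parsel import Selector
import json

# The attributes are delimited with single quotes so the JSON's double quotes survive parsing.
sel = Selector(text="""<section><a href='https://example.com' data-utag='{"sku":"12340", "abc":"Lorem ipsum"}'>sometext</a></section>""")

# Read the raw attribute value as a single string.
jsontxt = sel.xpath("string(.//section/a/@data-utag)").get()

# Parse the JSON text and look up the SKU.
loaded = json.loads(jsontxt)

print(loaded["sku"])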