Suppose we have the following HTML fragment:
...
<section>
<a href="https://example.com" data-utag="{"sku":"12340", "abc":"Lorem ipsum"}">sometext</a>
</section>
...
Using XPath, how can I extract only the SKU value (12340) from data-utag?
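Answer 0 (score: 1)
Using single quotes instead of double quotes around the attribute values in the HTML makes the parsel / scrapy script work, because the double quotes inside the JSON then no longer terminate the attribute value early: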
import json
from parsel import Selector

# Single-quoted attributes keep the double quotes inside the JSON intact.
sel = Selector(text=u"""<section><a href='https://example.com' data-utag='{"sku":"12340", "abc":"Lorem ipsum"}'>sometext</a></section>""")
# Read the attribute value as a string, then parse it as JSON.
jsontxt = sel.xpath("string(.//section/a/@data-utag)").get()
loaded = json.loads(jsontxt)
print(loaded["sku"])  # 12340
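If the markup cannot be changed and really arrives with double quotes nested inside a double-quoted attribute, an HTML parser will probably cut the attribute value short, so XPath alone may not see the JSON. A rough fallback, which is a different technique from the XPath approach above, is to pull the payload out of the raw text with a regular expression before parsing it. The names `raw_html`, `UTAG_RE` and `extract_utag` are made up for this sketch, and it assumes the JSON object is flat (no nested braces):

```python
import json
import re

# Raw markup as in the question: double quotes nested in a double-quoted attribute.
raw_html = '<section><a href="https://example.com" data-utag="{"sku":"12340", "abc":"Lorem ipsum"}">sometext</a></section>'

# Assumption: the value is one flat JSON object, so the payload runs from "{"
# to the first "}" that is immediately followed by the closing quote.
UTAG_RE = re.compile(r'data-utag="(\{.*?\})"')


def extract_utag(html):
    """Return the data-utag payload as a dict, or None if it is not found."""
    match = UTAG_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


utag = extract_utag(raw_html)
print(utag["sku"])  # 12340
```

This only works while the payload stays a single flat object; for anything more complex, fixing the quoting (or escaping the inner quotes as `&quot;`) at the source is the more robust route.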