This is an example of the meta tag I want to get pub_date from:
<meta name="parsely-page" content='{"title":"Article title","link":"https:\/\/site.com\/category\/article","type":"post","section":"category","image_url":"","author":null,"pub_date":"2009-03-01T14:17:14+00:00","post_id":"article_6463676334","tags":[]}' />
The XPath to get the entire content would be:
//meta[@name="parsely-page"]/@content
Is it possible to get the value of a dict key using XPath?
Answer 0 (score: 0)
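With XPath 3.1, you can use
//meta[@name="parsely-page"]/parse-json(@content)?pub_date
Sadly, chances are you are using an XPath processor that only supports XPath 1.0, in which case this expression will not work unless you switch to a different processor.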
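If only an XPath 1.0 processor is available, one possible workaround is to select the attribute with a simple path and decode the JSON in the host language. The following is a minimal sketch in Python using only the standard library. The `<html>/<head>` wrapper, the `DOC` constant, and the `get_pub_date` helper are assumptions made for illustration, and the input is assumed to be well-formed XML (real-world HTML often is not).

```python
import json
import xml.etree.ElementTree as ET

# Sample document; the <html>/<head> wrapper is an assumption for illustration.
DOC = r"""<html><head>
<meta name="parsely-page" content='{"title":"Article title","link":"https:\/\/site.com\/category\/article","type":"post","section":"category","image_url":"","author":null,"pub_date":"2009-03-01T14:17:14+00:00","post_id":"article_6463676334","tags":[]}' />
</head></html>"""


def get_pub_date(xml_text):
    """Return the pub_date value from the parsely-page meta tag, or None."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a limited XPath subset, enough to locate the tag.
    meta = root.find(".//meta[@name='parsely-page']")
    if meta is None:
        return None
    # The attribute value is a JSON object, so decode it as JSON.
    data = json.loads(meta.get("content"))
    return data.get("pub_date")


pub_date = get_pub_date(DOC)
print(pub_date)
```

Decoding the JSON avoids depending on the exact layout of the serialized string, at the cost of needing a host language around the XPath call.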
Answer 1 (score: 0)
Using XSLT 1.0:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vQ">"</xsl:variable>
<xsl:template match="/">
<xsl:value-of select=
'substring-before(substring-after(//meta[@name="parsely-page"]/@content,
concat($vQ, "pub_date", $vQ, ":", $vQ)), $vQ)'/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the following XML document (your meta tag):
<meta name="parsely-page"
content='{"title":"Article title","link":"https:\/\/site.com\/category\/article","type":"post","section":"category","image_url":"","author":null,"pub_date":"2009-03-01T14:17:14+00:00","post_id":"article_6463676334","tags":[]}' />
the wanted result is produced:
2009-03-01T14:17:14+00:00
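We can also write a single XPath 1.0 expression that evaluates to the wanted string. Note that when it is embedded in an XML attribute (as in XSLT), the quotes and apostrophes have to be escaped to avoid nesting errors.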
substring-before(substring-after(//meta[@name="parsely-page"]/@content,
'"pub_date":"'),
'"')
Verification with XSLT 1.0:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vQ">"</xsl:variable>
<xsl:template match="/">
<xsl:value-of select=
'substring-before(substring-after(//meta[@name="parsely-page"]/@content,
&apos;"pub_date":"&apos;),
&apos;"&apos;)'/>
</xsl:template>
</xsl:stylesheet>
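When this transformation is applied to the same XML document (above), it evaluates the single XPath 1.0 expression and outputs the wanted, correct result: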
2009-03-01T14:17:14+00:00
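To make the logic of the substring-after / substring-before pair easier to follow, here is a rough Python sketch that mirrors their XPath 1.0 semantics (both return an empty string when the marker is not found). The helper names and the `CONTENT` constant are hypothetical, chosen only for this illustration.

```python
def substring_after(s, marker):
    """Mimic XPath 1.0 substring-after: text after the first marker, else ''."""
    _, sep, tail = s.partition(marker)
    return tail if sep else ""


def substring_before(s, marker):
    """Mimic XPath 1.0 substring-before: text before the first marker, else ''."""
    head, sep, _ = s.partition(marker)
    return head if sep else ""


# The value of the content attribute from the meta tag in the question.
CONTENT = r'{"title":"Article title","link":"https:\/\/site.com\/category\/article","type":"post","section":"category","image_url":"","author":null,"pub_date":"2009-03-01T14:17:14+00:00","post_id":"article_6463676334","tags":[]}'

# Step 1: keep everything after the key and the opening quote of its value.
after_key = substring_after(CONTENT, '"pub_date":"')
# Step 2: cut at the next quote, which closes the value.
result = substring_before(after_key, '"')
print(result)

# A missing key yields an empty string, just as in XPath 1.0.
missing = substring_before(substring_after(CONTENT, '"no_such_key":"'), '"')
```

This approach depends on the exact serialization: whitespace after the colon, or an escaped quote inside the value, would break the match.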