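I have this XSLT, which Martin Honnen graciously provided (link).
The template has stopped working for some reason and I can't seem to fix it. The data has expanded, but I don't see why that should matter.
Instead of converting the double-pipe-delimited text into XML, it just removes the delimited data.
Here are the template and sample data: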
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="str">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:analyze-string select="." regex="\|((\|[^|]+\|)+)\|">
        <xsl:matching-substring>
          <xsl:analyze-string select="regex-group(1)" regex="\|(\w+):([^|]+)\|">
            <xsl:matching-substring>
              <xsl:element name="{regex-group(1)}">
                <xsl:value-of select="regex-group(2)"/>
              </xsl:element>
            </xsl:matching-substring>
          </xsl:analyze-string>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Data before (note that the data has been expanded):
<doc>
<arr name="content">
<str> stream_source_info docname stream_content_type text/html stream_size 412 Content-Encoding ISO-8859-1 stream_name docname Content-Type text/html; charset=ISO-8859-1 resourceName docname ||phone:3282||email:Lori.KS@.edu||officenumber:D-107A||vcard:https://c3qa/profiles/vcard/profile.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b||photo:https://c3qa/profiles/photo.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846||pronunciation:https://c3qa/profiles/audio.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846|| background:n/a || experience:n/a || divisiongroup:11-80 || groupdesc:TII || tags:n/a || </str>
</arr>
</doc>
Can I use XSLT to transform that XML into this?
<doc>
<arr name="content">
<str> stream_source_info docname stream_content_type text/html stream_size 412 Content-Encoding ISO-8859-1 stream_name docname Content-Type text/html; charset=ISO-8859-1 resourceName docname
<phone>3282</phone>
<email>Lori.KS@.edu</email>
<officenumber>D-107A</officenumber>
<vcard>https://c3qa/profiles/vcard/profile.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b</vcard>
<photo>https://c3qa/profiles/photo.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846</photo>
<pronunciation>https://c3qa/profiles/audio.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846</pronunciation>
<background> ...
</str>
</arr>
</doc>
Answer 0 (score: 1)
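Some of the colon-separated fields have leading and trailing whitespace (e.g. | background:n/a |). The inner xsl:analyze-string has no xsl:non-matching-substring, so any field its regular expression fails to match is dropped from the output. The regular expressions therefore need some adjusting to allow optional whitespace: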
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <!-- Identity template: copy everything that has no more specific rule -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="str">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Outer match: a run of |field| items wrapped in one extra pipe on each side -->
      <xsl:analyze-string select="." regex="\|((\|\s*[^|]+\s*\|)+)\|">
        <xsl:matching-substring>
          <!-- Inner match: one |name:value| item; \s* skips leading spaces and the
               reluctant [^|]+? leaves trailing spaces to the final \s* -->
          <xsl:analyze-string select="regex-group(1)" regex="\|\s*(\w+):([^|]+?)\s*\|">
            <xsl:matching-substring>
              <xsl:element name="{regex-group(1)}">
                <xsl:value-of select="regex-group(2)"/>
              </xsl:element>
            </xsl:matching-substring>
          </xsl:analyze-string>
        </xsl:matching-substring>
        <!-- Text outside the delimited run is copied unchanged -->
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
With that stylesheet, Saxon 9.5 outputs the following for the posted input:
<doc>
<arr name="content">
<str> stream_source_info docname stream_content_type text/html stream_size 412 Content-Encoding ISO-8859-1
stream_name docname Content-Type text/html; charset=ISO-8859-1 resourceName docname <phone>3282</phone>
<email>Lori.KS@.edu</email>
<officenumber>D-107A</officenumber>
<vcard>https://c3qa/profiles/vcard/profile.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b</vcard>
<photo>https://c3qa/profiles/photo.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846</photo>
<pronunciation>https://c3qa/profiles/audio.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846</pronunciation>
<background>n/a</background>
<experience>n/a</experience>
<divisiongroup>11-80</divisiongroup>
<groupdesc>TII</groupdesc>
<tags>n/a</tags>
</str>
</arr>
</doc>
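As a possible alternative to the nested regular expressions, the fields could also be split with tokenize() and trimmed with normalize-space(), which sidesteps the whitespace issue. The following is only a sketch under a few assumptions that are not stated in the question: every field is separated by ||, field names consist of word characters and are valid XML element names, and collapsing runs of whitespace inside values is acceptable. It is untested against the real data.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <!-- Identity template: copy everything that has no more specific rule -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="str">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Split the text on the double pipe; each token is free text or name:value -->
      <xsl:for-each select="tokenize(., '\|\|')">
        <!-- Trim and collapse whitespace so " background:n/a " becomes "background:n/a" -->
        <xsl:variable name="field" select="normalize-space(.)"/>
        <xsl:choose>
          <!-- Assumption: a field starts with a run of word characters followed by a colon -->
          <xsl:when test="matches($field, '^\w+:')">
            <!-- Split at the first colon, so URLs in the value keep their own colons -->
            <xsl:element name="{substring-before($field, ':')}">
              <xsl:value-of select="substring-after($field, ':')"/>
            </xsl:element>
          </xsl:when>
          <xsl:otherwise>
            <!-- Anything else is passed through as plain text, untrimmed -->
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Unlike the regex version, this variant never discards a token: anything that does not look like name:value is written back as text, which may make malformed fields easier to spot in the output.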