I am in the very painful process of converting Word-based documents to XML. I have run into the following problem:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<p>
<element>This one is taken care of.</element> Some more text. „<hi rend="italics">Is this a
quote</hi>?” (Source). </p>
<p>
<element>This one is taken care of.</element> Some more text. „<hi rend="italics">This is a
quote</hi>” (Source). </p>
<p>
<element>This one is taken care of.</element> Some more text. „<hi rend="italics">This is
definitely a quote</hi>!” (Source). </p>
<p>
<element>This one is taken care of.</element> Some more text.„<hi rend="italics">This is a
first quote</hi>” (Source). „<hi rend="italics">Sometimes there is a second quote as
well</hi>!?” (Source). </p>
</root>
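The <p> nodes contain mixed content. The <element> nodes I already took care of in a previous iteration. The problem now is that the quotation marks and the sources are partly inside <hi rend="italics"/> and partly in the surrounding text nodes.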
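How can I, using XSLT 2.0:
1. Match all <hi rend="italics"> nodes?
2. Output the contents of <hi rend="italics"> as <quote>...</quote>, removing the quotation marks („ and ”) but including inside <quote/> any question marks and exclamation marks that immediately follow <hi rend="italics"> in its following sibling?
3. Turn the text between "(" and ")" in the text node following the <hi rend="italics"> node into <source>...</source>, dropping the parentheses?
In other words, my output should look like this: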
<root>
<p>
<element>This one is taken care of.</element> Some more text. <quote>Is this a quote?</quote> <source>Source</source>.
</p>
<p>
<element>This one is taken care of.</element> Some more text. <quote>This is a quote</quote> <source>Source</source>.
</p>
<p>
<element>This one is taken care of.</element> Some more text. <quote>This is definitely a quote!</quote> <source>Source</source>.
</p>
<p>
<element>This one is taken care of.</element> Some more text. <quote>This is a first quote</quote> <source>Source</source>. <quote>Sometimes there is a second quote as well!?</quote> <source>Source</source>.
</p>
</root>
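I have never dealt with mixed content and string manipulation like this before, and the whole thing really throws me. I would be very grateful for any hints.
Answer 0 (score: 2)
This transformation: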
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>
  <!-- Identity transform: copy everything not matched by a more specific template -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <!-- An italic hi directly preceded by a text node ending in „ is a quotation;
       a run of ? or ! at the start of the next text node is appended to it -->
  <xsl:template match=
    "hi[@rend='italics'
        and
        preceding-sibling::node()[1][self::text()[ends-with(., '„')]]
       ]">
    <quote>
      <xsl:value-of select=
        "concat(.,
                if(matches(following-sibling::text()[1], '^[?!]+'))
                  then replace(following-sibling::text()[1], '^([?!]+).*$', '$1')
                  else()
               )
        "/>
    </quote>
  </xsl:template>
  <!-- The [true()] predicate raises the default priority to 0.5 so this beats the identity template.
       Each text node loses its „ ” ? ! characters and its first (...) becomes <source> -->
  <xsl:template match="text()[true()]">
    <xsl:variable name="vThis" select="."/>
    <xsl:variable name="vThis2" select="translate($vThis, '„”?!', '')"/>
    <xsl:value-of select="substring-before(concat($vThis2, '('), '(')"/>
    <xsl:if test="contains($vThis2, '(')">
      <source>
        <xsl:value-of select=
          "substring-before(substring-after($vThis2, '('), ')')"/>
      </source>
      <xsl:value-of select="substring-after($vThis2, ')')"/>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
when applied to the provided XML document:
<root>
<p>
<element>This one is taken care of.</element> Some more text. „<hi rend="italics">Is this a
quote</hi>?” (Source). </p>
<p>
<element>This one is taken care of.</element> Some more text. „<hi rend="italics">This is a
quote</hi>” (Source). </p>
<p>
<element>This one is taken care of.</element> Some more text. „<hi rend="italics">This is
definitely a quote</hi>!” (Source). </p>
<p>
<element>This one is taken care of.</element> Some more text.„<hi rend="italics">This is a
first quote</hi>” (Source). „<hi rend="italics">Sometimes there is a second quote as
well</hi>!?” (Source). </p>
</root>
produces the wanted, correct result:
<root>
<p>
<element>This one is taken care of.</element> Some more text. <quote>Is this a
quote?</quote> <source>Source</source>. </p>
<p>
<element>This one is taken care of.</element> Some more text. <quote>This is a
quote</quote> <source>Source</source>. </p>
<p>
<element>This one is taken care of.</element> Some more text. <quote>This is
definitely a quote!</quote> <source>Source</source>. </p>
<p>
<element>This one is taken care of.</element> Some more text.<quote>This is a
first quote</quote> <source>Source</source>. <quote>Sometimes there is a second quote as
well!?</quote> <source>Source</source>. </p>
</root>
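The punctuation capture inside the quote template can be tried on its own. The following is a minimal XSLT 2.0 sketch, assuming a processor such as Saxon that will run it against any input document; the three sample strings are hypothetical, modeled on the text nodes that follow each quote in the question.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <!-- Runs against any input document; the sample strings below are hypothetical -->
  <xsl:template match="/">
    <results>
      <xsl:for-each select="('?” (Source). ', '” (Source). ', '!?” (Source). ')">
        <!-- Keep only the run of ? and ! at the very start of the string, if any -->
        <r>
          <xsl:value-of select="if (matches(., '^[?!]+'))
                                then replace(., '^([?!]+).*$', '$1')
                                else ''"/>
        </r>
      </xsl:for-each>
    </results>
  </xsl:template>
</xsl:stylesheet>
```

For the three sample strings the <r> elements hold ?, nothing, and !?, respectively: exactly the characters the quote template appends inside <quote>.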
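Answer 1 (score: 1)
Here is another solution. It copes with a more free-form input document (quotes within quotes, several (source) fragments within one text node, and '„' occurring as plain data that is not followed by a hi element).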
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:so="http://stackoverflow.com/questions/12690177"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xsl xs so">
  <xsl:output omit-xml-declaration="yes" indent="yes" />
  <xsl:strip-space elements="*" />
  <!-- Copy attributes, comments and processing instructions unchanged -->
  <xsl:template match="@*|comment()|processing-instruction()">
    <xsl:copy />
  </xsl:template>
  <!-- Copy elements, processing their attributes and children -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>
  <!-- Drop the last character (the opening „) -->
  <xsl:function name="so:clip-start" as="xs:string">
    <xsl:param name="in-text" as="xs:string" />
    <xsl:value-of select="substring($in-text,1,string-length($in-text)-1)" />
  </xsl:function>
  <!-- Drop everything up to and including the first closing ” (and any ?/! before it) -->
  <xsl:function name="so:clip-end" as="xs:string">
    <xsl:param name="in-text" as="xs:string" />
    <xsl:value-of select="substring-after($in-text,'”')" />
  </xsl:function>
  <!-- True for a text node ending in „ that has an italic hi among its following siblings -->
  <xsl:function name="so:matches-start" as="xs:boolean">
    <xsl:param name="text-node" as="text()" />
    <xsl:value-of select="$text-node/following-sibling::node()/self::hi[@rend='italics'] and
      ends-with($text-node, '„')" />
  </xsl:function>
  <xsl:template match="text()[so:matches-start(.)]" priority="2">
    <xsl:call-template name="parse-text">
      <xsl:with-param name="text" select="so:clip-start(.)" />
    </xsl:call-template>
  </xsl:template>
  <!-- True for a text node after an italic hi that starts with optional ?/! and then ” -->
  <xsl:function name="so:matches-end" as="xs:boolean">
    <xsl:param name="text-node" as="text()" />
    <xsl:value-of select="$text-node/preceding-sibling::node()/self::hi[@rend='italics'] and
      matches($text-node,'^[!?]*”')" />
  </xsl:function>
  <xsl:template match="text()[so:matches-end(.)]" priority="2">
    <xsl:call-template name="parse-text">
      <xsl:with-param name="text" select="so:clip-end(.)" />
    </xsl:call-template>
  </xsl:template>
  <!-- A text node that closes one quote and also opens the next -->
  <xsl:template match="text()[so:matches-start(.)][so:matches-end(.)]" priority="3">
    <xsl:call-template name="parse-text">
      <xsl:with-param name="text" select="so:clip-end(so:clip-start(.))" />
    </xsl:call-template>
  </xsl:template>
  <!-- Default text handling: every (...) becomes <source>...</source> -->
  <xsl:template match="text()" name="parse-text" priority="1">
    <xsl:param name="text" select="." />
    <xsl:analyze-string select="$text" regex="\(([^)]*)\)">
      <xsl:matching-substring>
        <source>
          <xsl:value-of select="regex-group(1)" />
        </source>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="." />
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
  <!-- An italic hi becomes a quote; a leading run of ?/! in the next text node is appended -->
  <xsl:template match="hi[@rend='italics']">
    <quote>
      <xsl:apply-templates select="(@* except @rend) | node()" />
      <xsl:for-each select="following-sibling::node()[1]/self::text()[matches(.,'^[!?]')]">
        <xsl:value-of select="replace(., '^([!?]+).*$', '$1')" />
      </xsl:for-each>
    </quote>
  </xsl:template>
</xsl:stylesheet>
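The claim that several (source) fragments in one text node are handled comes from the xsl:analyze-string step in the parse-text template. A minimal standalone XSLT 2.0 sketch of just that step, assuming a processor such as Saxon that will run it against any input document; the sample string stands in for a hypothetical text node:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>
  <!-- Runs against any input document; the sample text below is hypothetical -->
  <xsl:template match="/">
    <xsl:variable name="vText" select="' (Source A). Some text (Source B). '"/>
    <p>
      <!-- Each (...) run becomes a source element; everything else is copied as text -->
      <xsl:analyze-string select="$vText" regex="\(([^)]*)\)">
        <xsl:matching-substring>
          <source><xsl:value-of select="regex-group(1)"/></source>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

This should produce <p> <source>Source A</source>. Some text <source>Source B</source>. </p>. The substring-before/substring-after approach in Answer 0, by contrast, only converts the first (...) in each text node.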