I have an XML file that contains the following sample section:
<p>
<hi rend="center"><hi rend="italic">Martinsburgh, July</hi> 24.</hi>
</p>
<p> We are informed, that one day last week, a <lb/>Mr. Barret, living near the
South Branch, acci<lb break="no"/>dentally shot his wife;–he was fixing a
flint to <lb/>his gun, and incautiously dragging the trigger, not <lb/>knowing
the gun was loaded, discharged the <lb/>whole contents into her body, and she
died in a <lb/>few moments after–the unfortunate woman had <lb/>a young
child at her breast, but it providentially <lb/>received no injury. </p>
<p> Alexander M'Gillivray advertises for a tutor, <lb/>willing to instruct Indian
children in the rudiments of the English language; and the first prin<lb
break="no"/>ciples of <supplied reason="copy blur">arithmetic</supplied>. In
the advertisement, this <lb/>chief <supplied reason="copy blur">??? ??? ???
???</supplied> of the Creek nation. </p>
If I use this XSL file:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="teiCorpus">
<xsl:for-each select="TEI">
<xsl:result-document method="text"
href="Individual MD Entries\{teiHeader/fileDesc/sourceDesc/biblFull/publicationStmt/date/@when}_{teiHeader/fileDesc/sourceDesc/biblFull/titleStmt/title}_{teiHeader/fileDesc/titleStmt/title}.md">
<xsl:for-each select="text/body">
<xsl:apply-templates select="p"/>
</xsl:for-each>
</xsl:result-document>
</xsl:for-each>
</xsl:template>
<xsl:template match="hi[@rend='italic']">*<xsl:value-of select="."/>*</xsl:template>
<xsl:template match="p"><xsl:text>

</xsl:text><xsl:value-of select="normalize-space(.)"/></xsl:template>
</xsl:stylesheet>
I get properly formatted paragraphs (no stray line breaks), but the italics do not show up. If I use this instead:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="teiCorpus">
<xsl:for-each select="TEI">
<xsl:result-document method="text"
href="Individual MD Entries\{teiHeader/fileDesc/sourceDesc/biblFull/publicationStmt/date/@when}_{teiHeader/fileDesc/sourceDesc/biblFull/titleStmt/title}_{teiHeader/fileDesc/titleStmt/title}.md">
<xsl:for-each select="text/body/p">
<xsl:apply-templates />
</xsl:for-each>
</xsl:result-document>
</xsl:for-each>
</xsl:template>
<xsl:template match="hi[@rend='italic']">*<xsl:value-of select="."/>*</xsl:template>
<xsl:template match="p"><xsl:text>

</xsl:text><xsl:value-of select="normalize-space(.)"/></xsl:template>
</xsl:stylesheet>
I get the italic text correctly, but I also get the extra line breaks that are in the XML (they are there for readability). How can I get both?
Update
Using the updated code (normalize-space), I get:
*Martinsburgh, July*24.
We are informed, that one day last week, aMr. Barret, living near the South Branch, accidentally shot his wife;–he was fixing a flint tohis gun, and incautiously dragging the trigger, notknowing the gun was loaded, discharged thewhole contents into her body, and she died in afew moments after–the unfortunate woman hada young child at her breast, but it providentiallyreceived no injury.
Alexander M'Gillivray advertises for a tutor,willing to instruct Indian children in the rudiments of the English language; and the first principles ofarithmetic. In the advertisement, thischief??? ??? ??? ???of the Creek nation.
I need:
*Martinsburgh, July* 24.
We are informed, that one day last week, a Mr. Barret, living near the South Branch, accidentally shot his wife;–he was fixing a flint to his gun, and incautiously dragging the trigger, not knowing the gun was loaded, discharged the whole contents into her body, and she died in a few moments after–the unfortunate woman had a young child at her breast, but it providentially received no injury.
Alexander M'Gillivray advertises for a tutor, willing to instruct Indian children in the rudiments of the English language; and the first principles of arithmetic. In the advertisement, this chief ??? ??? ??? ??? of the Creek nation.
Answer 0 (score: 0)
I think you want to replace
<xsl:template match="hi[@rend='italic']">*<xsl:value-of select="."/>*</xsl:template>
<xsl:template match="p"><xsl:text>

</xsl:text><xsl:value-of select="normalize-space(.)"/></xsl:template>
with
<xsl:template match="hi[@rend='italic']">*<xsl:apply-templates/>*</xsl:template>
<xsl:template match="p"><xsl:text>

</xsl:text><xsl:apply-templates/></xsl:template>
<xsl:template match="lb[not(@break = 'no')]"><xsl:text> </xsl:text></xsl:template>
<xsl:template match="text()">
<xsl:value-of select="replace(replace(., '^\s+|\s+$', ''), '\s+', ' ')"/>
</xsl:template>
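Note that this text() template trims every text node on its own, so a space sitting at the edge of a text node next to an inline element such as <hi> or <supplied> is dropped (the lb template only restores the spaces at line breaks). As a quick illustration, the same two regular expressions are applied below to two text nodes from the sample with Java's String.replaceAll. TrimDemo and trimAndCollapse are hypothetical names, and Java's regex dialect only approximates XPath's (its \s also matches \f and \x0B), which does not matter for these inputs.

```java
class TrimDemo {

    // Same two steps as the text() template:
    //   replace(replace(., '^\s+|\s+$', ''), '\s+', ' ')
    // Java's \s is slightly broader than XPath's; irrelevant for this sample.
    static String trimAndCollapse(String textNode) {
        return textNode.replaceAll("^\\s+|\\s+$", "").replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // Text node that follows </hi> in the first paragraph: the leading
        // space is lost, so the output reads "*Martinsburgh, July*24."
        System.out.println("[" + trimAndCollapse(" 24.") + "]");    // prints [24.]
        // Text node that precedes <supplied>: the trailing space is lost,
        // so the output reads "chief??? ??? ??? ???"
        System.out.println("[" + trimAndCollapse("chief ") + "]");  // prints [chief]
    }
}
```

This is why normalizing once per paragraph, rather than once per text node, keeps those spaces.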
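As an alternative, you could first use the apply-templates-based approach above, but store the result for each p in a variable and then apply normalize-space to that variable for the final output.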
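A minimal sketch of that alternative, assuming the input carries no TEI namespace (as in the sample). The stylesheet is written as XSLT 1.0 so it runs on the processor bundled with the JDK (javax.xml.transform), and it is wrapped in a small Java class only for that reason; ParagraphNormalizer and toMarkdown are hypothetical names. The match='/' template is a stand-in for the question's teiCorpus/result-document template; in the real stylesheet only the p, hi and lb templates would change. Because normalize-space runs once over the whole paragraph, spaces next to <hi> and <supplied> survive, and the doubled spaces left by <lb/> collapse to one.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ParagraphNormalizer {

    // XSLT 1.0 stylesheet: each <p> is first built in a variable with
    // apply-templates (so the hi/lb templates fire), then normalized once.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text' encoding='UTF-8'/>"
        // Stand-in for the teiCorpus/result-document template: visit only <p>,
        // so whitespace between paragraphs is not copied to the output.
        + "<xsl:template match='/'><xsl:apply-templates select='//p'/></xsl:template>"
        + "<xsl:template match='p'>"
        + "<xsl:variable name='raw'><xsl:apply-templates/></xsl:variable>"
        + "<xsl:text>&#10;&#10;</xsl:text>"
        + "<xsl:value-of select='normalize-space($raw)'/>"
        + "</xsl:template>"
        + "<xsl:template match=\"hi[@rend='italic']\">*<xsl:apply-templates/>*</xsl:template>"
        // A plain <lb/> becomes a space; <lb break='no'/> falls through to the
        // built-in template and emits nothing, so 'acci' + 'dentally' join up.
        + "<xsl:template match=\"lb[not(@break='no')]\"><xsl:text> </xsl:text></xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet over an XML string and returns the text output.
    static String toMarkdown(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // First two paragraphs of the sample, wrapped in a root element
        String sample = "<body>\n"
                + "<p>\n<hi rend=\"center\"><hi rend=\"italic\">Martinsburgh, July</hi> 24.</hi>\n</p>\n"
                + "<p> We are informed, that one day last week, a <lb/>Mr. Barret, living near the\n"
                + "South Branch, acci<lb break=\"no\"/>dentally shot his wife</p>\n"
                + "</body>";
        System.out.print(toMarkdown(sample));
    }
}
```

The same p/hi/lb templates should also work unchanged in an XSLT 2.0 stylesheet like the one in the question, since normalize-space on the variable simply uses its string value.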
Answer 1 (score: 0)
Try adding:
<xsl:strip-space elements="*"/>
at the top level of the stylesheet.
Untested, since your input is not well-formed XML.