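I have the same problem as in "Remove word proofing errors from WordML and merge the nodes", where @Rupesh_Kr provided a suggested XSL template. How do I use it? (I don't have enough reputation to ask there.) I want it to remove the Microsoft Word XML tags proofErr w:type="spellStart" and w:type="spellEnd", which break up the text in the document. I currently use an XSL that adds carriage returns to produce more diffable results, so I tried replacing it with his and ran the command "msxsl.exe -xe procedure.xml xml.xsl", where xml.xsl contains his suggestion, but I got the following error: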
Code: 0xc00ce01d
URL: file:///xml.xsl
Line: 17
Column: 12
Reference to undeclared namespace prefix: 'w'.
xml.xsl contains his suggestion, as follows:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!--
! This is an XML to XML transformation intended to be imported into a host
! XSLT. The source .xml file is copied verbatim by default.
! The importing XSL Transform should specify xsl:output as xml, and should
! contain templates to override the node and attribute match made here so
! that it can transform specific portions of the original XML file.
-->
<xsl:output method="xml" encoding="utf-8" indent="yes" />
<!-- ========================================================================
-->
<xsl:template match="w:p[w:proofErr]/w:r[1]">
<w:r>
<w:t>
<xsl:value-of select=".."/>
</w:t>
</w:r>
</xsl:template>
<xsl:template match="w:p[w:proofErr]/w:r[position() > 1]"/>
</xsl:stylesheet>
Below is a sample input file, simplified by removing many of the MS Word definitions:
<?xml version="1.0" encoding="utf-8"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument w:macrosPresent="no" w:embeddedObjPresent="no" w:ocxPresent="no" xml:space="preserve" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wsp="http://schemas.microsoft.com/office/word/2003/wordml/sp2" xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core">
<w:ignoreSubtree w:val="http://schemas.microsoft.com/office/word/2003/wordml/sp2"></w:ignoreSubtree>
<o:DocumentProperties>
<o:Lines>1</o:Lines>
</o:DocumentProperties>
<w:fonts>
</w:fonts>
<w:body>
<wx:sect>
<w:p>
<w:pPr>
<w:pStyle w:val="BodyText"></w:pStyle>
</w:pPr>
<w:proofErr w:type="spellStart"></w:proofErr>
<w:r>
<w:t>Hellow</w:t>
</w:r>
<w:proofErr w:type="spellEnd"></w:proofErr>
<w:r>
<w:t> </w:t>
</w:r>
<w:proofErr w:type="spellStart"></w:proofErr>
<w:r>
<w:t>world!</w:t>
</w:r>
<w:proofErr w:type="spellEnd"></w:proofErr>
</w:p>
<w:sectPr>
<w:ftr w:type="odd">
</w:ftr>
</w:sectPr>
</wx:sect>
</w:body>
</w:wordDocument>
The desired output is:
<?xml version="1.0" encoding="utf-8"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument w:macrosPresent="no" w:embeddedObjPresent="no" w:ocxPresent="no" xml:space="preserve" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wsp="http://schemas.microsoft.com/office/word/2003/wordml/sp2" xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core">
<w:ignoreSubtree w:val="http://schemas.microsoft.com/office/word/2003/wordml/sp2"></w:ignoreSubtree>
<o:DocumentProperties>
<o:Lines>1</o:Lines>
</o:DocumentProperties>
<w:fonts>
</w:fonts>
<w:body>
<wx:sect>
<w:p>
<w:pPr>
<w:pStyle w:val="BodyText"></w:pStyle>
</w:pPr>
<w:r>
<w:t>Hellow world!</w:t>
</w:r>
</w:p>
<w:sectPr>
<w:ftr w:type="odd">
</w:ftr>
</w:sectPr>
</wx:sect>
</w:body>
</w:wordDocument>
Answer 0 (score: 0)
First, removing the w:proofErr nodes is simple: you only need to add an empty template that matches them.
<xsl:template match="w:proofErr"/>
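For that template to compile at all, the w prefix has to be bound in the stylesheet itself. The "Reference to undeclared namespace prefix: 'w'" error in the question is consistent with the quoted xml.xsl using w: in its match patterns without declaring it. A minimal sketch, assuming the namespace URI shown in the sample input and assuming an identity template is wanted for everything else:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <!-- The xmlns:w declaration above must use the same URI as the input
       document; the prefix itself is only a local shorthand. -->

  <!-- identity transform: copy every node and attribute not matched elsewhere -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- empty template: w:proofErr elements produce no output -->
  <xsl:template match="w:proofErr"/>
</xsl:stylesheet>
```

On its own this only drops the proofing markers; the w:r runs they surrounded stay separate.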
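The other problem, combining all the text into a single w:t node, is not so trivial. What I am going to do will work for the given example, but it may produce unexpected results with other documents, especially ones with multiple paragraphs (all the text of each paragraph will be combined separately). Note also that the templates below copy only the w:t text into the merged run, so anything else inside the original runs, such as run formatting, is dropped.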
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:preserve-space elements="w:t"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- copy all text into the first w:r node -->
<xsl:template match="w:r[1]">
<xsl:copy>
<w:t>
<xsl:for-each select="../w:r">
<xsl:value-of select="w:t"/>
</xsl:for-each>
</w:t>
</xsl:copy>
</xsl:template>
<!-- remove other w:r nodes -->
<xsl:template match="w:r[position() > 1]"/>
<!-- remove w:proofErr nodes -->
<xsl:template match="w:proofErr"/>
</xsl:stylesheet>
Applied to your input example, the result will be:
<?xml version="1.0" encoding="UTF-8"?>
<w:wordDocument xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wsp="http://schemas.microsoft.com/office/word/2003/wordml/sp2" xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core" w:macrosPresent="no" w:embeddedObjPresent="no" w:ocxPresent="no" xml:space="preserve">
<w:ignoreSubtree w:val="http://schemas.microsoft.com/office/word/2003/wordml/sp2"/>
<o:DocumentProperties>
<o:Lines>1</o:Lines>
</o:DocumentProperties>
<w:fonts/>
<w:body>
<wx:sect>
<w:p>
<w:pPr>
<w:pStyle w:val="BodyText"/>
</w:pPr>
<w:r>
<w:t>Hellow world!</w:t>
</w:r>
</w:p>
<w:sectPr>
<w:ftr w:type="odd"/>
</w:sectPr>
</wx:sect>
</w:body>
</w:wordDocument>
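The caveat about other documents can be narrowed somewhat by merging runs only in paragraphs that actually contain proofing markers, as the match patterns in the question's xml.xsl do. The following is a sketch of that variation, not a tested general solution; it assumes that w:r elements sit directly under w:p, as in the sample input, and it still discards everything in the merged runs except the w:t text:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <xsl:output method="xml" version="1.0" encoding="UTF-8"/>

  <!-- identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- only in paragraphs that have a w:proofErr child:
       collect the text of all sibling runs into the first run -->
  <xsl:template match="w:p[w:proofErr]/w:r[1]">
    <xsl:copy>
      <w:t>
        <xsl:for-each select="../w:r">
          <xsl:value-of select="w:t"/>
        </xsl:for-each>
      </w:t>
    </xsl:copy>
  </xsl:template>

  <!-- ...and drop the remaining runs of those paragraphs -->
  <xsl:template match="w:p[w:proofErr]/w:r[position() &gt; 1]"/>

  <!-- remove the proofing markers themselves -->
  <xsl:template match="w:proofErr"/>
</xsl:stylesheet>
```

Paragraphs without any w:proofErr child fall through to the identity template and keep their runs unchanged. It can be run the same way as in the question, by passing the input file and this stylesheet to msxsl.exe.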