Remove Word proofing errors from WordML

Time: 2018-12-05 18:57:24

Tags: xml xslt-1.0

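I have the same problem as in "Remove word proofing errors from WordML and merge the nodes", where @Rupesh_Kr suggested an XSL template. How do I use it? (I don't have enough reputation to ask there.) I want it to remove the Microsoft Word XML markup w:proofErr w:type="spellStart" and w:type="spellEnd", which breaks up lines in the document. I currently use an XSL that adds carriage returns to make the output easier to diff, so I tried replacing it with his and ran "msxsl.exe -xe procedure.xml xml.xsl", where xml.xsl contains his suggestion. I got the following error: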

Code:   0xc00ce01d
URL:    file:///xml.xsl
Line:   17
Column: 12
Reference to undeclared namespace prefix: 'w'.

xml.xsl contains his suggestion, shown below:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!--
   ! This is an XML to XML transformation intended to be imported into a host
   ! XSLT.  The source .xml file is copied verbatim by default.
   ! The importing XSL Transform should specify xsl:output as xml, and should
   ! contain templates to override the node and attribute match made here so
   ! that it can transform specific portions of the original XML file.
    -->

  <xsl:output method="xml" encoding="utf-8" indent="yes" />

  <!-- ========================================================================
    -->
  <xsl:template match="w:p[w:proofErr]/w:r[1]">
      <w:r>
          <w:t>
          <xsl:value-of select=".."/>
          </w:t>
      </w:r>
  </xsl:template>

  <xsl:template match="w:p[w:proofErr]/w:r[position() > 1]"/>

</xsl:stylesheet>

Below is a sample input file, simplified by removing many of the MS Word definitions:

<?xml version="1.0" encoding="utf-8"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument w:macrosPresent="no" w:embeddedObjPresent="no" w:ocxPresent="no" xml:space="preserve" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wsp="http://schemas.microsoft.com/office/word/2003/wordml/sp2" xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core">
<w:ignoreSubtree w:val="http://schemas.microsoft.com/office/word/2003/wordml/sp2"></w:ignoreSubtree>
<o:DocumentProperties>
<o:Lines>1</o:Lines>
</o:DocumentProperties>
<w:fonts>
</w:fonts>
<w:body>
<wx:sect>
<w:p>
<w:pPr>
<w:pStyle w:val="BodyText"></w:pStyle>
</w:pPr>
<w:proofErr w:type="spellStart"></w:proofErr>
<w:r>
<w:t>Hellow</w:t>
</w:r>
<w:proofErr w:type="spellEnd"></w:proofErr>
<w:r>
<w:t> </w:t>
</w:r>
<w:proofErr w:type="spellStart"></w:proofErr>
<w:r>
<w:t>world!</w:t>
</w:r>
<w:proofErr w:type="spellEnd"></w:proofErr>
</w:p>
<w:sectPr>
<w:ftr w:type="odd">
</w:ftr>
</w:sectPr>
</wx:sect>
</w:body>
</w:wordDocument>

The desired output is:

<?xml version="1.0" encoding="utf-8"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument w:macrosPresent="no" w:embeddedObjPresent="no" w:ocxPresent="no" xml:space="preserve" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wsp="http://schemas.microsoft.com/office/word/2003/wordml/sp2" xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core">
<w:ignoreSubtree w:val="http://schemas.microsoft.com/office/word/2003/wordml/sp2"></w:ignoreSubtree>
<o:DocumentProperties>
<o:Lines>1</o:Lines>
</o:DocumentProperties>
<w:fonts>
</w:fonts>
<w:body>
<wx:sect>
<w:p>
<w:pPr>
<w:pStyle w:val="BodyText"></w:pStyle>
</w:pPr>
<w:r>
<w:t>Hellow world!</w:t>
</w:r>
</w:p>
<w:sectPr>
<w:ftr w:type="odd">
</w:ftr>
</w:sectPr>
</wx:sect>
</w:body>
</w:wordDocument>

1 answer:

Answer 0 (score: 0)

First, removing the w:proofErr nodes is simple: you only need to add an empty template that matches them.

<xsl:template match="w:proofErr"/>
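On its own, that empty template needs an identity transform around it so that everything else is copied. The sketch below shows a stylesheet that only strips the markers. Note the xmlns:w declaration: any stylesheet that uses the w: prefix in a match pattern has to bind it to the same namespace URI as the input (here the WordML 2003 URI from the sample's root element). The stylesheet in the question binds only xsl:, which is why MSXML reports "Reference to undeclared namespace prefix: 'w'".

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <!-- binding the w prefix is what the question's stylesheet was missing -->
  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- identity transform: copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- drop the spellStart/spellEnd markers (and any other w:proofErr) -->
  <xsl:template match="w:proofErr"/>
</xsl:stylesheet>
```

For the sample input this leaves the three runs ("Hellow", " " and "world!") as separate w:r elements, so merging them still needs the additional templates discussed next.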

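The other problem, combining all the text into a single w:t node, is not as trivial. What I do below works for the given example but may produce unexpected results for other documents, especially ones with multiple paragraphs: the runs of every paragraph are merged into that paragraph's first run (each paragraph separately), even in paragraphs that contain no proofing marks, and the run properties (w:rPr) are not copied, so character formatting such as bold or italic is lost.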

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:preserve-space elements="w:t"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<!-- copy all text into the first w:r node -->
<xsl:template match="w:r[1]">
    <xsl:copy>
        <w:t>
            <xsl:for-each select="../w:r">
                <xsl:value-of select="w:t"/>
            </xsl:for-each>
        </w:t>
    </xsl:copy>
</xsl:template>

<!-- remove other w:r nodes -->
<xsl:template match="w:r[position() > 1]"/>

<!-- remove w:proofErr nodes -->
<xsl:template match="w:proofErr"/>

</xsl:stylesheet>
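If the merge should only affect paragraphs that actually contain proofing marks, the w:p[w:proofErr] pattern from the question's stylesheet can be combined with the templates above. The following is a sketch under that assumption. It also copies the first run's attributes and w:rPr, so that run's formatting survives, but the formatting of later runs is still discarded and only the first w:t of each run is read, so other run content is lost.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- only in paragraphs with proofing marks: put the text of all runs
       into the first run, keeping that run's attributes and w:rPr -->
  <xsl:template match="w:p[w:proofErr]/w:r[1]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:copy-of select="w:rPr"/>
      <w:t>
        <xsl:for-each select="../w:r">
          <xsl:value-of select="w:t"/>
        </xsl:for-each>
      </w:t>
    </xsl:copy>
  </xsl:template>

  <!-- drop the remaining runs of those paragraphs only -->
  <xsl:template match="w:p[w:proofErr]/w:r[position() > 1]"/>

  <!-- drop the proofing markers everywhere -->
  <xsl:template match="w:proofErr"/>
</xsl:stylesheet>
```

On the sample input this should yield the same w:p content as the result shown below, because that paragraph contains w:proofErr and its first run has no w:rPr. Paragraphs without proofing marks are copied unchanged.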

For your sample input, the result will be:

<?xml version="1.0" encoding="UTF-8"?>
<w:wordDocument xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wsp="http://schemas.microsoft.com/office/word/2003/wordml/sp2" xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core" w:macrosPresent="no" w:embeddedObjPresent="no" w:ocxPresent="no" xml:space="preserve">
  <w:ignoreSubtree w:val="http://schemas.microsoft.com/office/word/2003/wordml/sp2"/>
  <o:DocumentProperties>
    <o:Lines>1</o:Lines>
  </o:DocumentProperties>
  <w:fonts/>
  <w:body>
    <wx:sect>
      <w:p>
        <w:pPr>
          <w:pStyle w:val="BodyText"/>
        </w:pPr>
        <w:r>
          <w:t>Hellow world!</w:t>
        </w:r>
      </w:p>
      <w:sectPr>
        <w:ftr w:type="odd"/>
      </w:sectPr>
    </wx:sect>
  </w:body>
</w:wordDocument>