OpenXML tables to CSV

Time: 2014-08-11 08:09:48

Tags: xslt openxml

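I am trying to extract tables from OpenXML using xsltproc. Could someone help me with the following (my wish list), using XSLT version 1.0:

1. How to filter out the non-table text
2. How to format the tables (probably as CSV)
3. How to direct the results into multiple output files (one table per file)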

My attempt:

 <?xml version="1.0"?>
 <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
 <xsl:output method="text"/>
 <xsl:template match="w:tbl">
 <xsl:apply-templates/><xsl:for-each select="w:tr"><xsl:text>
 </xsl:text>
  </xsl:for-each>
 <xsl:apply-templates/><xsl:for-each select="w:tcW"><xsl:text>  ;</xsl:text>
  </xsl:for-each>
 </xsl:template>
 </xsl:stylesheet>

This produces the following output:

>  xsltproc new2 word/document.xml
Node Selection and Pattern MatchingIn XSLT stylesheets, template rules for node selection and pattern matching are applied via the select attribute of the xsl:apply-templates command and the match attribute of the xsl:template element, respectively. A specification can be created to determine how to resolve issues in the event that a multiple number of applicable template rules exist, or alternately, when there are no applicable template rules at all.Table1 headingTextbodyBlahblahBody2Blah2Blah2


Table1 headingTextbodyBlahblahBody2Blah2Blah2Node SelectionWith the select attribute of xsl:apply-templates command, an XPath description can be used to either (1) select a multiple number of nodes with identical names, or (2) select a multiple number of nodes with differing names. Under scenario (1), using XPath to designate "ProductList/ Product" results in the selection of two Product element nodes.Table1 headingCol1TextbodyBlahCol1 BlahblahBody2Blah2Col1 Blah2Blah2Body3Blah3Col1 Blah3Blah3



Table1 headingCol1TextbodyBlahCol1 BlahblahBody2Blah2Col1 Blah2Blah2Body3Blah3Col1 Blah3Blah3

Expected output:

Table1  ;heading    ;Text
body    ;Blah   ;blah
Body2   ;Blah2  ;Blah2

Table1  ;heading    ;Col1   ;Text
body    ;Blah   ;Col1 Blah  ;blah
Body2   ;Blah2  ;Col1 Blah2 ;Blah2
Body3   ;Blah3  ;Col1 Blah3 ;Blah3

My original attempt, which did run successfully, has the form below, but it fails to meet objectives 2/3 above.

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<xsl:output method="text"/>

<xsl:template match="w:tr">
 <xsl:apply-templates/><xsl:text>
</xsl:text>
</xsl:template>

<xsl:template match="w:tcW">
 <xsl:apply-templates/><xsl:text>   ;</xsl:text>
</xsl:template>

<xsl:template match="w:p">
<xsl:apply-templates/><xsl:if test="position()!=last()"><xsl:text>
</xsl:text></xsl:if>
</xsl:template>

</xsl:stylesheet>

My input XML:

   <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 wp14"><w:body><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="003404B0"><w:r><w:t>Node Selection and Pattern Matching</w:t></w:r></w:p><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="003404B0"><w:r><w:t>In XSLT stylesheets, template rules for node selection and pattern matching are applied via the select attribute of the xsl:apply-templates command and the match attribute of the xsl:template element, respectively. 
A specification can be created to determine how to resolve issues in the event that a multiple number of applicable template rules exist, or alternately, when there are no applicable template rules at all.</w:t></w:r></w:p><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0"/><w:tbl><w:tblPr><w:tblStyle w:val="TableGrid"/><w:tblW w:w="0" w:type="auto"/><w:tblLook w:val="04A0" w:firstRow="1" w:lastRow="0" w:firstColumn="1" w:lastColumn="0" w:noHBand="0" w:noVBand="1"/></w:tblPr><w:tblGrid><w:gridCol w:w="3116"/><w:gridCol w:w="3117"/><w:gridCol w:w="3117"/></w:tblGrid><w:tr w:rsidR="003404B0" w:rsidTr="003404B0"><w:tc><w:tcPr><w:tcW w:w="3116" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0"><w:r><w:t xml:space="preserve">Table1 </w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="3117" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0"><w:r><w:t>heading</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="3117" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0"><w:r><w:t>Text</w:t></w:r></w:p></w:tc></w:tr><w:tr w:rsidR="003404B0" w:rsidTr="003404B0"><w:tc><w:tcPr><w:tcW w:w="3116" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0"><w:r><w:t>body</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="3117" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0"><w:r><w:t>Blah</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="3117" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0"><w:r><w:t>blah</w:t></w:r></w:p></w:tc></w:tr><w:tr w:rsidR="003404B0" w:rsidTr="003404B0"><w:tc><w:tcPr><w:tcW w:w="3116" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0"><w:r><w:t>Body2</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="3117" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0"><w:r><w:t>Blah2</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="3117" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" 
w:rsidRDefault="003404B0"><w:r><w:t>Blah2</w:t></w:r></w:p></w:tc></w:tr></w:tbl><w:p w:rsidR="006C4C5A" w:rsidRDefault="006C4C5A"/><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="003404B0"><w:r><w:t>Node Selection</w:t></w:r></w:p><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="003404B0"><w:r><w:t>With the select attribute of xsl:apply-templates command, an XPath description can be used to either (1) select a multiple number of nodes with identical names, or (2) select a multiple number of nodes with differing names. Under scenario (1), using XPath to designate "ProductList/ Product" results in the selection of two Product element nodes.</w:t></w:r></w:p><w:tbl><w:tblPr><w:tblStyle w:val="TableGrid"/><w:tblW w:w="0" w:type="auto"/><w:tblLook w:val="04A0" w:firstRow="1" w:lastRow="0" w:firstColumn="1" w:lastColumn="0" w:noHBand="0" w:noVBand="1"/></w:tblPr><w:tblGrid><w:gridCol w:w="2383"/><w:gridCol w:w="2420"/><w:gridCol w:w="2194"/><w:gridCol w:w="2353"/></w:tblGrid><w:tr w:rsidR="003404B0" w:rsidTr="003404B0"><w:tc><w:tcPr><w:tcW w:w="2383" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t xml:space="preserve">Table1 </w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2420" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>heading</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2194" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>Col1</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2353" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>Text</w:t></w:r></w:p></w:tc></w:tr><w:tr w:rsidR="003404B0" w:rsidTr="003404B0"><w:tc><w:tcPr><w:tcW w:w="2383" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>body</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2420" 
w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>Blah</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2194" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>Col1 Blah</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2353" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>blah</w:t></w:r></w:p></w:tc></w:tr><w:tr w:rsidR="003404B0" w:rsidTr="003404B0"><w:tc><w:tcPr><w:tcW w:w="2383" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>Body2</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2420" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>Blah2</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2194" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>Col1 Blah2</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2353" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>Blah2</w:t></w:r></w:p></w:tc></w:tr><w:tr w:rsidR="003404B0" w:rsidTr="003404B0"><w:tc><w:tcPr><w:tcW w:w="2383" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="003404B0"><w:r><w:t>B</w:t></w:r><w:r><w:t>ody3</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2420" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="003404B0"><w:r><w:t>Blah3</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2194" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="003404B0"><w:r><w:t>Col1 Blah3</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2353" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="003404B0"><w:r><w:t>Blah3</w:t></w:r><w:bookmarkStart w:id="0" w:name="_GoBack"/><w:bookmarkEnd w:id="0"/></w:p></w:tc></w:tr></w:tbl><w:p 
w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="003404B0"/><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="003404B0"/><w:sectPr w:rsidR="003404B0"><w:pgSz w:w="12240" w:h="15840"/><w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/><w:cols w:space="720"/><w:docGrid w:linePitch="360"/></w:sectPr></w:body></w:document>

1 answer:

Answer 0 (score: 1)

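You cannot use <xsl:apply-templates/> indiscriminately, because it also applies the built-in default templates, one of which copies text nodes to the output. Note also that your document has nodes matching your templates (w:p, for example) outside of any table. Try it this way: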

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"> 
<xsl:output method="text"/>

<xsl:template match="/">
    <xsl:apply-templates select="//w:tbl"/>
</xsl:template> 

<xsl:template match="w:tbl">
    <xsl:apply-templates select="w:tr"/>
    <xsl:text>&#10;</xsl:text>
</xsl:template>

<xsl:template match="w:tr">
    <xsl:apply-templates select="w:tc"/>
    <xsl:text>&#10;</xsl:text>
</xsl:template>

<xsl:template match="w:tc">
    <xsl:apply-templates select=".//w:t"/>
    <xsl:if test="position()!=last()">
        <xsl:text>&#9;</xsl:text>
    </xsl:if>
</xsl:template>

</xsl:stylesheet>

Or:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"> 
<xsl:output method="text"/>

<xsl:template match="/">
    <xsl:for-each select="//w:tbl">
        <xsl:for-each select="w:tr">
            <xsl:for-each select="w:tc">
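                <!-- In XSLT 1.0, value-of on a node-set outputs only the first node,
                     so loop to keep cells split into several runs (e.g. "B" + "ody3") -->
                <xsl:for-each select=".//w:t">
                    <xsl:value-of select="."/>
                </xsl:for-each>
                <xsl:if test="position()!=last()">
                    <xsl:text>&#9;</xsl:text>
                </xsl:if>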
            </xsl:for-each>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
        <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>

Both will produce tab-separated output. Note that this assumes the table cells do not themselves contain tab characters.
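
For the remaining wish-list items (a semicolon separator and one file per table), a minimal sketch follows. It assumes the EXSLT exsl:document extension element, which libxslt (and therefore xsltproc) implements. Pure XSLT 1.0 has no way to write more than one result document. The file names table1.csv, table2.csv, ... are hypothetical, and the sketch assumes cells contain neither semicolons nor nested tables. Text from several paragraphs in one cell is joined without a separator.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
  xmlns:exsl="http://exslt.org/common"
  extension-element-prefixes="exsl">
<xsl:output method="text"/>

<!-- Visit tables only, so text outside any table is never reached. -->
<xsl:template match="/">
  <xsl:for-each select="//w:tbl">
    <!-- Assumption: exsl:document is available; href is an attribute
         value template, giving table1.csv, table2.csv, ... -->
    <exsl:document href="table{position()}.csv" method="text">
      <xsl:apply-templates select="w:tr"/>
    </exsl:document>
  </xsl:for-each>
</xsl:template>

<!-- One output line per table row. -->
<xsl:template match="w:tr">
  <xsl:apply-templates select="w:tc"/>
  <xsl:text>&#10;</xsl:text>
</xsl:template>

<xsl:template match="w:tc">
  <!-- Join every text run of the cell, then trim stray whitespace
       such as the trailing space in "Table1 ". -->
  <xsl:variable name="cell">
    <xsl:for-each select=".//w:t">
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:variable>
  <xsl:value-of select="normalize-space($cell)"/>
  <!-- Separator between cells, but not after the last one. -->
  <xsl:if test="position()!=last()">
    <xsl:text>;</xsl:text>
  </xsl:if>
</xsl:template>

</xsl:stylesheet>
```

Saved under a hypothetical name such as tables.xsl, it would be run the same way as in the question: xsltproc tables.xsl word/document.xml. The main output is then empty, and each table goes to its own file. Where those files are written depends on how xsltproc is invoked.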