Question

I have a very large document that follows this structure (> 5,000 instances like this):

<Questions>
    <QuestionID>558013</QuestionID>
    <Question>All of the following materials are categorized as &lt;chr8220&gt;fine art&lt;chr8221&gt; EXCEPT</Question>
    <Answer1>textiles</Answer1>
    <Answer2>paintings</Answer2>
    <Answer3>drawings</Answer3>
    <Answer4>sculptures</Answer4>
    <Answer5>architecture</Answer5>
    <AnswerGuide>Textile is not included in the category of fine art. Traditionally, textiles have been categorized as craft art.</AnswerGuide>
    <TypeID>1</TypeID>
    <Source>6,1,3</Source>
    <Footnote />
    <CardTypeID>0</CardTypeID>
    <Year>2016</Year>
    <SubjectID>41</SubjectID>
    <QuesNumber>4</QuesNumber>
    <AuxNum>4</AuxNum>
    <RandList>43512</RandList>
    <ResourceTypeID>382</ResourceTypeID>
    <TreeKey>01/01/01/</TreeKey>
    <TestID>41901</TestID>
    <DiffShort>N</DiffShort>
    <CardType />
</Questions>

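I don't need to import the fields TypeID through CardType, so it would be easier to just remove them. Currently I'm only using Notepad++ to edit this XML, and I can't find an easy way to remove all of these fields and their contents. Is it possible to do this? Ideally, it would simplify the above to: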

<Questions>
    <QuestionID>558013</QuestionID>
    <Question>All of the following materials are categorized as &lt;chr8220&gt;fine art&lt;chr8221&gt; EXCEPT</Question>
    <Answer1>textiles</Answer1>
    <Answer2>paintings</Answer2>
    <Answer3>drawings</Answer3>
    <Answer4>sculptures</Answer4>
    <Answer5>architecture</Answer5>
    <AnswerGuide>Textile is not included in the category of fine art. Traditionally, textiles have been categorized as craft art.</AnswerGuide>
</Questions>

Answer 1

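Consider XSLT, the special-purpose declarative language designed to transform XML files for various end uses. Below are two approaches. Save either one as an .xsl file and apply it to the .xml file. XSL files are well-formed XML files and can be parsed like any other XML.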

Keep needed nodes (keeps only nodes with 'Question' or 'Answer' in their names)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- Identity transform -->
   <xsl:template match="@* | node()">
      <xsl:copy>
        <xsl:apply-templates select="@* | node()" />
      </xsl:copy>
   </xsl:template>

   <!-- Questions template -->
   <xsl:template match="Questions">
     <xsl:copy>
      <xsl:copy-of select="*[contains(name(),'Question') or contains(name(),'Answer')]"/>
     </xsl:copy>
   </xsl:template>

</xsl:stylesheet>

Remove unneeded nodes (removes all nodes that do not have 'Question' or 'Answer' in their names)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- Identity transform -->
   <xsl:template match="@* | node()">
      <xsl:copy>
        <xsl:apply-templates select="@* | node()" />
      </xsl:copy>
   </xsl:template>

   <!-- Empty template -->
   <xsl:template match="Questions/*[not(contains(name(),'Question')) and not(contains(name(),'Answer'))]"/>

</xsl:stylesheet>

How do you run XSL scripts?

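Notepad++ itself is not an XSLT processor, only an editor. Most general-purpose languages have XSLT 1.0 processors available through extensions or libraries, including Java, C#, Perl, Python, PHP, VB, and more. In addition, dedicated executables such as Xalan and Saxon can run the more advanced XSLT 2.0 and 3.0 scripts. Command-line interpreters such as Windows PowerShell and Unix Bash can run them as well. xsltproc, which runs from the terminal, comes preinstalled on most Linux / Mac OS systems.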
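
As one illustration of the library route, here is a minimal Python sketch. It assumes the third-party lxml package, which the answer does not name; the `<Root>` wrapper, the shortened sample record, and the `apply_stylesheet` helper are hypothetical and exist only for the demo. The stylesheet is the "remove unneeded nodes" version from above.

```python
# Requires the third-party lxml package (pip install lxml). This is an
# assumption: the answer mentions Python but no specific library.
from lxml import etree

# The "remove unneeded nodes" stylesheet from the answer, inlined for the demo.
STYLESHEET = b"""<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:template match="@* | node()">
      <xsl:copy>
        <xsl:apply-templates select="@* | node()" />
      </xsl:copy>
   </xsl:template>
   <xsl:template match="Questions/*[not(contains(name(),'Question')) and not(contains(name(),'Answer'))]"/>
</xsl:stylesheet>"""

# A shortened sample record inside a hypothetical <Root> element.
SAMPLE = b"""<Root>
  <Questions>
    <QuestionID>558013</QuestionID>
    <Answer1>textiles</Answer1>
    <TypeID>1</TypeID>
    <CardType />
  </Questions>
</Root>"""


def apply_stylesheet(xml_bytes, xslt_bytes):
    """Compile the stylesheet and return the transformed result tree."""
    transform = etree.XSLT(etree.fromstring(xslt_bytes))
    return transform(etree.fromstring(xml_bytes))


result = apply_stylesheet(SAMPLE, STYLESHEET)

# Collect the names of the fields that survived inside each record.
kept = [el.tag for el in result.getroot().iter()
        if el.tag not in ("Root", "Questions")]
print(kept)  # ['QuestionID', 'Answer1']
```

For files on disk, `etree.parse("questions.xml")` can replace `etree.fromstring(...)`, and `result.write("output.xml")` saves the result; both file names are placeholders.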

Caveat

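XSLT processing tends to be memory-intensive, because the entire document must be read in and held in memory. It therefore works well on smaller files but does not scale to large ones. However, if you have enough RAM, roughly 5 times the size of the XML document (a rough estimate), you can run this kind of XSLT in reasonable time and with reasonable resources. And of course, XSLT runs more smoothly if the large document is split into smaller pieces.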
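
If memory does become the limiting factor, one alternative is to stream the file record by record instead of using XSLT. The sketch below is not part of the XSLT approach above; it uses `iterparse` from Python's standard-library `xml.etree.ElementTree` and applies the same 'Question' / 'Answer' name test. The root element name `Root` and the function name `strip_fields` are assumptions, since the question does not show the document's root.

```python
import io
import xml.etree.ElementTree as ET

# Substrings a child's name must contain to be kept (same test as the XSLT).
KEEP = ("Question", "Answer")


def strip_fields(source, sink, record_tag="Questions", root_tag="Root"):
    """Stream records from source, drop unwanted children, write them to sink.

    root_tag is a hypothetical name; the real root element is not shown
    in the question.
    """
    sink.write(f"<{root_tag}>\n")
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        # Iterate over a copy, because children are removed while looping.
        for child in list(elem):
            if not any(key in child.tag for key in KEEP):
                elem.remove(child)
        sink.write(ET.tostring(elem, encoding="unicode"))
        sink.write("\n")
        # Drop the record's children and text; the emptied element itself
        # stays attached to the root until parsing finishes.
        elem.clear()
    sink.write(f"</{root_tag}>\n")


# Demo on a shortened in-memory record.
demo = io.BytesIO(
    b"<Root><Questions><QuestionID>558013</QuestionID>"
    b"<Answer1>textiles</Answer1><TypeID>1</TypeID><CardType />"
    b"</Questions></Root>"
)
out = io.StringIO()
strip_fields(demo, out)
cleaned = out.getvalue()
print(cleaned)
```

For a real file, pass a binary file object as `source` and a text file opened with `encoding="utf-8"` as `sink`. The indentation of the original file is not preserved exactly, since whitespace attached to removed fields goes with them.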
