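How do I remove an XML node if the node's value contains a URL?

Time: 2017-02-18 12:24:22

Tags: xml xslt

I want to use this XML to prepare an XSD and then process the rows further to insert the data into a database. To prepare the XSD, I use XSLT to transform the structure into the required format.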

<linked-hash-map>
  <entry>
    <string>_type</string>
    <string>News</string>
  </entry>
  <entry>
    <string>value</string>
    <list>
      <linked-hash-map>
        <entry>
          <string>name</string>
          <string>
            Virat Kohli 
          </string>
        </entry>
        <entry>
          <string>url</string>
          <string>
            http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&CID=09E4F1057ADB64720330FB2E7BC96547&rd=1&h=nw8K4uNRgs-nvsuz2GyXpqMxdRmzWK8Xbm3W_1IlO24&v=1&r=http%3a%2f%2fmovies.ndtv.com%2fbollywood%2fvirat-kohli-hearts-anushka-sharma-a-timeline-of-their-romance-1659877&p=DevEx,5026.1
          </string>
        </entry>
        <entry>
          <string>image</string>
          <linked-hash-map>
            <entry>
              <string>thumbnail</string>
              <linked-hash-map>
                <entry>
                  <string>contentUrl</string>
                  <string>
                    https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&pid=News
                  </string>
                </entry>
                <entry>
                  <string>width</string>
                  <int>640</int>
                </entry>
              </linked-hash-map>
            </entry>
          </linked-hash-map>
        </entry>
        <entry>
          <string>description</string>
          <string>
            On Wednesday, cricketer Virat Kohli
          </string>
        </entry>
        <entry>
          <string>datePublished</string>
          <string>2017-02-16T05:39:00</string>
        </entry>
        <entry>
          <string>category</string>
          <string>Entertainment</string>
        </entry>
      </linked-hash-map>
      <linked-hash-map>
        <entry>
          <string>name</string>
          <string>
            Shah Rukh Khan’s TV show
          </string>
        </entry>
        <entry>
          <string>url</string>
          <string>
            http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&CID=09E4F1057ADB64720330FB2E7BC96547&rd=1&h=4CnQhOg9Nm7pmIu9OvDl6x9WtYtSuXblCSR_WQz1VoA&v=1&r=http%3a%2f%2fwww.hindustantimes.com%2ftv%2fshah-rukh-khan-s-tv-show-circus-is-back-on-small-screen%2fstory-OjQUQIWi6ogxj5eF1hivTI.html&p=DevEx,5040.1
          </string>
        </entry>
        <entry>
          <string>image</string>
          <linked-hash-map>
            <entry>
              <string>thumbnail</string>
              <linked-hash-map>
                <entry>
                  <string>contentUrl</string>
                  <string>
                    https://www.bing.com/th?id=ON.2974262BB8317FA4D4BCE4A61CA9488E&pid=News
                  </string>
                </entry>
                <entry>
                  <string>width</string>
                  <int>700</int>
                </entry>
              </linked-hash-map>
            </entry>
          </linked-hash-map>
        </entry>
        <entry>
          <string>description</string>
          <string>
            Here’s some wonderful news 
          </string>
        </entry>
        <entry>
          <string>datePublished</string>
          <string>2017-02-16T05:36:00</string>
        </entry>
        <entry>
          <string>category</string>
          <string>Entertainment</string>
        </entry>
      </linked-hash-map>
    </list>
  </entry>
</linked-hash-map>

The URLs here contain query strings. How can I remove the URLs, or how can I encode URLs that contain query strings?

Expected output:

<?xml version="1.0" encoding="utf-8"?>
<linked-hash-map>
  <entry>
    <linked-hash-map>
      <_type>News</_type>
      <datarow>
        <name> Virat Kohli</name>
        <url>http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&CID=09E4F1057ADB64720330FB2E7BC96547&rd=1&h=nw8K4uNRgs-nvsuz2GyXpqMxdRmzWK8Xbm3W_1IlO24&v=1&r=http%3a%2f%2fmovies.ndtv.com%2fbollywood%2fvirat-kohli-hearts-anushka-sharma-a-timeline-of-their-romance-1659877&p=DevEx,5026.1</url>
        <contentUrl>  https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&pid=News </contentUrl>
        <width>640</width>
        <description> On Wednesday, cricketer Virat Kohli</description>
        <readLink> https://api.cognitive.microsoft.com/api/v5/entities/b8ef6b82-02be-1e24-584c-f8283b7bdaeb </readLink>
        <datePublished>2017-02-16T05:39:00</datePublished>
        <category>Entertainment</category>     
      </datarow>
      <datarow>
        <name> Shah Rukh Khan’s TV show</name>
        <url> http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&CID=09E4F1057ADB64720330FB2E7BC96547&rd=1&h=4CnQhOg9Nm7pmIu9OvDl6x9WtYtSuXblCSR_WQz1VoA&v=1&r=http%3a%2f%2fwww.hindustantimes.com%2ftv%2fshah-rukh-khan-s-tv-show-circus-is-back-on-small-screen%2fstory-OjQUQIWi6ogxj5eF1hivTI.html&p=DevEx,5040.1 </url>
        <contentUrl>  https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&pid=News </contentUrl>
        <width>640</width>
        <description> Here’s some wonderful news </description>
        <readLink> https://api.cognitive.microsoft.com/api/v5/entities/b8ef6b82-02be-1e24-584c-f8283b7bdaeb </readLink>
        <datePublished>2017-02-16T05:39:00</datePublished>
        <category>Entertainment</category>
      </datarow>
    </linked-hash-map>
  </entry>
</linked-hash-map>

Below is the script I use to transform this structure.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/linked-hash-map">
    <xsl:element name="{local-name()}">
      <xsl:for-each select="entry">
        <xsl:choose>
          <xsl:when test="list/linked-hash-map">
            <xsl:for-each select="list/linked-hash-map">
              <datarow>
                <xsl:for-each select="entry">
                  <xsl:if test="not(node()[1]='image' or node()[1]='about' or node()[1]='clusteredArticles'  or node()[1]='mentions' or node()[1]='provider' or node()[1]='url' or node()[1]='description' or node()[1]='name')">
                    <xsl:text disable-output-escaping="yes">&lt;</xsl:text>
                    <xsl:value-of select="*[1]"/>
                    <xsl:text disable-output-escaping="yes">&gt;</xsl:text>
                    <xsl:value-of select="*[2]"/>
                    <xsl:text disable-output-escaping="yes">&lt;/</xsl:text>
                    <xsl:value-of select="*[1]"/>
                    <xsl:text disable-output-escaping="yes">&gt;</xsl:text>
                  </xsl:if>
                </xsl:for-each>
              </datarow>
            </xsl:for-each>
          </xsl:when>
          <xsl:otherwise>
            <xsl:text disable-output-escaping="yes">&lt;</xsl:text>
            <xsl:value-of select="*[1]"/>
            <xsl:text disable-output-escaping="yes">&gt;</xsl:text>
            <xsl:value-of select="*[2]"/>
            <xsl:text disable-output-escaping="yes">&lt;/</xsl:text>
            <xsl:value-of select="*[1]"/>
            <xsl:text disable-output-escaping="yes">&gt;</xsl:text>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </xsl:element>
  </xsl:template>
  <xsl:template match="/">
    <xsl:copy>
      <linked-hash-map>
        <entry>
          <xsl:apply-templates/>
        </entry>
      </linked-hash-map>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

1 answer:

Answer 0 (score: 0)

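Your original XML is currently not well-formed, because the ampersands (&) used in the URLs must be replaced with the corresponding XML entity reference, namely &amp;.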
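To make the point concrete, here is a minimal Python sketch (standard library only) showing that a raw ampersand makes the markup unparseable and that escaping it fixes the problem. The URL is the thumbnail URL from the question; the helper name `is_well_formed` is my own and purely illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Thumbnail URL taken from the question; note the raw "&" in the query string.
raw_url = "https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&pid=News"


def is_well_formed(xml_text):
    """Hypothetical helper: True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Naive string concatenation keeps the raw "&" and breaks the document.
broken = "<string>" + raw_url + "</string>"
# escape() turns "&" into "&amp;" (and also handles "<" and ">").
fixed = "<string>" + escape(raw_url) + "</string>"

print(is_well_formed(broken))                # False
print(is_well_formed(fixed))                 # True
# After parsing, the text is the original URL again, ampersand included.
print(ET.fromstring(fixed).text == raw_url)  # True
```

Escaping changes only the serialized form. Any XML-aware consumer, including an XSLT processor, still sees the original URL.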

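Check carefully how the original XML is produced, because it should not be built as a text file by concatenating strings (one way markup like this can end up being generated). Unfortunately, this is common practice in general-purpose programming. XML documents should be built with a W3C-compliant DOM library (e.g. Java's javax.xml, Python's xml.etree, PHP's DOMDocument, .NET's XmlDocument) using createElement, appendChild, setAttribute, or the equivalent methods.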
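As a rough illustration of that advice, the sketch below builds one entry with Python's xml.etree.ElementTree instead of string concatenation. The `make_entry` helper is hypothetical, and I am assuming the producer of the feed could be changed to work this way; the element names mirror the question's structure.

```python
import xml.etree.ElementTree as ET


def make_entry(key, value):
    """Hypothetical helper: build <entry><string>key</string><string>value</string></entry>."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "string").text = key
    ET.SubElement(entry, "string").text = value
    return entry


# Thumbnail URL from the question, passed in unescaped.
url = "https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&pid=News"

root = ET.Element("linked-hash-map")
root.append(make_entry("contentUrl", url))

# The serializer escapes special characters in text nodes on its own.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

In the printed markup the ampersand appears as &amp;, so the result can be fed straight into an XSLT processor without any manual escaping.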

Once well-formed XML is produced, consider the more general XSLT below.

Input (adjusted for character entities)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<linked-hash-map>
  <entry>
    <string>_type</string>
    <string>News</string>
  </entry>
  <entry>
    <string>value</string>
    <list>
      <linked-hash-map>
        <entry>
          <string>name</string>
          <string>
            Virat Kohli 
          </string>
        </entry>
        <entry>
          <string>url</string>
          <string>
            http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&amp;CID=09E4F1057ADB64720330FB2E7BC96547&amp;rd=1&amp;h=nw8K4uNRgs-nvsuz2GyXpqMxdRmzWK8Xbm3W_1IlO24&amp;v=1&amp;r=http%3a%2f%2fmovies.ndtv.com%2fbollywood%2fvirat-kohli-hearts-anushka-sharma-a-timeline-of-their-romance-1659877&amp;p=DevEx,5026.1
          </string>
        </entry>
        <entry>
          <string>image</string>
          <linked-hash-map>
            <entry>
              <string>thumbnail</string>
              <linked-hash-map>
                <entry>
                  <string>contentUrl</string>
                  <string>
                    https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&amp;pid=News
                  </string>
                </entry>
                <entry>
                  <string>width</string>
                  <int>640</int>
                </entry>
              </linked-hash-map>
            </entry>
          </linked-hash-map>
        </entry>
        <entry>
          <string>description</string>
          <string>
            On Wednesday, cricketer Virat Kohli
          </string>
        </entry>
        <entry>
          <string>datePublished</string>
          <string>2017-02-16T05:39:00</string>
        </entry>
        <entry>
          <string>category</string>
          <string>Entertainment</string>
        </entry>
      </linked-hash-map>
      <linked-hash-map>
        <entry>
          <string>name</string>
          <string>
            Shah Rukh Khan's TV show
          </string>
        </entry>
        <entry>
          <string>url</string>
          <string>
            http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&amp;CID=09E4F1057ADB64720330FB2E7BC96547&amp;rd=1&amp;h=4CnQhOg9Nm7pmIu9OvDl6x9WtYtSuXblCSR_WQz1VoA&amp;v=1&amp;r=http%3a%2f%2fwww.hindustantimes.com%2ftv%2fshah-rukh-khan-s-tv-show-circus-is-back-on-small-screen%2fstory-OjQUQIWi6ogxj5eF1hivTI.html&amp;p=DevEx,5040.1
          </string>
        </entry>
        <entry>
          <string>image</string>
          <linked-hash-map>
            <entry>
              <string>thumbnail</string>
              <linked-hash-map>
                <entry>
                  <string>contentUrl</string>
                  <string>
                    https://www.bing.com/th?id=ON.2974262BB8317FA4D4BCE4A61CA9488E&amp;pid=News
                  </string>
                </entry>
                <entry>
                  <string>width</string>
                  <int>700</int>
                </entry>
              </linked-hash-map>
            </entry>
          </linked-hash-map>
        </entry>
        <entry>
          <string>description</string>
          <string>
            Here's some wonderful news 
          </string>
        </entry>
        <entry>
          <string>datePublished</string>
          <string>2017-02-16T05:36:00</string>
        </entry>
        <entry>
          <string>category</string>
          <string>Entertainment</string>
        </entry>
      </linked-hash-map>
    </list>
  </entry>
</linked-hash-map>

XSLT (see inline comments)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- APPLY ONLY SECOND ENTRY OFF ROOT -->  
  <xsl:template match="/linked-hash-map">
    <xsl:copy>      
      <xsl:apply-templates select="entry[2]"/>      
    </xsl:copy>
  </xsl:template>

  <xsl:template match="entry[2]">
    <xsl:copy>
      <!-- RETRIEVE FIRST ENTRY CONTENT -->  
      <xsl:element name="{preceding-sibling::entry/string[1]}">
        <xsl:value-of select="preceding-sibling::entry/string[2]"/>
      </xsl:element>
      <!-- APPLY GRANDCHILD LINKED HASH MAP -->
      <linked-hash-map><xsl:apply-templates select="list/linked-hash-map"/></linked-hash-map>
    </xsl:copy>
  </xsl:template>

  <!-- GENERALIZE FOR ALL DESCENDANT ENTRY NODES (W/O LINKED HASH MAP CHILD) -->  
  <xsl:template match="linked-hash-map">    
    <datarow>
      <xsl:for-each select="descendant::entry[local-name(*[2])!='linked-hash-map']">        
          <xsl:element name="{string[1]}">
            <xsl:value-of select="normalize-space(string[2]|int)"/>
          </xsl:element>
      </xsl:for-each>
      <!-- ADDED NODE (NOT PART OF ORIGINAL) -->
      <readLink>https://api.cognitive.microsoft.com/api/v5/entities/b8ef6b82-02be-1e24-584c-f8283b7bdaeb</readLink>
    </datarow>    
  </xsl:template>

</xsl:stylesheet>

Output

<?xml version="1.0" encoding="UTF-8"?>
<linked-hash-map>
   <entry>
      <_type>News</_type>
      <linked-hash-map>
         <datarow>
            <name>Virat Kohli</name>
            <url>http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&amp;CID=09E4F1057ADB64720330FB2E7BC96547&amp;rd=1&amp;h=nw8K4uNRgs-nvsuz2GyXpqMxdRmzWK8Xbm3W_1IlO24&amp;v=1&amp;r=http%3a%2f%2fmovies.ndtv.com%2fbollywood%2fvirat-kohli-hearts-anushka-sharma-a-timeline-of-their-romance-1659877&amp;p=DevEx,5026.1</url>
            <contentUrl>https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&amp;pid=News</contentUrl>
            <width>640</width>
            <description>On Wednesday, cricketer Virat Kohli</description>
            <datePublished>2017-02-16T05:39:00</datePublished>
            <category>Entertainment</category>
            <readLink>https://api.cognitive.microsoft.com/api/v5/entities/b8ef6b82-02be-1e24-584c-f8283b7bdaeb</readLink>
         </datarow>
         <datarow>
            <name>Shah Rukh Khan's TV show</name>
            <url>http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&amp;CID=09E4F1057ADB64720330FB2E7BC96547&amp;rd=1&amp;h=4CnQhOg9Nm7pmIu9OvDl6x9WtYtSuXblCSR_WQz1VoA&amp;v=1&amp;r=http%3a%2f%2fwww.hindustantimes.com%2ftv%2fshah-rukh-khan-s-tv-show-circus-is-back-on-small-screen%2fstory-OjQUQIWi6ogxj5eF1hivTI.html&amp;p=DevEx,5040.1</url>
            <contentUrl>https://www.bing.com/th?id=ON.2974262BB8317FA4D4BCE4A61CA9488E&amp;pid=News</contentUrl>
            <width>700</width>
            <description>Here's some wonderful news</description>
            <datePublished>2017-02-16T05:36:00</datePublished>
            <category>Entertainment</category>
            <readLink>https://api.cognitive.microsoft.com/api/v5/entities/b8ef6b82-02be-1e24-584c-f8283b7bdaeb</readLink>
         </datarow>
      </linked-hash-map>
   </entry>
</linked-hash-map>
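
The stylesheet above keeps the url entries and relies on escaping. If dropping them entirely is preferred, as the question title asks, one possible approach is sketched below in Python with xml.etree.ElementTree. It assumes the input has already been made well-formed; the `drop_entries` helper and the shortened sample URL are illustrative and not part of the original data.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative sample in the same shape as the question's XML.
SAMPLE = """<linked-hash-map>
  <entry><string>name</string><string>Virat Kohli</string></entry>
  <entry><string>url</string><string>http://www.bing.com/cr?rd=1&amp;v=1</string></entry>
  <entry><string>category</string><string>Entertainment</string></entry>
</linked-hash-map>"""


def drop_entries(root, unwanted_keys):
    """Hypothetical helper: remove every <entry> whose first child text is in unwanted_keys."""
    # Walk a snapshot of all elements so removals do not disturb the iteration.
    for parent in list(root.iter()):
        for entry in list(parent.findall("entry")):
            key = entry[0].text if len(entry) else None
            if key is not None and key.strip() in unwanted_keys:
                parent.remove(entry)
    return root


root = drop_entries(ET.fromstring(SAMPLE), {"url"})
remaining = [e[0].text for e in root.iter("entry")]
print(remaining)  # ['name', 'category']
```

Passing `{"url", "contentUrl"}` would remove the thumbnail links as well, since the helper checks entries at every nesting level.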