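I want to use this XML to prepare an XSD and then process the rows further so the data can be inserted into a database. To prepare the XSD, I am using XSLT to transform the structure into the required format.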
<linked-hash-map>
<entry>
<string>_type</string>
<string>News</string>
</entry>
<entry>
<string>value</string>
<list>
<linked-hash-map>
<entry>
<string>name</string>
<string>
Virat Kohli
</string>
</entry>
<entry>
<string>url</string>
<string>
http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&CID=09E4F1057ADB64720330FB2E7BC96547&rd=1&h=nw8K4uNRgs-nvsuz2GyXpqMxdRmzWK8Xbm3W_1IlO24&v=1&r=http%3a%2f%2fmovies.ndtv.com%2fbollywood%2fvirat-kohli-hearts-anushka-sharma-a-timeline-of-their-romance-1659877&p=DevEx,5026.1
</string>
</entry>
<entry>
<string>image</string>
<linked-hash-map>
<entry>
<string>thumbnail</string>
<linked-hash-map>
<entry>
<string>contentUrl</string>
<string>
https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&pid=News
</string>
</entry>
<entry>
<string>width</string>
<int>640</int>
</entry>
</linked-hash-map>
</entry>
</linked-hash-map>
</entry>
<entry>
<string>description</string>
<string>
On Wednesday, cricketer Virat Kohli
</string>
</entry>
<entry>
<string>datePublished</string>
<string>2017-02-16T05:39:00</string>
</entry>
<entry>
<string>category</string>
<string>Entertainment</string>
</entry>
</linked-hash-map>
<linked-hash-map>
<entry>
<string>name</string>
<string>
Shah Rukh Khan’s TV show
</string>
</entry>
<entry>
<string>url</string>
<string>
http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&CID=09E4F1057ADB64720330FB2E7BC96547&rd=1&h=4CnQhOg9Nm7pmIu9OvDl6x9WtYtSuXblCSR_WQz1VoA&v=1&r=http%3a%2f%2fwww.hindustantimes.com%2ftv%2fshah-rukh-khan-s-tv-show-circus-is-back-on-small-screen%2fstory-OjQUQIWi6ogxj5eF1hivTI.html&p=DevEx,5040.1
</string>
</entry>
<entry>
<string>image</string>
<linked-hash-map>
<entry>
<string>thumbnail</string>
<linked-hash-map>
<entry>
<string>contentUrl</string>
<string>
https://www.bing.com/th?id=ON.2974262BB8317FA4D4BCE4A61CA9488E&pid=News
</string>
</entry>
<entry>
<string>width</string>
<int>700</int>
</entry>
</linked-hash-map>
</entry>
</linked-hash-map>
</entry>
<entry>
<string>description</string>
<string>
Here’s some wonderful news
</string>
</entry>
<entry>
<string>datePublished</string>
<string>2017-02-16T05:36:00</string>
</entry>
<entry>
<string>category</string>
<string>Entertainment</string>
</entry>
</linked-hash-map>
</list>
</entry>
</linked-hash-map>
The URLs here contain query strings. How can I remove the URLs, or how can I encode URLs that contain query strings?
Expected output:
<?xml version="1.0" encoding="utf-8"?>
<linked-hash-map>
<entry>
<linked-hash-map>
<_type>News</_type>
<datarow>
<name> Virat Kohli</name>
<url>http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&CID=09E4F1057ADB64720330FB2E7BC96547&rd=1&h=nw8K4uNRgs-nvsuz2GyXpqMxdRmzWK8Xbm3W_1IlO24&v=1&r=http%3a%2f%2fmovies.ndtv.com%2fbollywood%2fvirat-kohli-hearts-anushka-sharma-a-timeline-of-their-romance-1659877&p=DevEx,5026.1</url>
<contentUrl> https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&pid=News </contentUrl>
<width>640</width>
<description> On Wednesday, cricketer Virat Kohli</description>
<readLink> https://api.cognitive.microsoft.com/api/v5/entities/b8ef6b82-02be-1e24-584c-f8283b7bdaeb </readLink>
<datePublished>2017-02-16T05:39:00</datePublished>
<category>Entertainment</category>
</datarow>
<datarow>
<name> Shah Rukh Khan’s TV show</name>
<url> http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&CID=09E4F1057ADB64720330FB2E7BC96547&rd=1&h=4CnQhOg9Nm7pmIu9OvDl6x9WtYtSuXblCSR_WQz1VoA&v=1&r=http%3a%2f%2fwww.hindustantimes.com%2ftv%2fshah-rukh-khan-s-tv-show-circus-is-back-on-small-screen%2fstory-OjQUQIWi6ogxj5eF1hivTI.html&p=DevEx,5040.1 </url>
<contentUrl> https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&pid=News </contentUrl>
<width>640</width>
<description> Here’s some wonderful news </description>
<readLink> https://api.cognitive.microsoft.com/api/v5/entities/b8ef6b82-02be-1e24-584c-f8283b7bdaeb </readLink>
<datePublished>2017-02-16T05:39:00</datePublished>
<category>Entertainment</category>
</datarow>
</linked-hash-map>
</entry>
</linked-hash-map>
Below is the script I am using to transform this structure.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/linked-hash-map">
<xsl:element name="{local-name()}">
<xsl:for-each select="entry">
<xsl:choose>
<xsl:when test="list/linked-hash-map">
<xsl:for-each select="list/linked-hash-map">
<datarow>
<xsl:for-each select="entry">
<xsl:if test="not(node()[1]='image' or node()[1]='about' or node()[1]='clusteredArticles' or node()[1]='mentions' or node()[1]='provider' or node()[1]='url' or node()[1]='description' or node()[1]='name')">
<xsl:text disable-output-escaping="yes">&lt;</xsl:text>
<xsl:value-of select="*[1]"/>
<xsl:text disable-output-escaping="yes">&gt;</xsl:text>
<xsl:value-of select="*[2]"/>
<xsl:text disable-output-escaping="yes">&lt;/</xsl:text>
<xsl:value-of select="*[1]"/>
<xsl:text disable-output-escaping="yes">&gt;</xsl:text>
</xsl:if>
</xsl:for-each>
</datarow>
</xsl:for-each>
</xsl:when>
<xsl:otherwise>
<xsl:text disable-output-escaping="yes">&lt;</xsl:text>
<xsl:value-of select="*[1]"/>
<xsl:text disable-output-escaping="yes">&gt;</xsl:text>
<xsl:value-of select="*[2]"/>
<xsl:text disable-output-escaping="yes">&lt;/</xsl:text>
<xsl:value-of select="*[1]"/>
<xsl:text disable-output-escaping="yes">&gt;</xsl:text>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:element>
</xsl:template>
<xsl:template match="/">
<xsl:copy>
<linked-hash-map>
<entry>
<xsl:apply-templates/>
</entry>
</linked-hash-map>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
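Answer 0 (score: 0)
Currently your original XML is not well-formed, because the ampersands used in the URLs must be replaced with the corresponding XML entity reference, i.e. &amp;.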
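As a small illustration of that escaping rule, here is a minimal Python sketch using only the standard library. The URL is a shortened, hypothetical stand-in (not one of the URLs from the question), and the variable names are chosen only for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical, shortened URL used only for illustration.
raw_url = "http://www.example.com/cr?IG=3DA8&CID=09E4&rd=1"

# escape() replaces &, < and > with their XML entity references.
escaped_url = escape(raw_url)

# A raw ampersand in element content makes the markup ill-formed ...
try:
    ET.fromstring("<string>" + raw_url + "</string>")
    raw_is_well_formed = True
except ET.ParseError:
    raw_is_well_formed = False

# ... while the escaped version parses and yields the original text back.
parsed_text = ET.fromstring("<string>" + escaped_url + "</string>").text
```

The parser turns &amp; back into a plain ampersand, so downstream code (for example the database insert) still sees the original URL.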
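Double-check how the original XML is being generated, because it should not be produced as a text file of concatenated strings (one way this kind of markup can end up being built). Unfortunately, this is a common practice in general-purpose programming. XML documents should be built with W3C-compliant DOM libraries (e.g., Java's javax.xml, Python's xml.etree, PHP's DOMDocument, .NET's XmlDocument) using createElement, appendChild, setAttribute, or the corresponding methods.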
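To make the library-based approach concrete, the following is a rough Python sketch with xml.etree.ElementTree, whose Element and SubElement calls play the role of createElement and appendChild. The helper name add_entry and the shortened URL are assumptions made for this example, not part of the original data or of any existing API.

```python
import xml.etree.ElementTree as ET

def add_entry(parent, key, value):
    """Append <entry><string>key</string><string>value</string></entry>."""
    entry = ET.SubElement(parent, "entry")
    ET.SubElement(entry, "string").text = key
    ET.SubElement(entry, "string").text = value
    return entry

# Hypothetical, shortened URL used only for illustration.
url = "http://www.example.com/cr?IG=3DA8&CID=09E4&rd=1"

root = ET.Element("linked-hash-map")
add_entry(root, "_type", "News")
add_entry(root, "url", url)

# The serializer escapes the ampersands, so no manual replacement is needed.
xml_text = ET.tostring(root, encoding="unicode")
```

Because the text is assigned as data rather than pasted into a string of markup, the serializer takes care of the entity references and the result can be parsed again without errors.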
Once well-formed XML is rendered, consider the more generalized XSLT below.
Input (adjusted for character entities)
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<linked-hash-map>
<entry>
<string>_type</string>
<string>News</string>
</entry>
<entry>
<string>value</string>
<list>
<linked-hash-map>
<entry>
<string>name</string>
<string>
Virat Kohli
</string>
</entry>
<entry>
<string>url</string>
<string>
http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&amp;CID=09E4F1057ADB64720330FB2E7BC96547&amp;rd=1&amp;h=nw8K4uNRgs-nvsuz2GyXpqMxdRmzWK8Xbm3W_1IlO24&amp;v=1&amp;r=http%3a%2f%2fmovies.ndtv.com%2fbollywood%2fvirat-kohli-hearts-anushka-sharma-a-timeline-of-their-romance-1659877&amp;p=DevEx,5026.1
</string>
</entry>
<entry>
<string>image</string>
<linked-hash-map>
<entry>
<string>thumbnail</string>
<linked-hash-map>
<entry>
<string>contentUrl</string>
<string>
https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&amp;pid=News
</string>
</entry>
<entry>
<string>width</string>
<int>640</int>
</entry>
</linked-hash-map>
</entry>
</linked-hash-map>
</entry>
<entry>
<string>description</string>
<string>
On Wednesday, cricketer Virat Kohli
</string>
</entry>
<entry>
<string>datePublished</string>
<string>2017-02-16T05:39:00</string>
</entry>
<entry>
<string>category</string>
<string>Entertainment</string>
</entry>
</linked-hash-map>
<linked-hash-map>
<entry>
<string>name</string>
<string>
Shah Rukh Khan's TV show
</string>
</entry>
<entry>
<string>url</string>
<string>
http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&amp;CID=09E4F1057ADB64720330FB2E7BC96547&amp;rd=1&amp;h=4CnQhOg9Nm7pmIu9OvDl6x9WtYtSuXblCSR_WQz1VoA&amp;v=1&amp;r=http%3a%2f%2fwww.hindustantimes.com%2ftv%2fshah-rukh-khan-s-tv-show-circus-is-back-on-small-screen%2fstory-OjQUQIWi6ogxj5eF1hivTI.html&amp;p=DevEx,5040.1
</string>
</entry>
<entry>
<string>image</string>
<linked-hash-map>
<entry>
<string>thumbnail</string>
<linked-hash-map>
<entry>
<string>contentUrl</string>
<string>
https://www.bing.com/th?id=ON.2974262BB8317FA4D4BCE4A61CA9488E&amp;pid=News
</string>
</entry>
<entry>
<string>width</string>
<int>700</int>
</entry>
</linked-hash-map>
</entry>
</linked-hash-map>
</entry>
<entry>
<string>description</string>
<string>
Here's some wonderful news
</string>
</entry>
<entry>
<string>datePublished</string>
<string>2017-02-16T05:36:00</string>
</entry>
<entry>
<string>category</string>
<string>Entertainment</string>
</entry>
</linked-hash-map>
</list>
</entry>
</linked-hash-map>
XSLT (see inline comments)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- APPLY ONLY SECOND ENTRY OFF ROOT -->
<xsl:template match="/linked-hash-map">
<xsl:copy>
<xsl:apply-templates select="entry[2]"/>
</xsl:copy>
</xsl:template>
<xsl:template match="entry[2]">
<xsl:copy>
<!-- RETRIEVE FIRST ENTRY CONTENT -->
<xsl:element name="{preceding-sibling::entry/string[1]}">
<xsl:value-of select="preceding-sibling::entry/string[2]"/>
</xsl:element>
<!-- APPLY GRANDCHILD LINKED HASH MAP -->
<linked-hash-map><xsl:apply-templates select="list/linked-hash-map"/></linked-hash-map>
</xsl:copy>
</xsl:template>
<!-- GENERALIZE FOR ALL DESCENDANT ENTRY NODES (W/O LINKED HASH MAP CHILD) -->
<xsl:template match="linked-hash-map">
<datarow>
<xsl:for-each select="descendant::entry[local-name(*[2])!='linked-hash-map']">
<xsl:element name="{string[1]}">
<xsl:value-of select="normalize-space(string[2]|int)"/>
</xsl:element>
</xsl:for-each>
<!-- ADDED NODE (NOT PART OF ORIGINAL) -->
<readLink>https://api.cognitive.microsoft.com/api/v5/entities/b8ef6b82-02be-1e24-584c-f8283b7bdaeb</readLink>
</datarow>
</xsl:template>
</xsl:stylesheet>
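For readers who want to see the flattening rule outside XSLT, here is a rough Python sketch of the same idea: for each item under list, visit every descendant entry and keep it only when its second child is not a nested linked-hash-map. The trimmed SAMPLE document and the function name flatten are assumptions made for this illustration; the sketch is not a replacement for the stylesheet and leaves out the _type and readLink handling.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative input with the same nesting as the question's XML.
SAMPLE = """<linked-hash-map>
  <entry><string>_type</string><string>News</string></entry>
  <entry>
    <string>value</string>
    <list>
      <linked-hash-map>
        <entry><string>name</string><string> Virat Kohli </string></entry>
        <entry>
          <string>image</string>
          <linked-hash-map>
            <entry><string>width</string><int>640</int></entry>
          </linked-hash-map>
        </entry>
      </linked-hash-map>
    </list>
  </entry>
</linked-hash-map>"""

def flatten(item):
    """Return (name, value) pairs for every descendant entry whose
    second child is not a nested linked-hash-map."""
    pairs = []
    for entry in item.iter("entry"):
        key, value = entry[0], entry[1]
        if value.tag != "linked-hash-map":
            # Collapse whitespace, similar in spirit to normalize-space().
            text = " ".join((value.text or "").split())
            pairs.append((key.text.strip(), text))
    return pairs

root = ET.fromstring(SAMPLE)
rows = [flatten(item) for item in root.findall("entry/list/linked-hash-map")]
```

Each element of rows corresponds to one datarow, and container entries such as image contribute nothing themselves because only their leaf descendants are kept.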
Output
<?xml version="1.0" encoding="UTF-8"?>
<linked-hash-map>
<entry>
<_type>News</_type>
<linked-hash-map>
<datarow>
<name>Virat Kohli</name>
<url>http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&amp;CID=09E4F1057ADB64720330FB2E7BC96547&amp;rd=1&amp;h=nw8K4uNRgs-nvsuz2GyXpqMxdRmzWK8Xbm3W_1IlO24&amp;v=1&amp;r=http%3a%2f%2fmovies.ndtv.com%2fbollywood%2fvirat-kohli-hearts-anushka-sharma-a-timeline-of-their-romance-1659877&amp;p=DevEx,5026.1</url>
<contentUrl>https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&amp;pid=News</contentUrl>
<width>640</width>
<description>On Wednesday, cricketer Virat Kohli</description>
<datePublished>2017-02-16T05:39:00</datePublished>
<category>Entertainment</category>
<readLink>https://api.cognitive.microsoft.com/api/v5/entities/b8ef6b82-02be-1e24-584c-f8283b7bdaeb</readLink>
</datarow>
<datarow>
<name>Shah Rukh Khan's TV show</name>
<url>http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&amp;CID=09E4F1057ADB64720330FB2E7BC96547&amp;rd=1&amp;h=4CnQhOg9Nm7pmIu9OvDl6x9WtYtSuXblCSR_WQz1VoA&amp;v=1&amp;r=http%3a%2f%2fwww.hindustantimes.com%2ftv%2fshah-rukh-khan-s-tv-show-circus-is-back-on-small-screen%2fstory-OjQUQIWi6ogxj5eF1hivTI.html&amp;p=DevEx,5040.1</url>
<contentUrl>https://www.bing.com/th?id=ON.2974262BB8317FA4D4BCE4A61CA9488E&amp;pid=News</contentUrl>
<width>700</width>
<description>Here's some wonderful news</description>
<datePublished>2017-02-16T05:36:00</datePublished>
<category>Entertainment</category>
<readLink>https://api.cognitive.microsoft.com/api/v5/entities/b8ef6b82-02be-1e24-584c-f8283b7bdaeb</readLink>
</datarow>
</linked-hash-map>
</entry>
</linked-hash-map>