Extracting CDATA from RSS XML using JavaScript

Asked: 2011-03-08 02:18:20

Tags: javascript xml rss cdata

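I'm using JS to pull in the contents of an RSS feed, but the "description" nodes contain CDATA, and I'd like to split that content up.

For example, for each description node under an item, I only want to extract the content from <b>Brief Description:</b> up to the first </div>.

Is this possible? Below is what I have so far, followed by the XML from the RSS feed.

Hope someone can help :)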

Script example

<SCRIPT type=text/javascript>
if (window.XMLHttpRequest)
  {// code for IE7+, Firefox, Chrome, Opera, Safari
  xmlhttp=new XMLHttpRequest();
  }
else
  {// code for IE6, IE5
  xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
  }

xmlhttp.open("GET","help/Sandbox/XML%20Playground/_layouts/listfeed.aspx?List=%7B1D503F3E%2D4BFF%2D4248%2D848D%2DE12B5B67DAEC%7D",false);
xmlhttp.send();
xmlDoc=xmlhttp.responseXML;

function media(){
  description=xmlDoc.getElementsByTagName('description');
  a=2;
  b=1;

  for (i=0;i<18;i++)
  {
    document.write('<p>' + description[b].childNodes[0].nodeValue + '</p>');
    b++;
    a++;
  }
}
</SCRIPT>

RSS XML FEED

<?xml version="1.0" encoding="UTF-8"?>
<!--RSS generated by Windows SharePoint Services V3 RSS Generator on 8/03/2011 10:51:51 AM-->
<?xml-stylesheet type="text/xsl" href="/help/Sandbox/XML Playground/_layouts/RssXslt.aspx?List=1d503f3e-4bff-4248-848d-e12b5b67daec" version="1.0"?>
<rss version="2.0">
  <channel>
    <title>XML Playground: Media News</title>
    <link>/help/Sandbox/XML Playground/Lists/Media News/AllItems.aspx</link>
    <description>RSS feed for the Media News list.</description>
    <lastBuildDate>Mon, 07 Mar 2011 23:51:51 GMT</lastBuildDate>
    <generator>Windows SharePoint Services V3 RSS Generator</generator>
    <ttl>60</ttl>
    <image>
      <title>XML Playground: Media News</title>
      <url>/help/Sandbox/XML Playground/_layouts/images/homepage.gif</url>
      <link>help/Sandbox/XML Playground/Lists/Media News/AllItems.aspx</link>
    </image>
    <item>
      <title>new Item</title>
      <link>/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=2</link>
      <description><![CDATA[<div><b>Brief Description:</b> <div>bla blah blah ablkahgohoihjsdofsdf dfhfgh</div></div>
<div><b>Thumbnail:</b> <a href="/news/PublishingImages/MySchool_rollup-120-x-120_new-040311.gif">test image</a></div>
]]></description>
      <author>WALKER,Andrew</author>
      <pubDate>Mon, 07 Mar 2011 05:43:19 GMT</pubDate>
      <guid isPermaLink="true">http:/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=2</guid>
    </item>
    <item>
      <title>My School 2.0 launched</title>
      <link>http://dnet.hosts.network/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=1</link>
      <description><![CDATA[<div><b>Brief Description:</b> <div>On Friday 4 March 2011 the Minister for School Education, Peter Garrett, launched My School 2.0.</div></div>
<div><b>Thumbnail:</b> <a href="http://dnet.hosts.network/news/PublishingImages/MySchool_rollup-120-x-120_new-040311.gif"></a></div>
<div><b>Release Date:</b> 16/03/2011</div>
]]></description>
      <pubDate>Fri, 04 Mar 2011 04:34:11 GMT</pubDate>
      <guid isPermaLink="true">/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=1</guid>
    </item>
  </channel>
</rss>

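1 answer:

Answer 0 (score: 0)

The content of a CDATA section is just text, so you can't parse it any further with the DOM directly. You can either use DOMParser() to rebuild the CDATA section's string content back into XML and then use DOM methods, or use a regular expression.

To use the latter approach, change the document.write() line to: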

// Slice off 5 characters to get rid of the parent <div> and use [\s\S] to mean
//   any character including newlines up until the first closing div tag
document.write('<p>' + description[b].childNodes[0].nodeValue.slice(5).match(/[\s\S]*?<\/div>/) + '</p>');
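If you'd rather anchor on the "Brief Description:" label itself than slice off a fixed 5 characters, something along these lines might work. This is only a sketch: extractBriefDescription is a made-up helper name, and it assumes every item's markup looks like the SharePoint output shown in the question.

```javascript
// Hypothetical helper (not part of the original code): returns whatever
// follows the "Brief Description:" label, up to the first closing </div>.
function extractBriefDescription(cdataText) {
  var match = /<b>Brief Description:<\/b>\s*([\s\S]*?)<\/div>/.exec(cdataText);
  if (!match) {
    return null; // the label was not found in this item
  }
  // Drop the opening <div> that wraps the value, if there is one.
  return match[1].replace(/^<div>/, '');
}

// Sample string shaped like the CDATA content in the feed above.
var sample = '<div><b>Brief Description:</b> <div>bla blah</div></div>\n' +
             '<div><b>Thumbnail:</b> <a href="x.gif">test image</a></div>\n';
console.log(extractBriefDescription(sample)); // bla blah
```

Returning null when the label is missing lets the caller skip that item instead of writing the string "null" into the page.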

To use the former approach, which is less ideal in this case but may be helpful in others, you could do this inside the for loop:

var cdataContent = new DOMParser().parseFromString('<div xmlns="http://www.w3.org/1999/xhtml">'+description[b].childNodes[0].nodeValue+'</div>', 'text/xml').documentElement;
document.body.appendChild(cdataContent.firstChild);

...but make sure media() is only called after the DOM content has loaded.
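One thing to watch with the DOMParser route: when parsing as 'text/xml', content that isn't well-formed XML (an unclosed tag, or an HTML entity such as &nbsp;) generally doesn't throw in current browsers. Instead you get back a document containing a parsererror element. A rough sketch of a check follows. parseCdataAsXhtml is a hypothetical helper, and the parser is passed in as an argument purely so the function doesn't depend on a global.

```javascript
// Hypothetical helper: wraps the CDATA text in an XHTML <div>, parses it
// as XML, and returns the wrapper element, or null if parsing failed.
function parseCdataAsXhtml(cdataText, parser) {
  var wrapped = '<div xmlns="http://www.w3.org/1999/xhtml">' + cdataText + '</div>';
  var doc = parser.parseFromString(wrapped, 'text/xml');
  // A <parsererror> element in the result signals malformed input.
  if (doc.getElementsByTagName('parsererror').length > 0) {
    return null;
  }
  return doc.documentElement;
}

// In a browser, inside the loop (assumed usage, not tested against the feed):
//   var el = parseCdataAsXhtml(description[i].childNodes[0].nodeValue, new DOMParser());
//   if (el) { document.body.appendChild(el.firstChild); }
```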

Maybe you have a good reason for it, but based on the code you provided, it would be much simpler to do this:

for (i=1; i<description.length; i++) {

...and forget about a and b (i.e. change b to i).
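Spelled out, that loop might look like the sketch below. collectItemDescriptions is a hypothetical name, and returning an array instead of calling document.write() is my own choice so the result can be inspected. The loop starts at 1 because index 0 is the channel-level description in this feed.

```javascript
// Hypothetical helper: collects the text of every <description> node
// except the first one (index 0 is the channel's own description).
function collectItemDescriptions(descriptionNodes) {
  var results = [];
  for (var i = 1; i < descriptionNodes.length; i++) {
    var firstChild = descriptionNodes[i].childNodes[0];
    // Guard against an empty <description/> element.
    results.push(firstChild ? firstChild.nodeValue : '');
  }
  return results;
}

// Stand-in objects shaped like DOM nodes, for illustration only.
var fakeNodes = [
  { childNodes: [{ nodeValue: 'RSS feed for the Media News list.' }] },
  { childNodes: [{ nodeValue: '<div>first item</div>' }] },
  { childNodes: [{ nodeValue: '<div>second item</div>' }] }
];
console.log(collectItemDescriptions(fakeNodes).length); // 2
```

In the page itself you would pass in xmlDoc.getElementsByTagName('description'). Because the loop is bounded by the list's length, it no longer depends on a hard-coded 18.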

One more tip: if you are building the RSS yourself, be aware that you can't nest a CDATA section inside another CDATA section.
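The reason is that the first "]]>" always ends the section. A common workaround, sketched here, is to split any "]]>" in the text across two adjacent CDATA sections. wrapInCdata is a hypothetical helper name, and nothing in the question says the feed is generated this way.

```javascript
// Hypothetical helper: wraps text in a CDATA section, splitting any
// embedded "]]>" so it cannot terminate the section early.
function wrapInCdata(text) {
  return '<![CDATA[' + text.split(']]>').join(']]]]><![CDATA[>') + ']]>';
}

console.log(wrapInCdata('a ]]> b')); // <![CDATA[a ]]]]><![CDATA[> b]]>
```

An XML parser reads the two adjacent sections back as the single string "a ]]> b".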