Question

I used JS to pull in the content of an RSS feed, but the "description" nodes contain CDATA, and I want to split that content up.

For example, for each description node under an item, I only want to extract the content from <b>Brief Description:</b> up to the first </div>.

Is this possible? My attempt so far is below, followed by the XML from the RSS feed.

Hope someone can help :)

Script example

<SCRIPT type=text/javascript>
if (window.XMLHttpRequest)
  {// code for IE7+, Firefox, Chrome, Opera, Safari
  xmlhttp=new XMLHttpRequest();
  }
else
  {// code for IE6, IE5
  xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
  }

xmlhttp.open("GET","help/Sandbox/XML%20Playground/_layouts/listfeed.aspx?List=%7B1D503F3E%2D4BFF%2D4248%2D848D%2DE12B5B67DAEC%7D",false);
xmlhttp.send();
xmlDoc=xmlhttp.responseXML;



function media(){

description=xmlDoc.getElementsByTagName('description');
a=2;
b=1;

for (i=0;i<18;i++)
{



document.write('<p>' + description[b].childNodes[0].nodeValue + '</p>');

b++;
a++;

};

};



</SCRIPT>

RSS XML FEED

<?xml version="1.0" encoding="UTF-8"?>
<!--RSS generated by Windows SharePoint Services V3 RSS Generator on 8/03/2011 10:51:51 AM-->
<?xml-stylesheet type="text/xsl" href="/help/Sandbox/XML Playground/_layouts/RssXslt.aspx?List=1d503f3e-4bff-4248-848d-e12b5b67daec" version="1.0"?>
<rss version="2.0">
  <channel>
    <title>XML Playground: Media News</title>
    <link>/help/Sandbox/XML Playground/Lists/Media News/AllItems.aspx</link>
    <description>RSS feed for the Media News list.</description>
    <lastBuildDate>Mon, 07 Mar 2011 23:51:51 GMT</lastBuildDate>
    <generator>Windows SharePoint Services V3 RSS Generator</generator>
    <ttl>60</ttl>
    <image>
      <title>XML Playground: Media News</title>
      <url>/help/Sandbox/XML Playground/_layouts/images/homepage.gif</url>
      <link>help/Sandbox/XML Playground/Lists/Media News/AllItems.aspx</link>
    </image>
    <item>
      <title>new Item</title>
      <link>/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=2</link>
      <description><![CDATA[<div><b>Brief Description:</b> <div>bla blah blah ablkahgohoihjsdofsdf dfhfgh</div></div>
<div><b>Thumbnail:</b> <a href="/news/PublishingImages/MySchool_rollup-120-x-120_new-040311.gif">test image</a></div>
]]></description>
      <author>WALKER,Andrew</author>
      <pubDate>Mon, 07 Mar 2011 05:43:19 GMT</pubDate>
      <guid isPermaLink="true">http:/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=2</guid>
    </item>
    <item>
      <title>My School 2.0 launched</title>
      <link>http://dnet.hosts.network/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=1</link>
      <description><![CDATA[<div><b>Brief Description:</b> <div>On Friday 4 March 2011 the Minister for School Education, Peter Garrett, launched My School 2.0.</div></div>
<div><b>Thumbnail:</b> <a href="http://dnet.hosts.network/news/PublishingImages/MySchool_rollup-120-x-120_new-040311.gif"></a></div>
<div><b>Release Date:</b> 16/03/2011</div>
]]></description>
      <pubDate>Fri, 04 Mar 2011 04:34:11 GMT</pubDate>
      <guid isPermaLink="true">/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=1</guid>
    </item>
  </channel>
</rss>

Answer 1

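The content of a CDATA section is just text, so you cannot parse it any further with the DOM directly. You can either use DOMParser() to turn the CDATA section's string content back into XML and then use DOM methods on it, or use a regular expression.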

To use the latter approach, change the document.write() line to:

// Slice off 5 characters to get rid of the parent <div> and use [\s\S] to mean
//   any character including newlines up until the first closing div tag
document.write('<p>' + description[b].childNodes[0].nodeValue.slice(5).match(/[\s\S]*?<\/div>/) + '</p>');
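Note that the one-liner above keeps the <b>Brief Description:</b> label and the surrounding tags in its output. If only the text after the label is wanted, a capture group can pull it out. The following is a minimal sketch, not part of the original answer: the helper name extractBriefDescription is hypothetical, and it assumes the label and markup look like the feed shown in the question.

```javascript
// Hypothetical helper (an assumption, not from the original answer).
// Returns the text that follows "<b>Brief Description:</b>" up to the
// first closing </div>, or null when the label is not present.
function extractBriefDescription(cdataText) {
  // \s* skips whitespace after the label, (?:<div>)? skips the optional
  // inner <div>, and [\s\S]*? lazily captures any characters (including
  // newlines) up to the first </div>.
  var pattern = /<b>Brief Description:<\/b>\s*(?:<div>)?([\s\S]*?)<\/div>/;
  var match = pattern.exec(cdataText);
  return match ? match[1] : null;
}

// Sample string shaped like the CDATA content in the question's feed.
var sample = '<div><b>Brief Description:</b> <div>bla blah</div></div>\n' +
  '<div><b>Thumbnail:</b> <a href="x.gif">test image</a></div>\n';
console.log(extractBriefDescription(sample)); // bla blah
```

Inside the loop it could be called as extractBriefDescription(description[b].childNodes[0].nodeValue). Checking for null before writing avoids printing the string "null" for items that lack the label.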

To use the former approach, which is less ideal in this case but may be helpful in others, you can do this inside the for loop:

var cdataContent = new DOMParser().parseFromString('<div xmlns="http://www.w3.org/1999/xhtml">'+description[b].childNodes[0].nodeValue+'</div>', 'text/xml').documentElement;
document.body.appendChild(cdataContent.firstChild);

...but make sure media() is only called after the DOM content has loaded.

Maybe you have a good reason for it, but based on the code you provided, it would be much simpler to do this:

for (i=1; i<description.length; i++) {

...and forget about a and b (i.e. change b to i).
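To illustrate why the loop starts at 1 and runs to description.length rather than a fixed count, here is a minimal sketch. The helper name itemDescriptionTexts and the stand-in objects are hypothetical; in the page itself the argument would be the result of xmlDoc.getElementsByTagName('description').

```javascript
// Hypothetical helper (an assumption, not from the original answer).
// Collects the text of every item-level <description>. Index 0 is skipped
// because, in the feed shown in the question, the first <description>
// element belongs to <channel>, not to an <item>.
function itemDescriptionTexts(descriptionNodes) {
  var texts = [];
  for (var i = 1; i < descriptionNodes.length; i++) {
    var first = descriptionNodes[i].childNodes[0];
    // Guard against an empty <description> element.
    texts.push(first ? first.nodeValue : '');
  }
  return texts;
}

// Stand-in objects that mimic the two DOM properties used above.
var fakeNodes = [
  { childNodes: [{ nodeValue: 'RSS feed for the Media News list.' }] },
  { childNodes: [{ nodeValue: '<div>first item</div>' }] },
  { childNodes: [{ nodeValue: '<div>second item</div>' }] }
];
console.log(itemDescriptionTexts(fakeNodes).length); // 2
```

Bounding the loop by the collection's length means the code does not fail when the feed holds fewer items than a hard-coded limit such as 18.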

One more tip: if you are building the RSS yourself, be aware that you cannot nest a CDATA section inside another CDATA section.
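The reason is that the first "]]>" always ends the section, so text that itself contains "]]>" cannot be wrapped as-is. A common workaround is to split the text at each "]]>" and emit adjacent CDATA sections. The sketch below assumes you generate the feed with JavaScript; the helper name wrapInCdata is hypothetical and not part of the original answer.

```javascript
// Hypothetical helper (an assumption, not from the original answer).
// Wraps text in a CDATA section. Every "]]>" inside the text is split
// across two adjacent CDATA sections so it cannot close the section early.
function wrapInCdata(text) {
  return '<![CDATA[' +
    String(text).split(']]>').join(']]]]><![CDATA[>') +
    ']]>';
}

console.log(wrapInCdata('<div>plain</div>')); // <![CDATA[<div>plain</div>]]>
console.log(wrapInCdata('a ]]> b'));          // <![CDATA[a ]]]]><![CDATA[> b]]>
```

An XML parser reads the adjacent sections back as the single string "a ]]> b", so the consumer of the feed sees the original text.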

Extracting CDATA from RSS XML using JavaScript
