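I'm using JavaScript to pull content out of an RSS feed, but the description nodes contain CDATA, and I'd like to split that content up.
For example, for each description node under an item, I only want to extract the content from <b>Brief Description:</b> up to the first </div>.
Is this possible? Below is what I have so far, followed by the XML from the RSS feed.
Hopefully someone can help :)
Script example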
<SCRIPT type=text/javascript>
if (window.XMLHttpRequest)
{// code for IE7+, Firefox, Chrome, Opera, Safari
xmlhttp=new XMLHttpRequest();
}
else
{// code for IE6, IE5
xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
}
xmlhttp.open("GET","help/Sandbox/XML%20Playground/_layouts/listfeed.aspx?List=%7B1D503F3E%2D4BFF%2D4248%2D848D%2DE12B5B67DAEC%7D",false);
xmlhttp.send();
xmlDoc=xmlhttp.responseXML;
function media(){
description=xmlDoc.getElementsByTagName('description');
a=2;
b=1;
for (i=0;i<18;i++)
{
document.write('<p>' + description[b].childNodes[0].nodeValue + '</p>');
b++;
a++;
};
};
</SCRIPT>
RSS XML FEED
<?xml version="1.0" encoding="UTF-8"?>
<!--RSS generated by Windows SharePoint Services V3 RSS Generator on 8/03/2011 10:51:51 AM-->
<?xml-stylesheet type="text/xsl" href="/help/Sandbox/XML Playground/_layouts/RssXslt.aspx?List=1d503f3e-4bff-4248-848d-e12b5b67daec" version="1.0"?>
<rss version="2.0">
<channel>
<title>XML Playground: Media News</title>
<link>/help/Sandbox/XML Playground/Lists/Media News/AllItems.aspx</link>
<description>RSS feed for the Media News list.</description>
<lastBuildDate>Mon, 07 Mar 2011 23:51:51 GMT</lastBuildDate>
<generator>Windows SharePoint Services V3 RSS Generator</generator>
<ttl>60</ttl>
<image>
<title>XML Playground: Media News</title>
<url>/help/Sandbox/XML Playground/_layouts/images/homepage.gif</url>
<link>help/Sandbox/XML Playground/Lists/Media News/AllItems.aspx</link>
</image>
<item>
<title>new Item</title>
<link>/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=2</link>
<description><![CDATA[<div><b>Brief Description:</b> <div>bla blah blah ablkahgohoihjsdofsdf dfhfgh</div></div>
<div><b>Thumbnail:</b> <a href="/news/PublishingImages/MySchool_rollup-120-x-120_new-040311.gif">test image</a></div>
]]></description>
<author>WALKER,Andrew</author>
<pubDate>Mon, 07 Mar 2011 05:43:19 GMT</pubDate>
<guid isPermaLink="true">http:/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=2</guid>
</item>
<item>
<title>My School 2.0 launched</title>
<link>http://dnet.hosts.network/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=1</link>
<description><![CDATA[<div><b>Brief Description:</b> <div>On Friday 4 March 2011 the Minister for School Education, Peter Garrett, launched My School 2.0.</div></div>
<div><b>Thumbnail:</b> <a href="http://dnet.hosts.network/news/PublishingImages/MySchool_rollup-120-x-120_new-040311.gif"></a></div>
<div><b>Release Date:</b> 16/03/2011</div>
]]></description>
<pubDate>Fri, 04 Mar 2011 04:34:11 GMT</pubDate>
<guid isPermaLink="true">/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=1</guid>
</item>
</channel>
</rss>
Answer 0 (score: 0)
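The content of a CDATA section is just text, so you can't use the DOM to parse it any further as it stands. You can either use DOMParser() to turn the CDATA section's string content back into XML and then use DOM methods on it, or use a regular expression.
To use the latter approach, change the document.write() line to: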
// Slice off 5 characters to get rid of the parent <div> and use [\s\S] to mean
// any character including newlines up until the first closing div tag
document.write('<p>' + description[b].childNodes[0].nodeValue.slice(5).match(/[\s\S]*?<\/div>/) + '</p>');
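The same regular-expression idea can be pulled into a small helper so it can be tried outside the loop. This is only a sketch: `extractBrief` and `sample` are hypothetical names, and instead of slicing off a fixed five characters it anchors on the label itself, which assumes the feed always emits that exact `<b>Brief Description:</b>` markup.

```javascript
// Hypothetical helper: returns the text from "<b>Brief Description:</b>"
// up to and including the first closing </div>, or "" when nothing matches.
function extractBrief(cdataText) {
  // [\s\S]*? lazily matches any character, including newlines.
  var match = cdataText.match(/<b>Brief Description:<\/b>[\s\S]*?<\/div>/);
  return match ? match[0] : '';
}

// Sample text shaped like the CDATA content in the feed above.
var sample = '<div><b>Brief Description:</b> <div>Some text</div></div>\n' +
  '<div><b>Thumbnail:</b> <a href="x.gif">img</a></div>\n';

console.log(extractBrief(sample));
```

Returning an empty string when there is no match avoids writing the literal text "null" into the page, which is what concatenating a failed match() result would do.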
To use the former approach, which is less ideal in this case but may be helpful in others, you could do this inside the for loop:
var cdataContent = new DOMParser().parseFromString('<div xmlns="http://www.w3.org/1999/xhtml">'+description[b].childNodes[0].nodeValue+'</div>', 'text/xml').documentElement;
document.body.appendChild(cdataContent.firstChild);
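...but make sure media() is only called after the DOM content has loaded.
Maybe you have a good reason for it, but based on the code you provided, it would be much simpler to do this: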
for (i=1; i<description.length; i++) {
...and forget about a and b (that is, change b to i).
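As a rough sketch of that simplified loop, the function below walks the description list by its own length rather than a hard-coded count. The name `collectItemDescriptions` and the stand-in objects are assumptions for illustration; in the page itself the argument would be the result of xmlDoc.getElementsByTagName('description').

```javascript
// Hypothetical helper: gathers the text of every item-level <description>.
// Index 0 is skipped because, in the feed above, it is the channel's own
// description rather than an item's.
function collectItemDescriptions(descriptions) {
  var texts = [];
  for (var i = 1; i < descriptions.length; i++) {
    var first = descriptions[i].childNodes[0];
    // Guard against an empty <description> element.
    texts.push(first ? first.nodeValue : '');
  }
  return texts;
}

// Stand-in objects shaped like DOM nodes, for illustration only.
var fakeDescriptions = [
  { childNodes: [{ nodeValue: 'RSS feed for the Media News list.' }] },
  { childNodes: [{ nodeValue: '<div><b>Brief Description:</b> one</div>' }] },
  { childNodes: [] }
];

console.log(collectItemDescriptions(fakeDescriptions));
```

Collecting the strings first, rather than calling document.write() inside the loop, also makes it easier to run the extraction step on each entry before anything is added to the page.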
One more tip: if you are building the RSS yourself, be aware that you can't nest a CDATA section inside another CDATA section.
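If you do generate the feed yourself, a common workaround is to split any "]]>" in the text across two adjacent CDATA sections. The helper below is a minimal sketch of that idea; `toCdata` is a hypothetical name, not part of any feed library.

```javascript
// Hypothetical helper: wraps text in a CDATA section, splitting every "]]>"
// across two sections so the first one is not closed early.
function toCdata(text) {
  return '<![CDATA[' + text.split(']]>').join(']]]]><![CDATA[>') + ']]>';
}

console.log(toCdata('a ]]> b'));
// prints: <![CDATA[a ]]]]><![CDATA[> b]]>
```

A parser reads the two adjacent sections back as the single string "a ]]> b", so the original text survives the round trip.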