How do I remove HTML tags from an RSS feed?

Asked: 2014-12-22 03:31:48

Tags: javascript rss

I'm using the Google Feed API to pull blog entries from a tumblr feed.

I've been able to pull the content, but the output still contains the HTML tags:

<p>I remember one day asking one of my mentors James if he ever got nervous around people. James replied, “Only when I need something from them.”</p>

The code is simple, as follows:

<script type="text/javascript" src="https://www.google.com/jsapi"></script>
 <script type="text/javascript">
google.load("feeds", "1");

 function initialize() {
   var feed = new google.feeds.Feed("http://adriennetran.tumblr.com/rss");
   feed.load(function(result) {

     if (!result.error) {
       var container = document.getElementById("feed");

       for (var i = 0; i < result.feed.entries.length; i++) {
         var entry = result.feed.entries[i];
         window.content = document.createTextNode(entry.content);
         container.appendChild(content);
       }
     }
   });
 }


 google.setOnLoadCallback(initialize);

 </script>

I tried writing a function to remove everything that starts with <:

content_array = content.split(" ");

for (i=0; i < content_array.length; i++){
    if ((content_array[i].split(""))[0] == "<"){
      content_array.splice(i, 1);
    }
}

content2 = content_array.toString();

But I get an "Uncaught TypeError: undefined is not a function" error, because content is an object rather than a string, so I can't call content.split(" ").

I've tried converting it to a string, but this is the console output:

typeof(content)
> "object"

c2 = content.toString()
> "[object Text]"

Does anyone have any ideas on how to manipulate the elements retrieved from the RSS feed?

2 Answers:

Answer 0 (score: 4)

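Let's see. content is a Text node, so read its string via textContent first and then strip the tags with a regular expression: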

var regExString = /(<([^>]+)>)/ig; // regex matching any tag; g = replace every match, i = case-insensitive
var contentString = content.textContent; // get the text from the node (now a string, no longer an object)

contentString = contentString.replace(regExString, ""); // find all tags and delete them
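
As a rough sketch of the same idea, the stripping could also be applied to the string before any Text node is created. The helper name `stripTags` and the small entity table are illustrative assumptions, not part of the Google Feed API or of the answer above, and the sketch assumes `entry.content` arrives as a plain string, as the question's output suggests. A regular expression is not a full HTML parser, so unusual markup (for example a `>` inside an attribute value) may not be handled correctly.

```javascript
// Hypothetical helper (not from the original answer): removes tags from an
// HTML string and decodes a few common entities.
function stripTags(html) {
  // Small, deliberately incomplete entity table (an assumption for this sketch).
  var entities = {
    "&amp;": "&",
    "&lt;": "<",
    "&gt;": ">",
    "&quot;": '"',
    "&#39;": "'",
    "&nbsp;": " "
  };

  return String(html)
    // Drop anything that looks like a tag: "<", one or more non-">" chars, ">".
    .replace(/<[^>]+>/g, "")
    // Decode entities only after the tags are gone, in a single pass.
    .replace(/&(?:amp|lt|gt|quot|#39|nbsp);/g, function (match) {
      return entities[match];
    })
    .trim();
}

// Example input in the shape shown in the question.
var sample = "<p>James replied, <em>Only when I need something from them.</em></p>";
console.log(stripTags(sample));
```

In the question's loop, this would presumably replace the direct use of `entry.content`, for example `document.createTextNode(stripTags(entry.content))`.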

Answer 1 (score: 0)

If you have jQuery included on your page, you can create nodes from the HTML you receive from the feed and then get the text out of them:

var html = '<p>I remember one day asking one of my mentors James if he ever got nervous around people. James replied, “Only when I need something from them.”</p>';
var text = $(html).text(); // This gets the text from any HTML code and leaves out the tags