Question

I'm using the Google Feed API to pull blog entries from a Tumblr feed.

I've been able to pull the content, but the output includes the raw HTML tags:

<p>I remember one day asking one of my mentors James if he ever got nervous around people. James replied, “Only when I need something from them.”</p>

The code is simple:

<script type="text/javascript" src="https://www.google.com/jsapi"></script>
 <script type="text/javascript">
google.load("feeds", "1");

 function initialize() {
   var feed = new google.feeds.Feed("http://adriennetran.tumblr.com/rss");
   feed.load(function(result) {

     if (!result.error) {
       var container = document.getElementById("feed");

       for (var i = 0; i < result.feed.entries.length; i++) {
         var entry = result.feed.entries[i];
         window.content = document.createTextNode(entry.content);
         container.appendChild(content);
       }
     }
   });
 }


 google.setOnLoadCallback(initialize);

 </script>

I tried writing a function that removes everything starting with <:

content_array = content.split(" ");

for (i=0; i < content_array.length; i++){
    if ((content_array[i].split(""))[0] == "<"){
      content_array.splice(i, 1);
    }
}

content2 = content_array.toString();

But I get an "Uncaught TypeError: undefined is not a function" error, because content is an object rather than a string, so I can't call content.split(" ").

I tried converting it to a string, but this is the console output:

typeof(content)
> "object"

c2 = content.toString()
> "[object Text]"

Does anyone have any ideas on how to manipulate the elements retrieved from the RSS feed?

Answer 1

Let's see:

var regExString = /(<([^>]+)>)/ig; // regex matching any HTML tag; the g flag makes it match every occurrence
var contentString = content.textContent; // get the text from the node (a string, no longer an object)

contentString = contentString.replace(regExString, ""); // find all tags and delete them
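
The same idea can also be applied one step earlier, to the entry.content string itself, before any text node is created. A minimal sketch, assuming the feed content arrives as a plain HTML string; stripTags is a hypothetical helper name, and a regex like this is only a rough filter, not a full HTML parser (it does not decode entities such as &amp;amp;):

```javascript
// Hypothetical helper: removes anything that looks like an HTML tag.
// Assumes the input is (or can be coerced to) a string of HTML.
function stripTags(html) {
  return String(html).replace(/<[^>]+>/g, "");
}

var sample = "<p>Hello <b>world</b></p>";
console.log(stripTags(sample)); // prints "Hello world"
```

In the question's loop, this would mean passing stripTags(entry.content) to document.createTextNode instead of entry.content.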

Answer 2

If jQuery is included on your page, you can create nodes from the HTML you receive from the feed and then get the text out of them:

var html = '<p>I remember one day asking one of my mentors James if he ever got nervous around people. James replied, “Only when I need something from them.”</p>';
var text = $(html).text(); // This gets the text from any HTML code and leaves out the tags
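
If jQuery is not available, something similar can probably be done with the browser's own parser. A rough sketch, assuming a browser that provides DOMParser; htmlToText is a hypothetical helper name, and the regex fallback is only there so the function still returns something outside a browser:

```javascript
// Hypothetical helper: extracts the text of an HTML fragment without jQuery.
function htmlToText(html) {
  if (typeof DOMParser !== "undefined") {
    // Browser path: parse into a separate document and read its text content.
    var doc = new DOMParser().parseFromString(String(html), "text/html");
    return doc.body.textContent || "";
  }
  // Fallback outside the browser: rough regex-based tag removal.
  return String(html).replace(/<[^>]+>/g, "");
}

console.log(htmlToText("<p>Only when I need something from them.</p>"));
```

Note that the two paths are not identical: the DOMParser path also decodes HTML entities, while the regex fallback only removes the tags.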

How do I remove HTML tags from an RSS feed?
