DOMParser fails to parse certain nodes?

Asked: 2011-08-14 13:49:13

Tags: javascript google-chrome-extension

I am creating an extension for Google Chrome. I am trying to parse the following XML:

<?xml version="1.0" encoding="utf-8"?>
<anime>
  <entry>
    <id>9938</id>
    <title>Ikoku Meiro no Crois&Atilde;&copy;e</title>
    <english>Crois&Atilde;&copy;e in a Foreign Labyrinth ~ The Animation</english>
    <synonyms>Ikoku Meiro no Crois&Atilde;&copy;e The Animation; Ikoku Meiro No Croisee The Animation; La crois&Atilde;&copy;e dans un labyrinthe &Atilde;&copy;tranger Special</synonyms>
    <episodes>12</episodes>
    <score>7.72</score>
    <type>TV</type>
    <status>Currently Airing</status>
    <start_date>2011-07-04</start_date>
    <end_date>0000-00-00</end_date>
    <synopsis>The story takes place in the second half of the 19th century, as Japanese culture gains popularity in the West. A young Japanese girl, Yune, accompanies a French traveller, Oscar, on his journey back to France, and offers to help at the family&amp;#039;s ironwork shop in Paris. Oscar&amp;#039;s nephew and shop-owner Claude reluctantly accepts to take care of Yune, and we learn how those two, who have so little in common, get to understand each other and live together in the Paris of the 1800s.</synopsis>
    <image>http://cdn.myanimelist.net/images/anime/8/29031.jpg</image>
  </entry>
</anime>

Using this code:

var parser = new DOMParser();
var xmlText = response.value;
var doc = parser.parseFromString(xmlText, "text/xml");
var entries = doc.getElementsByTagName("entry");

for (var i = 0; i < entries.length; ++i) {
    var node = entries[i];

    var titles = node.getElementsByTagName("title");
    console.log("titles.length: " + titles.length);
    if (titles.length > 0) {
        console.log("title: " + titles[0].childNodes[0].nodeValue);
    }

    var scores = node.getElementsByTagName("score");
    console.log("scores.length: " + scores.length);
    if (scores.length > 0) {
        console.log("score: " + scores[0].childNodes[0].nodeValue);
    }

    var ids = node.getElementsByTagName("id");
    console.log("ids.length: " + ids.length);
    if (ids.length > 0) {
        console.log("id: " + ids[0].childNodes[0].nodeValue);
    }
}

Judging by the output, the title node is found but its inner text is not, and the score node is not found at all:

titles.length: 1
title: 
scores.length: 0
ids.length: 1
id: 9938

Does anyone know why this happens and/or how to fix it?
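
One way to narrow this down: as far as I know, DOMParser does not throw on malformed XML but returns a document containing a parsererror element, and in Chrome that document may also hold whatever was parsed before the error. The sketch below checks for that element. getParseError is a hypothetical helper name, not part of any API, and the exact error text differs between browsers.

```javascript
// Hypothetical helper: returns the parser's error text, or null if no
// <parsererror> element is present. Works on any object that exposes
// getElementsByTagName, such as the document returned by DOMParser.
function getParseError(doc) {
  var errors = doc.getElementsByTagName("parsererror");
  return errors.length > 0 ? errors[0].textContent : null;
}

// Usage in a browser (shown as comments because it needs a DOM):
// var doc = new DOMParser().parseFromString(xmlText, "text/xml");
// var error = getParseError(doc);
// if (error !== null) { console.error("XML parse error: " + error); }
```

If an error is reported, the nodes that appear to be missing are probably ones the parser never reached.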

Workaround

I am currently using a workaround based on the solution in this answer:

function htmlDecode(input){
  var e = document.createElement('div');
  e.innerHTML = input;
  return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
}

function xmlDecode(input){
  var result = input;
  result = result.replace(/</g,  "&lt;");
  result = result.replace(/>/g,  "&gt;");
  result = result.replace(/\n/g, "&#10;");
  return htmlDecode(result);
}

// Usage:
var parser = new DOMParser();
var doc = parser.parseFromString(xmlDecode(xmlText), "text/xml");

I am not sure this is the best approach, but at least it gets me further.

1 answer:

Answer 0 (score: 4)

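I am not sure this is what causes the problem, but XML predefines only five named entities: &amp;, &lt;, &gt;, &quot; and &apos;. Either replace the other entities with the characters they stand for (your document is UTF-8, so using © or similar characters directly is perfectly safe), or replace them with numeric character references such as &#169;.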
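
A minimal sketch of the second option, assuming the XML text can be preprocessed before it is passed to DOMParser. namedToNumericEntities and ENTITY_CODE_POINTS are hypothetical names, and the table is only a small illustrative subset of the HTML entities, not a complete list.

```javascript
// Illustrative subset of HTML named entities and their Unicode code points.
// Extend this table with whatever entities the feed actually uses.
var ENTITY_CODE_POINTS = {
  copy: 169,   // ©
  Atilde: 195, // Ã
  eacute: 233, // é
  nbsp: 160    // non-breaking space
};

// The five entities every XML parser already understands; leave them alone.
var XML_PREDEFINED = { amp: true, lt: true, gt: true, quot: true, apos: true };

// Hypothetical helper: rewrite HTML-only named entities as numeric
// character references so that an XML parser accepts them.
function namedToNumericEntities(xml) {
  return xml.replace(/&([A-Za-z][A-Za-z0-9]*);/g, function (match, name) {
    if (XML_PREDEFINED.hasOwnProperty(name)) {
      return match;
    }
    if (ENTITY_CODE_POINTS.hasOwnProperty(name)) {
      return "&#" + ENTITY_CODE_POINTS[name] + ";";
    }
    // Unknown entity: leave it untouched so the parser error stays visible.
    return match;
  });
}
```

With this table, "Crois&Atilde;&copy;e" becomes "Crois&#195;&#169;e", which an XML parser should accept. The decoded text would still read "Ã©", since the feed appears to contain a mis-decoded UTF-8 "é"; repairing that is a separate step.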

Alternatively, if replacing them in your document is difficult, you can define your own entities:

<!DOCTYPE anime [
    <!ENTITY copy "&#169;">
]>
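
If the XML comes from a server you do not control, the DOCTYPE could be injected on the client before parsing. The sketch below assumes the browser's XML parser honors entities declared in an internal DTD subset. addEntityDoctype is a hypothetical helper, and its entities argument maps entity names to code points.

```javascript
// Hypothetical helper: insert an internal DTD subset that declares extra
// entities. The DOCTYPE must come after the XML declaration (if present)
// and before the root element.
function addEntityDoctype(xml, rootName, entities) {
  var declarations = Object.keys(entities).map(function (name) {
    return '    <!ENTITY ' + name + ' "&#' + entities[name] + ';">';
  });
  var doctype = "<!DOCTYPE " + rootName + " [\n" + declarations.join("\n") + "\n]>\n";

  // Match a leading XML declaration such as <?xml version="1.0" encoding="utf-8"?>
  var declMatch = /^<\?xml[^>]*\?>\s*/.exec(xml);
  if (declMatch) {
    return declMatch[0] + doctype + xml.slice(declMatch[0].length);
  }
  return doctype + xml;
}

// Usage in a browser (shown as comments because it needs a DOM):
// var patched = addEntityDoctype(xmlText, "anime", { copy: 169, Atilde: 195 });
// var doc = new DOMParser().parseFromString(patched, "text/xml");
```

The root name passed in has to match the document's root element (anime here).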