Saving XML created with DOMDocument raises the error "DOMDocument::save() string is not in UTF-8"

Asked: 2011-05-16 06:27:40

Tags: php xml utf-8 domdocument

I am trying to generate an RSS feed from database content. Here are the relevant parts of the code:

$doc = new DOMDocument();
$doc->formatOutput = true;
$doc->preserveWhiteSpace = false; 
if(is_file($filePath)) {
    $doc->load($filePath);
}
else {
    $doc->loadXML('
        <rss version="2.0">
        <channel>
        <title></title>
        <description></description>
        <link></link>
        </channel></rss>
    ');
}

.
.
.

$titleText = $row['Subject'];
$descriptionText = $row['Detail']; // this row has the problem
$linkText = sprintf('http://www.domain.com/%s', $row['URL']);
$pubDateText = date(DATE_RSS, strtotime($row['Created']));

$titleNode = $doc->createElement('title');
$descriptionNode = $doc->createElement('description');
$linkNode = $doc->createElement('link');
$pubDateNode = $doc->createElement('pubDate');

$titleNode->appendChild($doc->createTextNode($titleText));
$descriptionNode->appendChild($doc->createTextNode($descriptionText));
$linkNode->appendChild($doc->createTextNode($linkText));
$pubDateNode->appendChild($doc->createTextNode($pubDateText));

$itemNode = $doc->createElement('item');
$itemNode->appendChild($titleNode);
$itemNode->appendChild($descriptionNode);
$itemNode->appendChild($linkNode);
$itemNode->appendChild($pubDateNode);

$channelNode = $doc->getElementsByTagName('channel')->item(0);
$channelNode->appendChild($itemNode);

$doc->save($filePath); // this is where warning is raised

Here is the output:

<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>ALPHA BRAVO CHARLIE</title>
    <description>DELTA ECHO FOXTROT</description>
    <link>http://www.xxxxxxx.yyy/</link>
    <item>
      <title>Title Here</title>
      <description/><!-- this node has the problem -->
      <link>http://www.xxxxxxx.yyy/article/12345678/</link>
      <pubDate>Sun, 01 May 2011 23:18:28 +0500</pubDate>
    </item>
  </channel>
</rss>

As you can see, DOMDocument fails to insert the detail text into the RSS and raises this warning:

Warning: DOMDocument::save() [domdocument.save]: string is not in UTF-8 in C:\Inetpub\wwwroot\cron-rss.php on line 66

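When I comment out that line, the code runs without the warning but the description node is empty. When I uncomment it, the warning is raised and the description node is still empty. Please advise. I can provide additional details if necessary.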

2 Answers:

Answer 0: (score: 1)

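If the text comes from a database, the column may not be stored as UTF-8. Try converting it with iconv before passing it to createTextNode.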
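
A minimal sketch of what that conversion could look like. The helper name toUtf8 and the Windows-1252 source encoding are assumptions for illustration, not details from the question; substitute whatever encoding the column actually uses.

```php
<?php
// Hypothetical helper, not part of the original code.
// Assumption: text that is not valid UTF-8 is Windows-1252.
// Change $from to match the real encoding of the database column.
function toUtf8($text, $from = 'Windows-1252')
{
    // preg_match with the /u modifier only succeeds on valid UTF-8,
    // so text that is already UTF-8 is returned unchanged
    // (this avoids double-encoding).
    if (preg_match('//u', $text) === 1) {
        return $text;
    }

    // iconv returns false when it cannot convert the input.
    $converted = iconv($from, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException("Could not convert text from $from to UTF-8");
    }

    return $converted;
}

// "\xE9" is "é" in Windows-1252 and is not valid UTF-8 on its own.
$fixed = toUtf8("Caf\xE9 menu");

// Print the bytes as hex to show the single byte became a two-byte sequence.
echo bin2hex($fixed), "\n";
```

In the code from the question, this would probably be applied where the row is read, for example $descriptionText = toUtf8($row['Detail']);, so that createTextNode only ever receives valid UTF-8.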

Answer 1: (score: 0)

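Off the top of my head, I would expect the content of the description field to be wrapped in <![CDATA[ ]]>, just in case. You could try the following instead of createTextNode: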

$descriptionNode->appendChild($doc->createCDATASection($descriptionText));