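I have a problem. I want to modify some XML values. For example, I want to remove the "<![CDATA[" and "]]>" markers from the values.
Strangely, it works for title, price and image_link, but not for url ...
Here is my code: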
$dom = new DOMDocument('1.0', 'utf-8');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->load('data/kinguin.xml');
$past = time();
echo '(Kinguin) - Starting to remove tags' . "\n";
deleteChildren($dom, 'id');
echo '(Kinguin) - id removed' . "\n";
deleteChildren($dom, 'description');
echo '(Kinguin) - description removed' . "\n";
deleteChildren($dom, 'google_product_category');
echo '(Kinguin) - google_product_category removed' . "\n";
deleteChildren($dom, 'brand');
echo '(Kinguin) - brand removed' . "\n";
deleteChildren($dom, 'mpn');
echo '(Kinguin) - mpn removed' . "\n";
deleteChildren($dom, 'condition');
echo '(Kinguin) - condition removed' . "\n";
deleteChildren($dom, 'product_type');
echo '(Kinguin) - product_type removed' . "\n";
deleteChildren($dom, 'availability');
echo '(Kinguin) - availability removed' . "\n";
deleteChildren($dom, 'quantity');
echo '(Kinguin) - quantity removed' . "\n";
deleteChildren($dom, 'identifier_exists');
echo '(Kinguin) - identifier_exists removed' . "\n";
removeCDATA($dom, 'title');
echo '(Kinguin) - title CDATA removed' . "\n";
removeCDATA($dom, 'price');
echo '(Kinguin) - price CDATA removed' . "\n";
removeCDATA($dom, 'image_link');
echo '(Kinguin) - image_link CDATA removed' . "\n";
removeCDATA($dom, 'url');
echo '(Kinguin) - url CDATA removed' . "\n";
$dom->saveXML();
$dom->save('data/kinguin.xml');
$xml = file_get_contents('data/kinguin.xml');
renameTags($xml, 'link', 'url', 'data/kinguin.xml');
echo '(Kinguin) - Renamed link' . "\n";
$now = time();
echo "(Kinguin) - Time needed: " . ($now - $past) . "s" . "\n";
echo "\n";
Functions:
function deleteChildren($dom, $children){
    $root = $dom->documentElement;
    $marker = $root->getElementsByTagName($children);
    // Iterate backwards because the node list is live and shrinks on removal.
    for($i = $marker->length - 1; $i >= 0 ; $i--){
        $child = $marker->item($i);
        $child->parentNode->removeChild($child);
    }
}
function renameTags($xml, $old, $new, $path){
    $dom = new DOMDocument('1.0', 'utf-8');
    $dom->preserveWhiteSpace = false;
    $dom->formatOutput = true;
    $dom->loadXML($xml);
    $nodes = $dom->getElementsByTagName($old);
    $toRemove = array();
    foreach ($nodes as $node) {
        $newNode = $dom->createElement($new);
        foreach ($node->attributes as $attribute) {
            $newNode->setAttribute($attribute->name, $attribute->value);
        }
        // Move children one at a time; iterating childNodes with foreach
        // while removing from it would skip nodes, since the list is live.
        while ($node->firstChild !== null) {
            $newNode->appendChild($node->firstChild);
        }
        $node->parentNode->appendChild($newNode);
        $toRemove[] = $node;
    }
    foreach ($toRemove as $node) {
        $node->parentNode->removeChild($node);
    }
    $dom->save($path);
}
function removeCDATA($dom, $tagName){
    $root = $dom->documentElement;
    $marker = $root->getElementsByTagName($tagName);
    for($i = $marker->length - 1; $i >= 0 ; $i--){
        $rename = $marker->item($i)->textContent;
        $newValue = preg_replace('/(<!\[CDATA\[)/', '', $rename);
        $newValue = preg_replace('/(]]>)/', '', $newValue);
        $newValue = preg_replace('/( EUR)/', '', $newValue);
        //ey-Shop\Cronjob.php on line 350 PHP Warning: preg_replace(): Delimiter must not be alphanumeric or backslash in 351
        $marker->item($i)->nodeValue = $newValue;
    }
}
This is the XML output:
<?xml version="1.0" encoding="UTF-8"?>
<rss>
<channel xmlns:g="http://base.google.com/ns/1.0" version="2.0">
<title>google_EUR_english_1</title>
<item>
<title>Anno 2070 Uplay CD Key</title>
<g:price>3.27</g:price>
<g:image_link>http://cdn.kinguin.net/media/catalog/category/anno_8.jpg</g:image_link>
<url><![CDATA[http://www.kinguin.net/category/4/anno-2070/?nosalesbooster=1&country_store=1&currency=EUR]]></url>
</item>
<item>
<title>Anno 2070: Deep Ocean DLC Uplay CD Key</title>
<g:price>4.75</g:price>
<g:image_link>http://cdn.kinguin.net/media/catalog/category/anno-2070-deep-ocean-releasing-this-spring-1089268_1.jpg</g:image_link>
<url><![CDATA[http://www.kinguin.net/category/5/anno-2070-deep-ocean-expansion-pack-dlc/?nosalesbooster=1&country_store=1&currency=EUR]]></url>
</item>
<item>
This is the error message:
Warning: removeCDATA(): unterminated entity reference All Stars-Racing Transformed RU VPN in C:\Users\Jan\PhpstormProjects\censored\Cronjob.php on line 353
PHP Warning: removeCDATA(): unterminated entity reference SUV DLC Steam Gift in C:\Users\Jan\PhpstormProjects\censored\Cronjob.php on line 353
Line 353:
$marker->item($i)->nodeValue = $newValue;
Regards and thanks!
Answer 0 (score: 0)
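If you remove the CDATA section, you end up with an element containing bare & characters, which is not legal, because & may only appear as its named entity escape (&amp;) or inside a CDATA section.
That is why the CDATA is there in the first place, and it should be left for the consuming parser to handle.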
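If the CDATA wrapper really has to go, the value needs to be written back as a text node so the serializer escapes the ampersands itself. The following is a minimal sketch of that idea, not the asker's code: the helper name `setTextSafely` and the sample URL are assumptions made up for illustration. It relies only on `createTextNode()`, which treats its argument as literal text, whereas assigning a string containing a bare `&` to `nodeValue` (as on line 353 of the question) appears to be what triggers the "unterminated entity reference" warning.

```php
<?php
// Hypothetical helper (not from the question): replace an element's
// content with plain text. createTextNode() takes the string literally,
// so "&" is escaped as "&amp;" when the document is serialized.
function setTextSafely(DOMElement $element, string $value): void
{
    // Drop the existing children, including any CDATA section.
    while ($element->firstChild !== null) {
        $element->removeChild($element->firstChild);
    }
    $element->appendChild($element->ownerDocument->createTextNode($value));
}

// Made-up sample input with a bare "&" inside a CDATA section.
$doc = new DOMDocument();
$doc->loadXML('<item><url><![CDATA[http://example.com/?a=1&b=2]]></url></item>');

$url = $doc->getElementsByTagName('url')->item(0);
setTextSafely($url, $url->textContent);

// Serialize only the root element (no XML declaration).
$result = $doc->saveXML($doc->documentElement);
echo $result, "\n";
```

The resulting document stays well-formed because the `&` is stored as text and escaped on output, so a consuming parser still sees the original URL.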
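Answer 1 (score: 0)
If you really think you need to remove any CDATA sections from an element node, just assign the node's text content back to itself: $foo->textContent = $foo->textContent
See http://sandbox.onlinephpfunctions.com/code/cca5093433218c7c134f120725988fe6808f906c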
function removeCDATA($dom, $tagName){
    $marker = $dom->getElementsByTagName($tagName);
    for($i = $marker->length - 1; $i >= 0 ; $i--){
        // Writing textContent replaces the children with a single text node.
        $marker->item($i)->textContent = $marker->item($i)->textContent;
    }
}
$xml = '<root><items><item><url><![CDATA[http://example.com/search?a=1&b=2&c=3]]></url></item><item><url><![CDATA[http://example.com/search?a=4&b=5&c=6]]></url></item></items></root>';
$doc = new DOMDocument();
$doc->loadXML($xml);
removeCDATA($doc, 'url');
echo $doc->saveXML();
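and the output (note that the serializer escapes each & as &amp;):
<?xml version="1.0"?>
<root><items><item><url>http://example.com/search?a=1&amp;b=2&amp;c=3</url></item><item><url>http://example.com/search?a=4&amp;b=5&amp;c=6</url></item></items></root>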
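As an alternative to the textContent round-trip above, libxml can merge CDATA sections into ordinary text nodes at parse time. This is only a sketch with a made-up sample document; it assumes it is acceptable to drop CDATA everywhere in the file, since the `LIBXML_NOCDATA` option applies to the whole document rather than to a single tag name.

```php
<?php
// Sample input is made up for illustration.
$xml = '<root><item><url><![CDATA[http://example.com/search?a=1&b=2]]></url></item></root>';

$doc = new DOMDocument();
// LIBXML_NOCDATA asks libxml to merge CDATA sections into text nodes
// while parsing, so no per-element cleanup pass is needed afterwards.
$doc->loadXML($xml, LIBXML_NOCDATA);

// Serialize only the root element (no XML declaration).
$serialized = $doc->saveXML($doc->documentElement);
echo $serialized, "\n";
```

The same option can be passed as the second argument to `DOMDocument::load()` when reading from a file such as data/kinguin.xml.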