Decoding multiple XML tags with PHP

Asked: 2013-12-12 12:01:23

Tags: php xml-parsing

I'm looking for a "smart way" to decode multiple XML tags inside a string. I have the following function:

function b($params) {
    $xmldata = '<?xml version="1.0" encoding="UTF-8" ?><root>' . html_entity_decode($params['data']) . '</root>';
    $lang = ucfirst(strtolower($params['lang']));
    if (simplexml_load_string($xmldata) === FALSE) {
        return $params['data'];
    } else {
        $langxmlobj = new SimpleXMLElement($xmldata);

        if ($langxmlobj -> $lang) {
            return $langxmlobj -> $lang;
        } else {
            return $params['data'];
        }
    }
}

Trying it:

$params['data'] = '<French>Service DNS</French><English>DNS Service</English> - <French>DNS Gratuit</French><English>Free DNS</English>';
$params['lang'] = 'French';
$a = b($params);
print_r($a);

But the output is:

Service DNS

I want it to output every matching tag, so the result should be:

Service DNS - DNS Gratuit

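I'm pulling my hair out. Any quick help or pointers would be greatly appreciated.

Edit: refining the requirements.

It seems I wasn't clear enough, so let me show another example.

If I input the following string: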

The <French>Chat</French><English>Cat</English> is very happy to stay on stackoverflow 
because it makes him <French>Heureux</French><English>Happy</English> to know that it 
is the best <French>Endroit</French><English>Place</English> to find good people with
good <French>Réponses</French><English>Answers</English>.

So, if I run the function with 'French', it should return:

The Chat is very happy to stay on stackoverflow 
because it makes him Heureux to know that it 
is the best Endroit to find good people with
good Réponses.

With 'English':

The Cat is very happy to stay on stackoverflow 
because it makes him Happy to know that it 
is the best Place to find good people with
good Answers.

Hopefully that is clearer now.

5 answers:

Answer 0: (score: 6)

Basically, I would first match the language sections, such as:

<French>Chat</French><English>Cat</English>

with this pattern:

"@(<($defLangs)>.*?</\\2>)+@i"

Then use a callback to pick out the right language.

If you have PHP 5.3+, then:

function transLang($str, $lang, $defLangs = 'French|English')
{
    return preg_replace_callback ( "@(<($defLangs)>.*?</\\2>)+@i", 

            function ($matches) use($lang)
            {
                preg_match ( "/<$lang>(.*?)<\/$lang>/i", $matches [0], $longSec );

                return $longSec [1];
            }, $str );
}

echo transLang ( $str, 'French' ), "\n", transLang ( $str, 'English' );

If not, it is a bit more complicated:

class LangHelper
{

    private $lang;

    function __construct($lang)
    {
        $this->lang = $lang;
    }

    public function callback($matches)
    {
        $lang = $this->lang;

        preg_match ( "/<$lang>(.*?)<\/$lang>/i", $matches [0], $subMatches );

        return $subMatches [1];
    }

}

function transLang($str, $lang, $defLangs = 'French|English')
{
    $langHelper = new LangHelper ( $lang );

    return preg_replace_callback ( "@(<($defLangs)>.*?</\\2>)+@i", 
            array (
                    $langHelper,
                    'callback' 
            ), $str );
}

echo transLang ( $str, 'French' ), "\n", transLang ( $str, 'English' );
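
The snippets above leave `$str` undefined and assume that every group contains the requested language. As a rough sketch of the same two-step idea (match a run of language tags, then pick one entry in a callback), the variant below adds two safeguards that are assumptions of this example and not part of the answer: `preg_quote()` on the language names, and leaving a group untouched when it has no entry for the requested language. The name `pickLanguage` and the sample string are hypothetical, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical variant of transLang(): same approach, with quoting and a fallback.
function pickLanguage(string $text, string $lang, array $knownLangs = ['French', 'English']): string
{
    // Quote each language name so it is safe inside the pattern.
    $alternatives = implode('|', array_map(function ($name) {
        return preg_quote($name, '@');
    }, $knownLangs));

    // A run of adjacent <Lang>...</Lang> blocks; \1 is the tag name of the current block.
    $groupPattern = "@(?:<($alternatives)>.*?</\\1>)+@is";
    $quotedLang = preg_quote($lang, '@');
    $entryPattern = '@<' . $quotedLang . '>(.*?)</' . $quotedLang . '>@is';

    return preg_replace_callback($groupPattern, function (array $m) use ($entryPattern) {
        // Keep the whole group as-is if the requested language is missing from it.
        return preg_match($entryPattern, $m[0], $entry) ? $entry[1] : $m[0];
    }, $text);
}

$str = 'The <French>Chat</French><English>Cat</English> is '
     . '<French>Heureux</French><English>Happy</English>.';

echo pickLanguage($str, 'French'), "\n";  // The Chat is Heureux.
echo pickLanguage($str, 'English'), "\n"; // The Cat is Happy.
```

Returning the untouched group for a missing language is only one possible choice; falling back to a default language would fit the same callback.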

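Answer 1: (score: 3)

If I understand correctly, you want to remove all the "language" tags but keep the content of the requested language.

The DOM is a tree of nodes. Tags are element nodes, and text is stored in text nodes. XPath lets you select nodes with expressions. So take all child nodes of the language elements you want to keep and copy them in front of the language node. Then remove all language nodes. This works even if a language element contains other element nodes, such as <em>.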
function replaceLanguageTags($fragment, $language) {
  $dom = new DOMDocument();
  $dom->loadXml(
    '<?xml version="1.0" encoding="UTF-8" ?><content>'.$fragment.'</content>'
  );
  // get an xpath object
  $xpath = new DOMXpath($dom);

  // fetch all nodes with the language you like to keep
  $nodes = $xpath->evaluate('//'.$language);
  foreach ($nodes as $node) {
    // copy all child nodes of the found node to just before it
    foreach ($node->childNodes as $childNode) {
      $node->parentNode->insertBefore($childNode->cloneNode(TRUE), $node);
    }
    // remove the found node
    $node->parentNode->removeChild($node);
  }

  // select all language nodes
  $tags = array('English', 'French');
  $nodes = $xpath->evaluate('//'.implode('|//', $tags));
  foreach ($nodes as $node) {
    // remove them
    $node->parentNode->removeChild($node);
  }

  $result = '';
  // we do not need the root node, so save all its children
  foreach ($dom->documentElement->childNodes as $node) {
    $result .= $dom->saveXml($node);
  }
  return $result;
}

$xml = <<<'XML'
The <French>Chat</French><English>Cat</English> is very happy to stay on stackoverflow
because it makes him <French>Heureux</French><English>Happy</English> to know that it
is the best <French>Endroit</French><English>Place</English> to find good people with
good <French>Réponses</French><English>Answers</English>.
XML;

var_dump(replaceLanguageTags($xml, 'English'));
var_dump(replaceLanguageTags($xml, 'French'));

Output:

string(146) "The Cat is very happy to stay on stackoverflow
because it makes him Happy to know that it
is the best Place to find good people with
good Answers."
string(153) "The Chat is very happy to stay on stackoverflow
because it makes him Heureux to know that it
is the best Endroit to find good people with
good Réponses."
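
To illustrate the point about nested elements such as <em>, here is a small self-contained sketch. It moves the child nodes instead of cloning them, which is a variation on the function above and not the answer's exact code; the helper name `unwrapElement` and the sample fragment are made up for this example, and the `void` return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: replace an element by its own children, in place.
function unwrapElement(DOMElement $element): void
{
    $parent = $element->parentNode;
    // insertBefore() moves an existing node, so firstChild changes on every pass.
    while ($element->firstChild !== null) {
        $parent->insertBefore($element->firstChild, $element);
    }
    $parent->removeChild($element);
}

$dom = new DOMDocument();
$dom->loadXML(
    '<content>A <French>chat <em>noir</em></French>'
    . '<English>black <em>cat</em></English>.</content>'
);
$xpath = new DOMXPath($dom);

// Drop the language we do not want, including its nested markup.
foreach (iterator_to_array($xpath->query('//English')) as $node) {
    $node->parentNode->removeChild($node);
}
// Unwrap the language we keep; its nested <em> survives.
foreach (iterator_to_array($xpath->query('//French')) as $node) {
    unwrapElement($node);
}

$result = '';
foreach ($dom->documentElement->childNodes as $child) {
    $result .= $dom->saveXML($child);
}
echo $result, "\n"; // A chat <em>noir</em>.
```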

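Answer 2: (score: 2)

Which version of PHP are you using? I don't know what else could be different, but I copied and pasted your code and got the following output: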

SimpleXMLElement Object
(
    [0] => Service DNS
    [1] => DNS Gratuit
)

To be sure, here is the code I copied from above:

<?php

function b($params) {
    $xmldata = '<?xml version="1.0" encoding="UTF-8" ?><root>' . html_entity_decode($params['data']) . '</root>';
    $lang = ucfirst(strtolower($params['lang']));
    if (simplexml_load_string($xmldata) === FALSE) {
        return $params['data'];
    } else {
        $langxmlobj = new SimpleXMLElement($xmldata);

        if ($langxmlobj -> $lang) {
            return $langxmlobj -> $lang;
        } else {
            return $params['data'];
        }
    }
}

$params['data'] = '<French>Service DNS</French><English>DNS Service</English> - <French>DNS Gratuit</French><English>Free DNS</English>';
$params['lang'] = 'French';
$a = b($params);
print_r($a);
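
One possible reason the two runs look different: `$langxmlobj->$lang` refers to every matching child element, but casting it to a string (as `echo` does) yields only the first one. The sketch below shows that behaviour with the sample data from the question; the variable names are illustrative only.

```php
<?php
$data = '<French>Service DNS</French><English>DNS Service</English> - '
      . '<French>DNS Gratuit</French><English>Free DNS</English>';
$root = simplexml_load_string('<root>' . $data . '</root>');

// A string cast on $root->French gives the text of the first <French> only.
$first = (string) $root->French;

// Iterating visits every <French> child of the root.
$all = [];
foreach ($root->French as $element) {
    $all[] = (string) $element;
}

echo $first, "\n";               // Service DNS
echo implode(' | ', $all), "\n"; // Service DNS | DNS Gratuit
```

Note that the text between the elements (the " - " here) belongs to the root element, not to any language element, so iterating over `$root->French` alone cannot reproduce the "Service DNS - DNS Gratuit" result the question asks for.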

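Answer 3: (score: 2)

Here is my suggestion. It should be fast and simple. You only need to strip the tags of the requested language, then remove every other tag together with its content.

The downside is that if you want to use any tags other than language tags, you must make sure the opening tag differs from the closing tag (e.g. <p >Lorem</p> rather than <p>Lorem</p>). On the other hand, this lets you add as many languages as you like without maintaining a list of them. You only need to know the default to use when the requested language is missing (or just throw and catch an exception).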

function only_lang($lang, $text) {
    static $infinite_loop;

    $result = str_replace("<$lang>", '', $text, $num_matches_open);
    $result = str_replace("</$lang>", '', $result, $num_matches_close);

    // Check if the text is malformed. Good place to throw an error
    if($num_matches_open != $num_matches_close) {
        //throw new Exception('Opening and closing tags does not match', 1);

        return $text;
    }

    // Check if this language is present at all.
    // Otherwise fallback to default language or throw an error
    if( ! $num_matches_open) {
        //throw new Exception('No such language', 2);

        // Prevent infinite loop if even the default language is missing
        if($infinite_loop) return $text;
        $infinite_loop = true;
        $fallback = only_lang('English', $text);
        // Reset the guard so that later calls can fall back again
        $infinite_loop = false;
        return $fallback;
    }

    // Strip any other language and return the result.
    // Lazy .*? stops at the nearest closing tag; the s modifier lets a block span lines.
    return preg_replace('!<([^>]+)>.*?</\\1>!s', '', $result);
}
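
When more than one leftover language block sits on the same line, it matters whether the `.*` in the final replacement is greedy or lazy. The following sketch compares the two on the first sample from the question, after the <French> tags have already been stripped; the variable names are illustrative.

```php
<?php
$text = 'Service DNS<English>DNS Service</English> - DNS Gratuit<English>Free DNS</English>';

// Greedy: .* runs from the first <English> to the last </English> on the line.
$greedy = preg_replace('!<([^>]+)>.*</\\1>!', '', $text);

// Lazy, with the s modifier so that a block may also span line breaks.
$lazy = preg_replace('!<([^>]+)>.*?</\\1>!s', '', $text);

echo $greedy, "\n"; // Service DNS
echo $lazy, "\n";   // Service DNS - DNS Gratuit
```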

Answer 4: (score: 1)

I did something simple with a regular expression. It is useful if the input only contains <lang>...</lang> tags.

function to_lang($lang="", $str="") {
  return strip_tags(preg_replace('~<(\w+(?<!'.$lang.'))>.*</\1>~Us',"",$str));
}

echo to_lang("English","The happy <French>Chat</French><English>Cat</English>");

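This removes every <tag>...</tag> block whose name is not the specified $lang. If the <tag-name> may contain hyphens or other special characters, e.g. <French-1>, replace \w with [^/>].


The search pattern, explained briefly:

1.) <(\w+(?<!'.$lang.'))

< followed by one or more word characters that do not end in $lang (checked with a negative lookbehind); the <tag_name> is captured

2.) .* followed by anything (ungreedy: modifier U; dot matches newlines: modifier s)

3.) </\1> until the captured tag is closed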
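
As a quick check of the three parts described above, the sketch below runs the same pattern on a block that spans two lines, which exercises the `s` modifier. The `preg_quote()` call and the function name `keepLang` are additions made for this example and are not part of the answer.

```php
<?php
// Hypothetical wrapper around the same pattern, with $lang quoted for safety.
function keepLang(string $lang, string $text): string
{
    // 1.) tag name not ending in $lang  2.) anything, ungreedy (U), across lines (s)
    // 3.) the matching closing tag
    $pattern = '~<(\w+(?<!' . preg_quote($lang, '~') . '))>.*</\1>~Us';

    // Remove the other languages, then strip the remaining <$lang> tags.
    return strip_tags(preg_replace($pattern, '', $text));
}

$out = keepLang('English', "The happy <French>Chat\nnoir</French><English>Cat</English>");
echo $out, "\n"; // The happy Cat
```

Because `strip_tags()` removes every remaining tag, this only suits input where language tags are the sole markup, as the answer notes.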