PHP SimpleXMLElement: parsing an XML tag with multiple "potential" attributes

Date: 2013-02-22 01:28:29

Tags: php xpath simplexml

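As the title suggests, I have a question about parsing XML tags that may have multiple attributes (or none at all), and I'm looking for advice on how to accomplish this. But first, a bit of background.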

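I'm working on Program O, a PHP-based AIML interpreter script, and I'm migrating the code from string-replacement functions (str_replace, preg_replace, etc.) to PHP's built-in SimpleXML functions. So far, almost all of the parsing functions I've written for the various AIML tags are complete and working well, but one tag in particular is giving me a hard time: the CONDITION tag.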

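According to the AIML tag reference, the tag comes in three separate "forms". The first has both a NAME attribute and one of VALUE, CONTAINS or EXISTS, and is called the "multi-condition". The second has only the NAME attribute and is called the "single name list-condition". The last "form", called the "list-condition", is just the CONDITION tag with no attributes at all. The AIML tag reference I linked earlier has examples of all three forms, but with a lot of text in between, so I'll repeat them here along with the surrounding AIML code: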

FORM: multi-condition tag:

<category>
  <pattern>I AM BLOND</pattern>
  <template>You sound very
    <condition name="gender" value="female"> attractive.</condition>
    <condition name="gender" value="male"> handsome.</condition>
  </template>
</category>

FORM: list-condition tag:

<category>
  <pattern>I AM BLOND</pattern>
  <template>You sound very
    <condition>
      <li name="gender" value="female"> attractive.</li>
      <li name="gender" value="male"> handsome.</li>
    </condition>
  </template>
</category>

FORM: single name list-condition tag:

<category>
  <pattern>I AM BLOND</pattern>
  <template>You sound very
    <condition name="gender">
      <li value="female"> attractive.</li>
      <li value="male"> handsome.</li>
    </condition>
  </template>
</category> 
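Since the three forms differ only in which attributes are present on the CONDITION element itself, the first step can be a small dispatch on those attributes. The sketch below is a hypothetical helper (classify_condition_form() is not part of Program O); it relies only on the fact that isset($element['attr']) on a SimpleXMLElement tests whether that attribute exists.

```php
<?php
// Hypothetical helper: report which of the three CONDITION forms an element uses,
// based solely on the attributes present on the CONDITION tag itself.
function classify_condition_form(SimpleXMLElement $condition)
{
    $hasName = isset($condition['name']);
    // NAME plus any one of VALUE, CONTAINS or EXISTS marks the multi-condition form.
    $hasTest = isset($condition['value'])
        || isset($condition['contains'])
        || isset($condition['exists']);

    if ($hasName && $hasTest) {
        return 'multi-condition';
    }
    if ($hasName) {
        return 'single name list-condition';
    }
    return 'list-condition';
}

$samples = array(
    '<condition name="gender" value="female"> attractive.</condition>',
    '<condition name="gender"><li value="female"> attractive.</li></condition>',
    '<condition><li name="gender" value="female"> attractive.</li></condition>',
);
foreach ($samples as $xml) {
    // Prints, one per line: multi-condition, single name list-condition, list-condition
    echo classify_condition_form(simplexml_load_string($xml)), "\n";
}
```

Checking the most specific form first (NAME plus a test attribute) means the NAME-only branch never has to rule the other attributes out explicitly.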

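The previous version of the script I'm working on only used the "list-condition" form of the CONDITION tag. Although that is the most commonly used form, it is not the only one, so I need to handle the other two forms as well. So my question is: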

How can I accomplish this efficiently?

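I already have working code to parse the list-condition form of the CONDITION tag, and preliminary testing looks promising: it doesn't throw any errors and seems to produce the desired response (but only for the list-condition form; the other two forms fail with errors, for obvious reasons). The function is listed below: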

function parse_condition_tag($convoArr, $element, $parentName, $level)
{
  runDebug(__FILE__, __FUNCTION__, __LINE__, 'Starting function and setting timestamp.', 2);
  $response = array();
  $attrName = $element['name'];
  if (!empty ($attrName))
  {
    $attrName = ($attrName == '*') ? $convoArr['star'][1] : $attrName;
    $search = $convoArr['client_properties'][$attrName];
    // Relative paths search only this CONDITION's children; not() needs parentheses
    $path = ($search != 'undefined') ? "li[@value=\"$search\"]" : 'li[not(@*)]';
    $choice = $element->xpath($path);
    $children = $choice[0]->children();
    if (!empty ($children))
    {
      $response = parseTemplateRecursive($convoArr, $children, $level + 1);
    }
    else
    {
      $response[] = (string) $choice[0];
    }
    $response_string = implode_recursive(' ', $response, __FILE__, __FUNCTION__, __LINE__);
    runDebug(__FILE__, __FUNCTION__, __LINE__, "Returning '$response_string' and exiting function.", 4);
    return $response_string;
  }
  trigger_error('Parsing of the CONDITION tag failed! XML = ' . $element->asXML());
}
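One detail that matters for any XPath used in these functions: with SimpleXMLElement::xpath(), a path that begins with // is evaluated from the document root, not from the element you call it on, so it can pick up LI elements belonging to other CONDITION tags. A relative path searches only the element's own children. Also, not(@*) is a function call and needs its parentheses. The sample XML below is made up for illustration.

```php
<?php
$xml = simplexml_load_string(
    '<template>'
    . '<condition name="gender"><li value="female">A</li><li>B</li></condition>'
    . '<condition name="mood"><li value="happy">C</li><li>D</li></condition>'
    . '</template>'
);

// Use the second CONDITION element as the context node.
$condition = $xml->condition[1];

// Absolute path: evaluated from the root, so it finds all four LI elements.
$everyLi = $condition->xpath('//li');

// Relative path: only the two LI children of this CONDITION.
$ownLi = $condition->xpath('li');

// LI children with no attributes at all (the "default" choice).
$defaultLi = $condition->xpath('li[not(@*)]');

echo count($everyLi), ' ', count($ownLi), ' ', (string) $defaultLi[0], "\n"; // 4 2 D
```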

I'm still fairly new to the SimpleXML functions, so I may be missing something obvious. In fact, I hope that's the case. :)

EDIT: As promised in one of my comments, here is the function I eventually ended up with:

  /*
   * function parse_condition_tag
   * Acts as a de-facto if/else structure, selecting a specific output, based on certain criteria
   * @param [array] $convoArr    - The conversation array (a container for a number of necessary variables)
   * @param [object] $element    - The current XML element being parsed
   * @param [string] $parentName - The parent tag (if applicable)
   * @param [int] $level         - The current recursion level
   * @return [string] $response_string
   */

 function parse_condition_tag($convoArr, $element, $parentName, $level)
 {
   runDebug(__FILE__, __FUNCTION__, __LINE__, 'Starting function and setting timestamp.', 2);
   global $error_response;
   $response = array();
   $attrName = $element['name'];
   $attributes = (array)$element->attributes();
   $attributesArray = (isset($attributes['@attributes'])) ? $attributes['@attributes'] : array();
   runDebug(__FILE__, __FUNCTION__, __LINE__, 'Element attributes:' . print_r($attributesArray, true), 1);
   $attribute_count = count($attributesArray);
   runDebug(__FILE__, __FUNCTION__, __LINE__, "Element attribute count = $attribute_count", 1);
   if ($attribute_count == 0) // Bare condition tag
   {
     runDebug(__FILE__, __FUNCTION__, __LINE__, 'Parsing a CONDITION tag with no attributes. XML = ' . $element->asXML(), 2);
     $liNamePath = 'li[@name]';
     $condition_xPath = '';
     $exclude = array();
     $choices = $element->xpath($liNamePath);
     foreach ($choices as $choice)
     {
       $choice_name = (string)$choice['name'];
       if (in_array($choice_name, $exclude)) continue;
       $exclude[] = $choice_name;
       runDebug(__FILE__, __FUNCTION__, __LINE__, 'Client properties = ' . print_r($convoArr['client_properties'], true), 2);
       $choice_value = get_client_property($convoArr, $choice_name);
       $condition_xPath .= "li[@name=\"$choice_name\"][@value=\"$choice_value\"]|";
     }
     $condition_xPath .= 'li[not(@*)]';
     runDebug(__FILE__, __FUNCTION__, __LINE__, "xpath search = $condition_xPath", 4);
     $pick_search = $element->xpath($condition_xPath);
     runDebug(__FILE__, __FUNCTION__, __LINE__, 'Pick array = ' . print_r($pick_search, true), 2);
     $pick_count = count($pick_search);
     runDebug(__FILE__, __FUNCTION__, __LINE__, "Pick count = $pick_count.", 2);
     $pick = (!empty($pick_search)) ? $pick_search[0] : ''; // avoid an undefined offset when nothing matches
   }
   elseif (array_key_exists('value', $attributesArray) or array_key_exists('contains', $attributesArray) or array_key_exists('exists', $attributesArray)) // condition tag with either VALUE, CONTAINS or EXISTS attributes
   {
     runDebug(__FILE__, __FUNCTION__, __LINE__, 'Parsing a CONDITION tag with 2 attributes.', 2);
     $condition_name = (string)$element['name'];
     $test_value = get_client_property($convoArr, $condition_name);
     switch (true)
     {
       case (isset($element['value'])): // exact match
         $is_match = ((string)$element['value'] == $test_value);
         break;
       case (isset($element['contains'])): // case-insensitive substring match
         $is_match = (stripos((string)$test_value, (string)$element['contains']) !== false);
         break;
       case (isset($element['exists'])): // assumes get_client_property() returns '' or 'undefined' for an unset property
         $is_match = ($test_value != '' && $test_value != 'undefined');
         break;
       default:
         runDebug(__FILE__, __FUNCTION__, __LINE__, 'Something went wrong with parsing the CONDITION tag. Returning the error response.', 1);
         return $error_response;
     }
     $pick = ($is_match) ? $element : '';
   }
   elseif (array_key_exists('name', $attributesArray)) // this ~SHOULD~ just trigger if the NAME value is present, and ~NOT~ NAME and (VALUE|CONTAINS|EXISTS)
   {
     runDebug(__FILE__, __FUNCTION__, __LINE__, 'Parsing a CONDITION tag with only the NAME attribute.', 2);
     $condition_name = (string)$element['name'];
     $test_value = get_client_property($convoArr, $condition_name);
     $path = "li[@value=\"$test_value\"]|li[not(@*)]";
     runDebug(__FILE__, __FUNCTION__, __LINE__, "search string = $path", 4);
     $choice = $element->xpath($path);
     $pick = (!empty($choice)) ? $choice[0] : ''; // avoid an undefined offset when nothing matches
     runDebug(__FILE__, __FUNCTION__, __LINE__, 'Found a match. Pick = ' . print_r($choice, true), 4);
   }
   else // nothing matches
   {
     runDebug(__FILE__, __FUNCTION__, __LINE__, 'No matches found. Returning default error response.', 1);
     return $error_response;
   }
   $children = (is_object($pick)) ? $pick->children() : null;
   if (!empty ($children))
   {
     $response = parseTemplateRecursive($convoArr, $children, $level + 1);
   }
   else
   {
     $response[] = (string) $pick;
   }
   $response_string = implode_recursive(' ', $response);
   return $response_string;
 }
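Two small things in the function above could be simplified. count($element->attributes()) returns the number of attributes directly, without casting to an array and reading the '@attributes' key. And an XPath union such as li[@value="..."]|li[not(@*)] returns nodes in document order, so whichever LI appears first wins; if the default LI is written before the matching one, the default is picked. The sketch below uses a hypothetical helper, pick_li(), that prefers an explicit VALUE match and only then falls back to the attribute-less LI; $clientValue stands in for what get_client_property() would return.

```php
<?php
// Hypothetical helper (not part of Program O): choose the LI to use for a
// single name list-condition. $clientValue stands in for the client property value.
function pick_li(SimpleXMLElement $condition, $clientValue)
{
    // An LI whose VALUE equals the client property always wins,
    // regardless of where it appears in the document.
    foreach ($condition->li as $li) {
        if (isset($li['value']) && (string) $li['value'] === (string) $clientValue) {
            return $li;
        }
    }
    // Otherwise fall back to the first LI with no attributes at all.
    $default = $condition->xpath('li[not(@*)]');
    // Return null rather than indexing into an empty result array.
    return !empty($default) ? $default[0] : null;
}

// Note the default LI is deliberately placed before the matching one.
$condition = simplexml_load_string(
    '<condition name="gender"><li>unknown.</li><li value="male"> handsome.</li></condition>'
);

echo count($condition->attributes()), "\n";       // 1
echo (string) pick_li($condition, 'male'), "\n";  // " handsome." (leading space kept)
echo (string) pick_li($condition, 'other'), "\n"; // "unknown."
```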

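I suspect there may be a better, more elegant way to do this (the story of my life, really), but the above works as intended. Any suggestions for improvement will be appreciated and carefully considered.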

1 answer:

Answer 0 (score: 0)

Note that I didn't use SimpleXML, because IMHO DOMDocument is simply much better and more powerful. Both DOMDocument and DOMXPath have been available since PHP 5.
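With DOMXPath, the equivalent of calling xpath() on a SimpleXML element is to pass a context node as the second argument of DOMXPath::query(); the expression is then evaluated relative to that node. A minimal sketch using the single name list-condition example from the question (the hard-coded "male" lookup is just for illustration):

```php
<?php
$dom = new DOMDocument();
$dom->loadXML(
    '<template>You sound very'
    . '<condition name="gender">'
    . '<li value="female"> attractive.</li>'
    . '<li value="male"> handsome.</li>'
    . '</condition>'
    . '</template>'
);
$xpath = new DOMXPath($dom);

$results = array();
// The absolute query finds every CONDITION that carries a NAME attribute...
foreach ($xpath->query('//condition[@name]') as $condition) {
    // ...and passing $condition as the context node makes 'li[...]' relative to it.
    $matches = $xpath->query('li[@value="male"]', $condition);
    if ($matches->length > 0) {
        $results[$condition->getAttribute('name')] = $matches->item(0)->nodeValue;
    }
}
print_r($results);
```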

I created a simple parser class that parses the supplied document and handles the different styles of condition:

class AIMLParser
{
    public function parse($data)
    {
        $internalErrors = libxml_use_internal_errors(true);

        $dom = new DOMDocument();
        $dom->loadHTML($data);
        $xpath = new DOMXPath($dom);

        $templates = array();

        foreach($xpath->query('//template') as $templateNode) {
            $template = array(
                'text' => $templateNode->firstChild->nodeValue, // note: this expects the first child node to always be the text node
                'conditions' => array(),
            );

            foreach ($templateNode->getElementsByTagName('condition') as $condition) {
                if ($condition->hasAttribute('name') && $condition->hasAttribute('value')) {
                    $template['conditions'] = $this->parseConditionsWithoutChildren($template['conditions'], $condition);
                } elseif ($condition->hasAttribute('name')) {
                    $template['conditions'] = $this->parseConditionsWithNameAttribute($template['conditions'], $condition);
                } else {
                    $template['conditions'] = $this->parseConditionsWithoutAttributes($template['conditions'], $condition);
                }
            }

            $templates[] = $template;
        }

        libxml_use_internal_errors($internalErrors);

        return $templates;
    }

    private function parseConditionsWithoutChildren(array $conditions, DOMNode $condition)
    {
        if (!array_key_exists($condition->getAttribute('name'), $conditions)) {
            $conditions[$condition->getAttribute('name')] = array();
        }

        $conditions[$condition->getAttribute('name')][$condition->getAttribute('value')] = $condition->nodeValue;

        return $conditions;
    }

    private function parseConditionsWithNameAttribute(array $conditions, DOMNode $condition)
    {
        if (!array_key_exists($condition->getAttribute('name'), $conditions)) {
            $conditions[$condition->getAttribute('name')] = array();
        }

        foreach ($condition->getElementsByTagName('li') as $listItem) {
            $conditions[$condition->getAttribute('name')][$listItem->getAttribute('value')] = $listItem->nodeValue;
        }

        return $conditions;
    }

    private function parseConditionsWithoutAttributes(array $conditions, DOMNode $condition)
    {
        foreach ($condition->getElementsByTagName('li') as $listItem) {
            if (!array_key_exists($listItem->getAttribute('name'), $conditions)) {
                $conditions[$listItem->getAttribute('name')] = array();
            }

            $conditions[$listItem->getAttribute('name')][$listItem->getAttribute('value')] = $listItem->nodeValue;
        }

        return $conditions;
    }
}

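It searches the document for template nodes and loops over them. For each template node, it works out the style of each condition and, based on that, picks the right parsing function for it. After looping over all the templates, it returns a parsed array containing all the information you need.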
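To show how that array might be consumed, here is a sketch with a hypothetical function, resolve_template(), that takes one parsed template (a 'text' string plus 'conditions' mapping predicate name to value => response text) and a plain array of client properties. It assumes that an LI without a VALUE attribute ends up under the '' key, since DOMElement::getAttribute() returns an empty string for a missing attribute.

```php
<?php
// Hypothetical consumer of one entry of the array returned by AIMLParser::parse().
// 'conditions' maps a predicate name to (value => response text).
function resolve_template(array $template, array $clientProperties)
{
    $output = $template['text'];
    foreach ($template['conditions'] as $name => $choices) {
        $current = isset($clientProperties[$name]) ? (string) $clientProperties[$name] : null;
        if ($current !== null && isset($choices[$current])) {
            $output .= $choices[$current];
        } elseif (isset($choices[''])) {
            // Default LI (no VALUE attribute), stored under the '' key.
            $output .= $choices[''];
        }
    }
    return $output;
}

$template = array(
    'text' => 'You sound very',
    'conditions' => array(
        'gender' => array('female' => ' attractive.', 'male' => ' handsome.'),
    ),
);
echo resolve_template($template, array('gender' => 'female')), "\n"; // You sound very attractive.
```

In practice the 'text' value taken from firstChild->nodeValue will usually carry trailing whitespace from the AIML source, so trimming it may be worthwhile.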

To parse a document, you can do something like this:

$parser = new AIMLParser();
$templates = $parser->parse($someVariableWithTheContentOfTheDocument);

Demo: http://codepad.viper-7.com/JPuBaE