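As the title suggests, I have a question about parsing an XML tag that may have several attributes (or none at all), and I'm looking for advice on how to do it. First, though, some background.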
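I'm working on Program O, a PHP-based AIML interpreter script, and I'm migrating its code from string-replacement functions (str_replace, preg_replace, etc.) to PHP's built-in SimpleXML functions. So far, nearly all of the parsing functions I've written for the various AIML tags are complete and work well, but one tag in particular is giving me a hard time: the CONDITION tag.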
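According to the AIML tag reference, the tag comes in three separate "forms": one with both a NAME attribute and a VALUE, CONTAINS or EXISTS attribute, called a "multi-condition"; one with only a NAME attribute, called a "single-name list-condition"; and a final form, called a "list-condition", which is just the CONDITION tag with no attributes at all. The AIML tag reference I linked to has examples of all three forms, but with a lot of text in between, so I'll repeat them here together with the surrounding AIML code: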
FORM: multi-condition tag:
```xml
<category>
  <pattern>I AM BLOND</pattern>
  <template>You sound very
    <condition name="gender" value="female"> attractive.</condition>
    <condition name="gender" value="male"> handsome.</condition>
  </template>
</category>
```
FORM: list-condition tag:
```xml
<category>
  <pattern>I AM BLOND</pattern>
  <template>You sound very
    <condition>
      <li name="gender" value="female"> attractive.</li>
      <li name="gender" value="male"> handsome.</li>
    </condition>
  </template>
</category>
```
FORM: single-name list-condition tag:
```xml
<category>
  <pattern>I AM BLOND</pattern>
  <template>You sound very
    <condition name="gender">
      <li value="female"> attractive.</li>
      <li value="male"> handsome.</li>
    </condition>
  </template>
</category>
```
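Because the three forms differ only in which attributes sit on the CONDITION element, the form can be detected up front from the attributes alone, before any branch is chosen. A minimal SimpleXML sketch of that check follows; `conditionForm()` is a hypothetical helper, and its return labels simply reuse the form names above.

```php
<?php
// Hypothetical helper: classify a CONDITION element by its attributes only.
// The returned labels reuse the form names from the AIML tag reference quoted above.
function conditionForm(SimpleXMLElement $condition): string
{
    $hasName = isset($condition['name']);
    $hasTest = isset($condition['value'])
        || isset($condition['contains'])
        || isset($condition['exists']);

    if ($hasName && $hasTest) {
        return 'multi-condition';            // NAME + (VALUE | CONTAINS | EXISTS)
    }
    if ($hasName) {
        return 'single-name list-condition'; // NAME only; VALUE lives on each <li>
    }
    return 'list-condition';                 // no attributes; NAME and VALUE on each <li>
}

$samples = array(
    '<condition name="gender" value="female"> attractive.</condition>',
    '<condition><li name="gender" value="female"> attractive.</li></condition>',
    '<condition name="gender"><li value="female"> attractive.</li></condition>',
);
foreach ($samples as $xml) {
    echo conditionForm(simplexml_load_string($xml)), PHP_EOL;
}
// multi-condition
// list-condition
// single-name list-condition
```

Dispatching on this result keeps each parsing branch focused on one form instead of re-testing attributes inside every branch.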
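In the previous version of the script, only the "list-condition" form of the CONDITION tag was used. Although that is the most common form, it isn't the only one, so I need to support the other two forms as well. So my question is: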
How can I do this efficiently?
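I already have working code that parses the list-condition form of the CONDITION tag, and preliminary testing looks promising: it doesn't throw any errors and seems to produce the desired responses (but only for the list-condition form; the other two forms fail with errors, for obvious reasons). The function is listed below: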
```php
function parse_condition_tag($convoArr, $element, $parentName, $level)
{
    runDebug(__FILE__, __FUNCTION__, __LINE__, 'Starting function and setting timestamp.', 2);
    $response = array();
    $attrName = (string) $element['name']; // a missing attribute casts to ''
    if (!empty ($attrName))
    {
        $attrName = ($attrName == '*') ? $convoArr['star'][1] : $attrName;
        $search = $convoArr['client_properties'][$attrName];
        // Relative paths search only this CONDITION's <li> children ('//li' would search
        // the whole document), and the predicate must be written not(@*), not not@*.
        // The union falls back to the attribute-less default <li> when no value matches.
        $path = ($search != 'undefined') ? "li[@value=\"$search\"]|li[not(@*)]" : 'li[not(@*)]';
        $choice = $element->xpath($path);
        if (!empty ($choice))
        {
            $children = $choice[0]->children();
            if (!empty ($children))
            {
                $response = parseTemplateRecursive($convoArr, $children, $level + 1);
            }
            else
            {
                $response[] = (string) $choice[0];
            }
            $response_string = implode_recursive(' ', $response, __FILE__, __FUNCTION__, __LINE__);
            runDebug(__FILE__, __FUNCTION__, __LINE__, "Returning '$response_string' and exiting function.", 4);
            return $response_string;
        }
    }
    trigger_error('Parsing of the CONDITION tag failed! XML = ' . $element->asXML());
    return ''; // return an empty string rather than NULL
}
```
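Two XPath details matter when picking an `<li>` branch with SimpleXML: a path starting with `//` is absolute, so calling `xpath('//li')` on a CONDITION element still searches the whole document, and "no attributes" has to be written `not(@*)`. The standalone sketch below illustrates both with a made-up template (the element contents `A` to `D` are illustrative only).

```php
<?php
// Made-up template with two CONDITION elements, used only for illustration.
$template = simplexml_load_string(
    '<template>'
    . '<condition name="gender"><li value="female">A</li><li value="male">B</li></condition>'
    . '<condition name="mood"><li value="happy">C</li><li>D</li></condition>'
    . '</template>'
);
$mood = $template->condition[1]; // the second CONDITION element

// '//li' is absolute: it matches every <li> in the document, not just this condition's.
$everyLi = $mood->xpath('//li');
// 'li' is relative to the context node: only this condition's own children.
$ownLi = $mood->xpath('li');

// Value match with a fallback to the attribute-less default branch. A union returns
// nodes in document order, so as long as the default <li> comes last (as is usual),
// index 0 is the value match when there is one.
$happy = $mood->xpath('li[@value="happy"]|li[not(@*)]');
$sad = $mood->xpath('li[@value="sad"]|li[not(@*)]');

echo count($everyLi), ' ', count($ownLi), PHP_EOL;     // 4 2
echo (string)$happy[0], ' ', (string)$sad[0], PHP_EOL; // C D
```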
I'm still fairly new to the SimpleXML functions, so I may be missing something obvious. In fact, I hope that's the case. :)
EDIT: As promised in one of my comments, here is the function I eventually ended up with:
```php
/*
 * function parse_condition_tag
 * Acts as a de-facto if/else structure, selecting a specific output, based on certain criteria
 * @param [array] $convoArr - The conversation array (a container for a number of necessary variables)
 * @param [object] $element - The current XML element being parsed
 * @param [string] $parentName - The parent tag (if applicable)
 * @param [int] $level - The current recursion level
 * @return [string] $response_string
 */
function parse_condition_tag($convoArr, $element, $parentName, $level)
{
    runDebug(__FILE__, __FUNCTION__, __LINE__, 'Starting function and setting timestamp.', 2);
    global $error_response;
    $response = array();
    $attributes = (array)$element->attributes();
    $attributesArray = (isset($attributes['@attributes'])) ? $attributes['@attributes'] : array();
    runDebug(__FILE__, __FUNCTION__, __LINE__, 'Element attributes:' . print_r($attributesArray, true), 1);
    $attribute_count = count($attributesArray);
    runDebug(__FILE__, __FUNCTION__, __LINE__, "Element attribute count = $attribute_count", 1);
    if ($attribute_count == 0) // Bare condition tag (list-condition)
    {
        runDebug(__FILE__, __FUNCTION__, __LINE__, 'Parsing a CONDITION tag with no attributes. XML = ' . $element->asXML(), 2);
        $liNamePath = 'li[@name]';
        $condition_xPath = '';
        $exclude = array();
        $choices = $element->xpath($liNamePath);
        foreach ($choices as $choice)
        {
            $choice_name = (string)$choice['name'];
            if (in_array($choice_name, $exclude)) continue;
            $exclude[] = $choice_name;
            runDebug(__FILE__, __FUNCTION__, __LINE__, 'Client properties = ' . print_r($convoArr['client_properties'], true), 2);
            $choice_value = get_client_property($convoArr, $choice_name);
            $condition_xPath .= "li[@name=\"$choice_name\"][@value=\"$choice_value\"]|";
        }
        $condition_xPath .= 'li[not(@*)]';
        runDebug(__FILE__, __FUNCTION__, __LINE__, "xpath search = $condition_xPath", 4);
        $pick_search = $element->xpath($condition_xPath);
        runDebug(__FILE__, __FUNCTION__, __LINE__, 'Pick array = ' . print_r($pick_search, true), 2);
        $pick_count = (is_array($pick_search)) ? count($pick_search) : 0;
        runDebug(__FILE__, __FUNCTION__, __LINE__, "Pick count = $pick_count.", 2);
        // No matching <li> and no default <li>: output nothing instead of reading a missing index
        $pick = ($pick_count > 0) ? $pick_search[0] : '';
    }
    elseif (array_key_exists('value', $attributesArray) or array_key_exists('contains', $attributesArray) or array_key_exists('exists', $attributesArray)) // condition tag with either VALUE, CONTAINS or EXISTS attributes
    {
        runDebug(__FILE__, __FUNCTION__, __LINE__, 'Parsing a CONDITION tag with 2 attributes.', 2);
        $condition_name = (string)$element['name'];
        $test_value = get_client_property($convoArr, $condition_name);
        // Each case now tests its own attribute (previously all three tested VALUE).
        // The CONTAINS (case-insensitive substring) and EXISTS (property is set) semantics
        // are assumptions, as is 'undefined' meaning "not set" (the earlier version checked for it).
        switch (true)
        {
            case (isset($element['value'])):
                $matched = ((string)$element['value'] == $test_value);
                break;
            case (isset($element['contains'])):
                $matched = ($test_value != '' && stripos($test_value, (string)$element['contains']) !== false);
                break;
            case (isset($element['exists'])):
                $matched = ($test_value != '' && $test_value != 'undefined');
                break;
            default:
                runDebug(__FILE__, __FUNCTION__, __LINE__, 'Something went wrong with parsing the CONDITION tag. Returning the error response.', 1);
                return $error_response;
        }
        $pick = ($matched) ? $element : '';
    }
    elseif (array_key_exists('name', $attributesArray)) // only NAME is present, not NAME and (VALUE|CONTAINS|EXISTS)
    {
        runDebug(__FILE__, __FUNCTION__, __LINE__, 'Parsing a CONDITION tag with only the NAME attribute.', 2);
        $condition_name = (string)$element['name'];
        $test_value = get_client_property($convoArr, $condition_name);
        $path = "li[@value=\"$test_value\"]|li[not(@*)]";
        runDebug(__FILE__, __FUNCTION__, __LINE__, "search string = $path", 4);
        $choice = $element->xpath($path);
        $pick = (!empty($choice)) ? $choice[0] : '';
        runDebug(__FILE__, __FUNCTION__, __LINE__, 'Search result = ' . print_r($choice, true), 4);
    }
    else // nothing matches
    {
        runDebug(__FILE__, __FUNCTION__, __LINE__, 'No matches found. Returning default error response.', 1);
        return $error_response;
    }
    $children = (is_object($pick)) ? $pick->children() : null;
    if (!empty ($children))
    {
        $response = parseTemplateRecursive($convoArr, $children, $level + 1);
    }
    else
    {
        $response[] = (string) $pick;
    }
    $response_string = implode_recursive(' ', $response);
    return $response_string;
}
```
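I suspect there's a better, more elegant way to do this (the story of my life, really), but the code above works as intended. Any suggestions for improvement will be appreciated and carefully considered.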
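One way to make this more uniform is to treat every form as "return the first branch whose test passes": the multi-condition is a single branch, and the two list forms differ only in whether NAME comes from `<condition>` or from each `<li>`. The sketch below is a minimal illustration under stated assumptions, not Program O's API: `$predicates` is a plain array standing in for `get_client_property()`, `branchMatches()` and `evaluateCondition()` are hypothetical names, the CONTAINS (case-insensitive substring) and EXISTS (predicate is set) semantics are assumed, and it returns plain text only, so nested tags would still need recursive parsing.

```php
<?php
// Assumed semantics: VALUE = exact match, CONTAINS = case-insensitive substring,
// EXISTS = the predicate is set. $predicates stands in for get_client_property().
function branchMatches(SimpleXMLElement $node, string $name, array $predicates): bool
{
    $actual = isset($predicates[$name]) ? (string)$predicates[$name] : null;
    if (isset($node['value'])) {
        return $actual !== null && $actual === (string)$node['value'];
    }
    if (isset($node['contains'])) {
        return $actual !== null && stripos($actual, (string)$node['contains']) !== false;
    }
    if (isset($node['exists'])) {
        return $actual !== null;
    }
    return true; // no test attribute: treat as the default branch
}

function evaluateCondition(SimpleXMLElement $condition, array $predicates): string
{
    $name = isset($condition['name']) ? (string)$condition['name'] : null;
    $hasTest = isset($condition['value']) || isset($condition['contains']) || isset($condition['exists']);

    // Multi-condition form: the <condition> element carries the whole test.
    if ($name !== null && $hasTest) {
        return branchMatches($condition, $name, $predicates) ? trim((string)$condition) : '';
    }

    // List forms: take the first <li> whose test passes, in document order.
    foreach ($condition->li as $li) {
        // The single-name form inherits NAME from <condition>; the bare list form
        // puts NAME on each <li>. An <li> with neither is the default branch.
        $liName = isset($li['name']) ? (string)$li['name'] : $name;
        if ($liName === null || branchMatches($li, $liName, $predicates)) {
            return trim((string)$li);
        }
    }
    return ''; // no branch matched and there was no default
}

$predicates = array('gender' => 'male');
$single = simplexml_load_string(
    '<condition name="gender"><li value="female"> attractive.</li><li value="male"> handsome.</li></condition>'
);
echo evaluateCondition($single, $predicates), PHP_EOL; // handsome.
```

Keeping the per-branch test in one function means adding another test attribute later touches a single place instead of three branches.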
Answer:
Note that I'm not using SimpleXML, because IMHO `DOMDocument` is simply better and more powerful. Both `DOMDocument` and `DOMXPath` have been available since PHP 5.
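For comparison with the SimpleXML approach in the question, `DOMXPath::query()` accepts a context node as its second argument, so a relative path such as `li[...]` searches only the children of one CONDITION element. The sketch below picks a branch of a single-name list-condition that way; `pickBranch()` is a hypothetical helper, and it assumes the predicate value contains no double quote, since the value is pasted straight into the XPath expression.

```php
<?php
// Hypothetical helper: return the text of the first <li> whose VALUE matches,
// falling back to an attribute-less default <li>; null if neither exists.
function pickBranch(DOMXPath $xpath, DOMElement $condition, string $value): ?string
{
    // The second argument makes the relative path start at this CONDITION element.
    $matches = $xpath->query('li[@value="' . $value . '"] | li[not(@*)]', $condition);
    if ($matches === false || $matches->length === 0) {
        return null;
    }
    return trim($matches->item(0)->textContent);
}

$dom = new DOMDocument();
$dom->loadXML(
    '<template>You sound very <condition name="gender">'
    . '<li value="female"> attractive.</li><li value="male"> handsome.</li><li> nice.</li>'
    . '</condition></template>'
);
$xpath = new DOMXPath($dom);
$condition = $dom->getElementsByTagName('condition')->item(0);

echo pickBranch($xpath, $condition, 'male'), PHP_EOL;    // handsome.
echo pickBranch($xpath, $condition, 'unknown'), PHP_EOL; // nice.
```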
I wrote a simple parser class that parses the supplied document and handles each style of condition:
```php
class AIMLParser
{
    public function parse($data)
    {
        $internalErrors = libxml_use_internal_errors(true);
        $dom = new DOMDocument();
        // AIML is XML, so load it with loadXML() rather than loadHTML(): the HTML parser
        // adds implied <html>/<body> elements, applies HTML parsing rules and, without a
        // charset declaration, reads the input as ISO-8859-1 instead of UTF-8.
        // loadXML() needs a well-formed document with a single root element (e.g. <aiml>).
        $dom->loadXML($data);
        $xpath = new DOMXPath($dom);
        $templates = array();
        // local-name() also matches <template> when the file declares a default AIML namespace
        foreach ($xpath->query('//*[local-name()="template"]') as $templateNode) {
            $template = array(
                // note: this expects the first child node to always be the text node
                'text' => ($templateNode->firstChild !== null) ? $templateNode->firstChild->nodeValue : '',
                'conditions' => array(),
            );
            foreach ($templateNode->getElementsByTagName('condition') as $condition) {
                if ($condition->hasAttribute('name') && $condition->hasAttribute('value')) {
                    // multi-condition: NAME and VALUE on the condition itself
                    $template['conditions'] = $this->parseConditionsWithoutChildren($template['conditions'], $condition);
                } elseif ($condition->hasAttribute('name')) {
                    // single-name list-condition: NAME on the condition, VALUE on each <li>
                    $template['conditions'] = $this->parseConditionsWithNameAttribute($template['conditions'], $condition);
                } else {
                    // list-condition: NAME and VALUE on each <li>
                    $template['conditions'] = $this->parseConditionsWithoutAttributes($template['conditions'], $condition);
                }
            }
            $templates[] = $template;
        }
        libxml_clear_errors(); // discard errors collected while parsing
        libxml_use_internal_errors($internalErrors);
        return $templates;
    }

    // getAttribute() is a DOMElement method, so the parameters are typed DOMElement
    private function parseConditionsWithoutChildren(array $conditions, DOMElement $condition)
    {
        if (!array_key_exists($condition->getAttribute('name'), $conditions)) {
            $conditions[$condition->getAttribute('name')] = array();
        }
        $conditions[$condition->getAttribute('name')][$condition->getAttribute('value')] = $condition->nodeValue;
        return $conditions;
    }

    private function parseConditionsWithNameAttribute(array $conditions, DOMElement $condition)
    {
        if (!array_key_exists($condition->getAttribute('name'), $conditions)) {
            $conditions[$condition->getAttribute('name')] = array();
        }
        foreach ($condition->getElementsByTagName('li') as $listItem) {
            $conditions[$condition->getAttribute('name')][$listItem->getAttribute('value')] = $listItem->nodeValue;
        }
        return $conditions;
    }

    private function parseConditionsWithoutAttributes(array $conditions, DOMElement $condition)
    {
        foreach ($condition->getElementsByTagName('li') as $listItem) {
            if (!array_key_exists($listItem->getAttribute('name'), $conditions)) {
                $conditions[$listItem->getAttribute('name')] = array();
            }
            $conditions[$listItem->getAttribute('name')][$listItem->getAttribute('value')] = $listItem->nodeValue;
        }
        return $conditions;
    }
}
```
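It searches the document for `template` nodes and loops over them. For each condition inside a `template` node, it works out which style the condition uses and, based on that, calls the matching parsing function. After looping over all the templates, it returns a parsed array containing all the information you need.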
To parse a document, you can do this:
```php
$parser = new AIMLParser();
$templates = $parser->parse($someVariableWithTheContentOfTheDocument);
```
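The parser only collects data; it does not pick a response. Each entry of the returned array has the shape `['text' => ..., 'conditions' => [name => [value => text, ...], ...]]`, so selecting the branch for the current user is a lookup by predicate name and then by value. A minimal sketch follows; `renderTemplate()` is a hypothetical helper, `$predicates` is a hypothetical array of the client's stored predicate values, and default `<li>` branches are not handled. Because the array is keyed by name, several conditions on the same name in one template merge, and their positions in the text are lost; the sketch simply appends matches after the leading text, which is enough for the examples in the question.

```php
<?php
// Hypothetical helper: build a response from one template entry returned by parse().
function renderTemplate(array $template, array $predicates): string
{
    $parts = array(trim((string)$template['text']));
    foreach ($template['conditions'] as $name => $branches) {
        $actual = $predicates[$name] ?? null;
        // Append the branch stored under the user's value, if there is one.
        if ($actual !== null && array_key_exists($actual, $branches)) {
            $parts[] = trim($branches[$actual]);
        }
    }
    return implode(' ', $parts);
}

// A template entry shaped like the parser's output for the "I AM BLOND" examples.
$template = array(
    'text' => "You sound very\n",
    'conditions' => array('gender' => array('female' => ' attractive.', 'male' => ' handsome.')),
);
echo renderTemplate($template, array('gender' => 'female')), PHP_EOL; // You sound very attractive.
```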