How can I extract attribute pairs from an HTML document using XPath?

Time: 2012-09-04 02:49:08

Tags: php dom xpath attributes

Given an HTML document that contains a fragment of this form:

<form>
    <div controlType="yyy1" xmlTag="zzz1">...</div>
    <div controlType="yyy2" xmlTag="zzz2">...</div>
</form>

I need to collect this data:

$div[0]      = array('yyy1', 'zzz1');
$div[1]      = array('yyy2', 'zzz2');

That is, the pair of controlType and xmlTag attributes for each div element.

4 answers:

Answer 0 (score: 1)

Evaluate these two XPath expressions:

/form/div[$k]/@controlType

/form/div[$k]/@xmlTag

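to populate $div[$k - 1],

where $k must be replaced by each of the numbers 1, 2, ..., count(/form/div).

It may be tempting to combine the two expressions above into a single XPath expression: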

/form/div[$k]/@*

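However, an XPath implementation is allowed to return the attributes in any order (XPath does not define an ordering among the attributes of an element), so it would not be clear which of the two selected nodes is controlType and which is xmlTag.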
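The per-index approach described above can be sketched in PHP roughly as follows. This is only an illustration under stated assumptions: the fragment is parsed with loadXML() (so attribute names keep their original case), the $source string simply repeats the markup from the question, and string() is wrapped around each expression so that evaluate() returns a plain string.

```php
<?php
// Assumption: the fragment is well-formed XML, so loadXML() can parse it
// and attribute names keep their case (controlType, xmlTag).
$source = '<form>'
        . '<div controlType="yyy1" xmlTag="zzz1">...</div>'
        . '<div controlType="yyy2" xmlTag="zzz2">...</div>'
        . '</form>';

$dom = new DOMDocument();
$dom->loadXML($source);
$xpath = new DOMXPath($dom);

// count() yields a float in PHP, so cast it before using it as a loop bound.
$count = (int) $xpath->evaluate('count(/form/div)');

$div = array();
for ($k = 1; $k <= $count; $k++) {
    // Each attribute is addressed by name, so attribute order never matters.
    $div[$k - 1] = array(
        $xpath->evaluate("string(/form/div[$k]/@controlType)"),
        $xpath->evaluate("string(/form/div[$k]/@xmlTag)"),
    );
}

print_r($div);
```

If the document is loaded with loadHTML() instead, the attribute names would need to be written in lowercase in the expressions, because the HTML parser lowercases them.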

Answer 1 (score: 0)

My two cents, in case it helps:

var doc = '<form xmltag="xxx"><div controltype="yyy1" xmltag="zzz1">...</div><div controltype="yyy2" xmltag="zzz2">...</div></form>';

var result = [];

$(doc).children().each(function () {
    var ctrl = $(this);
    if (ctrl.is('div')) {
        result.push([ctrl.attr('controlType'), ctrl.attr('xmlTag')]);
    }
});

Answer 2 (score: 0)

$url      = "http://XXX.xom";
$path     = "//div[@class='sb_tlst']//a";
$contents = get_contents($url, $path); // get_contents() is a helper that is not shown in this answer
foreach ($contents as $value)
{
    /* do something */
}

Answer 3 (score: 0)

My final solution, based on the excellent suggestion by @dimitre-novatchev:

$res          = $xpath->query("//form//div/@xmltag"); // OBS: xmltag not xmlTag
$total_fields = $res->length;

for ($i = 1; $i <= $total_fields; $i++)
{
    // Parentheses make [$i] index the whole node-set rather than each parent's children
    $r      = $xpath->query("(//form//div[@xmltag])[$i]/@xmltag");
    $xmltag = $r->item(0)->value;

    $r           = $xpath->query("(//form//div[@xmltag])[$i]/@controltype");
    $controltype = $r->length ? $r->item(0)->value : ''; // the attribute may be absent

    $div[$i - 1] = array(
        'xmltag'      => $xmltag,
        'controltype' => $controltype
    );
}

Sample output:

array (
  0 => 
  array (
    'xmltag' => 'Case_Number',
    'controltype' => '',
  ),
  1 => 
  array (
    'xmltag' => 'Plaintiff',
    'controltype' => 'RadioButtons',
  ),
  2 => 
  array (
    'xmltag' => 'Plaintiff_Name',
    'controltype' => '',
  ),

Beautiful!
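As a possible variation on the loop above, the same pairs can be collected with a single query by selecting the div elements themselves and reading each attribute by name. This is a sketch, not the answer's own method: the $html string is illustrative markup built from values in the sample output, and it assumes the document is parsed with loadHTML(), which lowercases attribute names.

```php
<?php
// Illustrative markup (an assumption); the second div has no controlType.
$html = '<form>'
      . '<div controlType="RadioButtons" xmlTag="Plaintiff">...</div>'
      . '<div xmlTag="Plaintiff_Name">...</div>'
      . '</form>';

libxml_use_internal_errors(true); // keep HTML parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$div = array();
// One query for the elements; the attributes are then read from each node.
foreach ($xpath->query('//form//div[@xmltag]') as $node) {
    $div[] = array(
        'xmltag'      => $node->getAttribute('xmltag'),
        'controltype' => $node->getAttribute('controltype'), // '' when absent
    );
}

var_export($div);
```

Because getAttribute() returns an empty string for a missing attribute, no extra check is needed for div elements without controltype, and the document is traversed once instead of twice per element.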