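PHP XPath: selecting by rowspan

Time: 2014-09-19 09:53:55

Tags: php xpath

I have this table on a website and I am querying it with XPath in PHP. I want to extract the information from the table and use it as the basis for the attributes of some products in OpenCart.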

 <table border="0" width="100%" style="float:left">
        <tbody>
            <tr>
                <td rowspan="2" class="gr">Dimensiuni</td>
                <td class="c3">Dimensiuni (W x D x H mm):</td>
                <td class="c4">138.5 x 70.9 x 8.9 mm</td>
            </tr>
            <tr>
                <td class="c3">Greutate (g):</td>
                <td class="c4">143 g</td>
            </tr>
            <tr>
                <td rowspan="3" class="gr">Display</td>
                <td class="c3">Dimensiune Display (inches):</td>
                <td class="c4">5.2</td>
            </tr>
            <tr>
                <td class="c3">Rezolutie (pixeli):</td>
                <td class="c4">1080 x 1920 pixels, 5.2 inches (~424 ppi pixel density)</td>
            </tr>
            <tr>
                <td class="c3">Culori:</td>
               <td class="c4">16M colors</td>
            </tr>


        </tbody>
    </table>

I want to extract the information from the table and build an array of the following form:
Array(
   [Dimensiuni] => array(
           [Dimensiuni (W x D x H mm)] => 138.5 x 70.9 x 8.9 mm,
           [Greutate (g)] => 143 g
   )
   [Display] => array(
              [Dimensiune Display (inches)]  => 5.2,
              [Rezolutie (pixeli)] => 1080 x 1920 pixels, 5.2 inches (~424 ppi pixel density),
              .
              .
              .
   )
)

This is how far I got, and now I am stuck.

$attributeQuery = $xpath->query("//table[@border='0'][@width='100%'][@style='float:left']//td[@class='gr']");
if ($attributeQuery->length > 0) {
    foreach ($attributeQuery as $attribute) {
        $attr[$attribute->nodeValue] = array();
    }
}

The table is dynamic, so I would like a generic solution.

2 answers:

Answer 0: (score: 0)

Not sure whether this will work for you, but you could try changing

foreach ($attributeQuery as $attribute) {
    $attr[$attribute->nodeValue] = array();
}

to

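// $attributeQuery is a DOMNodeList and has no ->attributes property,
// so read the attributes of each matched node instead.
foreach ($attributeQuery as $attribute) {
    foreach ($attribute->attributes as $attrNode) {
        $array['@' . $attrNode->localName] = $attrNode->nodeValue;
    }
}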

Reference: http://php.net/manual/en/class.domnode.php#115448

Answer 1: (score: 0)

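You are basically looking for the <td> element that has the rowspan attribute in order to obtain the section.

This can be done by iterating over the rows, setting the section only when it is present in the current row, and keeping it until a new one appears: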

// initialize section
$section = null;
foreach ($table->getElementsByTagName('tr') as $row) {

    // set section only when found
    $sectionTd = $xpath->evaluate('self::tr/td[@rowspan]', $row);
    if ($sectionTd->length) {
        $section = $sectionTd->item(0)->nodeValue;
    }

    ...

    printf("%s - %s %s\n", $section, $name, $value);
}

Example output:

Dimensiuni - Dimensiuni (W x D x H mm): 138.5 x 70.9 x 8.9 mm
Dimensiuni - Greutate (g): 143 g
Display - Dimensiune Display (inches): 5.2
Display - Rezolutie (pixeli): 1080 x 1920 pixels, 5.2 inches (~424 ppi pixel density)
Display - Culori: 16M colors
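For reference, here is a minimal runnable sketch of the stateful loop above. It assumes the elided step reads the name and value cells by their c3 and c4 classes, as the full example further down does; the $lines array is only added here so the result can be inspected.

```php
<?php
// Markup from the question, one row per line.
$html = <<<HTML
<table border="0" width="100%" style="float:left">
  <tbody>
    <tr><td rowspan="2" class="gr">Dimensiuni</td><td class="c3">Dimensiuni (W x D x H mm):</td><td class="c4">138.5 x 70.9 x 8.9 mm</td></tr>
    <tr><td class="c3">Greutate (g):</td><td class="c4">143 g</td></tr>
    <tr><td rowspan="3" class="gr">Display</td><td class="c3">Dimensiune Display (inches):</td><td class="c4">5.2</td></tr>
    <tr><td class="c3">Rezolutie (pixeli):</td><td class="c4">1080 x 1920 pixels, 5.2 inches (~424 ppi pixel density)</td></tr>
    <tr><td class="c3">Culori:</td><td class="c4">16M colors</td></tr>
  </tbody>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$table = $doc->getElementsByTagName('table')->item(0);

$lines   = [];   // collected output lines (illustrative only)
$section = null; // last section seen; carried over to rows without a rowspan cell
foreach ($table->getElementsByTagName('tr') as $row) {
    // Update the section only when this row has a <td rowspan="...">.
    $sectionTd = $xpath->evaluate('self::tr/td[@rowspan]', $row);
    if ($sectionTd->length) {
        $section = $sectionTd->item(0)->nodeValue;
    }

    // Assumed content of the elided step: name and value cells by class.
    $name  = $xpath->evaluate('string(td[@class="c3"])', $row);
    $value = $xpath->evaluate('string(td[@class="c4"])', $row);

    $lines[] = sprintf('%s - %s %s', $section, $name, $value);
}

echo implode("\n", $lines), "\n";
```

Because $section lives outside the loop, the rows must be visited in document order for this to work, which getElementsByTagName does.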

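An alternative is to use XPath directly to find the <td> element with rowspan in the current row and, when the row has none, to fall back to the nearest preceding row that has one: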

(
     self::tr[td/@rowspan]
    |self::tr[not(td/@rowspan)]/preceding-sibling::tr[td/@rowspan][1]
)/td

This removes the need to initialize the $section variable before the loop, so it is more self-contained:

foreach ($table->getElementsByTagName('tr') as $row) {
    $section = $xpath->evaluate(
        'string((self::tr[td/@rowspan]|self::tr[not(td/@rowspan)]/preceding-sibling::tr[td/@rowspan][1])/td)', $row
    );

    ...

This again gives the same output:

Dimensiuni - Dimensiuni (W x D x H mm): 138.5 x 70.9 x 8.9 mm
Dimensiuni - Greutate (g): 143 g
Display - Dimensiune Display (inches): 5.2
Display - Rezolutie (pixeli): 1080 x 1920 pixels, 5.2 inches (~424 ppi pixel density)
Display - Culori: 16M colors
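The fallback works because preceding-sibling is a reverse axis, so [1] picks the closest earlier row with a rowspan cell rather than the first one in the table. A small sketch to check this, using a reduced table with the same section layout as the question (the cell texts a to e are placeholders, not from the original):

```php
<?php
// Reduced sample: same rowspan layout as the question's table, placeholder cell text.
$html = '<table><tbody>'
      . '<tr><td rowspan="2">Dimensiuni</td><td>a</td></tr>'
      . '<tr><td>b</td></tr>'
      . '<tr><td rowspan="3">Display</td><td>c</td></tr>'
      . '<tr><td>d</td></tr>'
      . '<tr><td>e</td></tr>'
      . '</tbody></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// The expression from the answer; string() returns the text of the first matched <td>.
$expr = 'string((self::tr[td/@rowspan]'
      . '|self::tr[not(td/@rowspan)]/preceding-sibling::tr[td/@rowspan][1])/td)';

$sections = [];
foreach ($doc->getElementsByTagName('tr') as $row) {
    $sections[] = $xpath->evaluate($expr, $row);
}

print_r($sections);
```

The last two rows resolve to Display, not Dimensiuni, which confirms that the nearest preceding section row wins.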

Here is the full example code:

<?php
$html
    = <<<HTML
 <table border="0" width="100%" style="float:left">
        <tbody>
            <tr>
                <td rowspan="2" class="gr">Dimensiuni</td>
                <td class="c3">Dimensiuni (W x D x H mm):</td>
                <td class="c4">138.5 x 70.9 x 8.9 mm</td>
            </tr>
            <tr>
                <td class="c3">Greutate (g):</td>
                <td class="c4">143 g</td>
            </tr>
            <tr>
                <td rowspan="3" class="gr">Display</td>
                <td class="c3">Dimensiune Display (inches):</td>
                <td class="c4">5.2</td>
            </tr>
            <tr>
                <td class="c3">Rezolutie (pixeli):</td>
                <td class="c4">1080 x 1920 pixels, 5.2 inches (~424 ppi pixel density)</td>
            </tr>
            <tr>
                <td class="c3">Culori:</td>
               <td class="c4">16M colors</td>
            </tr>
        </tbody>
    </table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);


/** @var DOMElement $table */
$table = $doc->getElementsByTagName('table')->item(0);

foreach ($table->getElementsByTagName('tr') as $row) {
    $section = $xpath->evaluate(
        'string((self::tr[td/@rowspan]|self::tr[not(td/@rowspan)]/preceding-sibling::tr[td/@rowspan][1])/td)', $row
    );
    $name    = $xpath->evaluate('string(./td[@class="c3"])', $row);
    $value   = $xpath->evaluate('string(./td[@class="c4"])', $row);
    printf("%s - %s %s\n", $section, $name, $value);
}

I leave creating the array as a small exercise; since the section is now known in each iteration, this should be easier.
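One possible way to do that exercise, as a sketch rather than a definitive solution: the helper name tableToSections is hypothetical, and stripping the trailing colon from the names is an assumption made to match the array keys shown in the question.

```php
<?php
/**
 * Hypothetical helper (not part of the answer): turns the rows of $table
 * into [section => [name => value]].
 */
function tableToSections(DOMXPath $xpath, DOMElement $table): array
{
    // Section lookup from the answer: own rowspan cell, else nearest preceding one.
    $sectionExpr = 'string((self::tr[td/@rowspan]'
                 . '|self::tr[not(td/@rowspan)]/preceding-sibling::tr[td/@rowspan][1])/td)';

    $result = [];
    foreach ($table->getElementsByTagName('tr') as $row) {
        $section = trim($xpath->evaluate($sectionExpr, $row));
        // Assumption: drop the trailing ":" so keys match the question's array.
        $name  = rtrim(trim($xpath->evaluate('string(td[@class="c3"])', $row)), ':');
        $value = trim($xpath->evaluate('string(td[@class="c4"])', $row));

        if ($name === '') {
            continue; // skip rows that carry no name cell
        }
        $result[$section][$name] = $value;
    }
    return $result;
}

// Markup from the question, one row per line.
$html = <<<HTML
<table border="0" width="100%" style="float:left">
  <tbody>
    <tr><td rowspan="2" class="gr">Dimensiuni</td><td class="c3">Dimensiuni (W x D x H mm):</td><td class="c4">138.5 x 70.9 x 8.9 mm</td></tr>
    <tr><td class="c3">Greutate (g):</td><td class="c4">143 g</td></tr>
    <tr><td rowspan="3" class="gr">Display</td><td class="c3">Dimensiune Display (inches):</td><td class="c4">5.2</td></tr>
    <tr><td class="c3">Rezolutie (pixeli):</td><td class="c4">1080 x 1920 pixels, 5.2 inches (~424 ppi pixel density)</td></tr>
    <tr><td class="c3">Culori:</td><td class="c4">16M colors</td></tr>
  </tbody>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$table = $doc->getElementsByTagName('table')->item(0);

$attributes = tableToSections($xpath, $table);
print_r($attributes);
```

The nested array relies on PHP creating $result[$section] on first assignment, so no per-section initialization is needed. If two rows in one section share a name, the later value overwrites the earlier one.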