Goutte - get a table column

Asked: 2017-03-21 03:29:02

Tags: php goutte

How can I get just one column instead of the whole table?

<table cellspacing="0" cellpadding="0" align="Center" rules="all" border="1">
    <tbody>
    <tr>
        <td>Entity Name</td>
        <td>NV Business ID</td>
        <td>Status</td>
        <td>Type</td>
    </tr>
    <tr>
        <td><a href="">GOOGLE</a></td>
        <td><a href=""></a></td>
        <td><a href="">Expired</a></td>
        <td><a href="">Reserved Name</a></td>
    </tr>
    <tr>
        <td><a href="">GOOGLE INC.</a></td>
        <td><a href="">NV20161275322</a></td>
        <td><a href="">Active</a></td>
        <td><a href="">Foreign Corporation</a>
        </td>
    </tr>
    </tbody>
</table>

Here is my attempt:

        $client = new Client();
        $crawler = $client->request('GET', 'url');
        $form = $crawler->selectButton('Search')->form();
        $crawler = $client->submit($form, array(
            ...
        ));
        $crawler->filter('table tr')->each(function ($node) {
            print $node->text()."\n \n";
//            print $node->filter('td')->text() . '<br />';
        });

It always returns the whole table. I also tried things like tr[1], etc.

Can anyone help?

Thanks

3 answers:

Answer 0 (score: 0)

I found a solution:

$node->filter('td')->eq(2)->text();

The 2 selects the third column, because eq() uses a zero-based index: [0, 1, 2, ...].
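
For context, that line belongs inside the each() callback from the question, where $node is a single table row. The following is a minimal sketch, assuming the symfony/dom-crawler and symfony/css-selector packages (which Goutte builds on) are installed through Composer. The columnValues helper and the inline $html string are illustrative stand-ins for the crawler returned by $client->submit(...), not part of the original answer:

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install

use Symfony\Component\DomCrawler\Crawler;

/**
 * Hypothetical helper: returns the text of one column (zero-based index)
 * for every row that has enough cells.
 */
function columnValues(Crawler $crawler, int $index): array
{
    $values = $crawler->filter('table tr')->each(function (Crawler $row) use ($index) {
        $cells = $row->filter('td');
        // Skip rows that are too short instead of letting text() throw.
        return $cells->count() > $index ? trim($cells->eq($index)->text()) : null;
    });

    // Drop the skipped rows and reindex the array.
    return array_values(array_filter($values, function ($value) {
        return $value !== null;
    }));
}

// Shortened copy of the table from the question.
$html = '<table>
  <tr><td>Entity Name</td><td>NV Business ID</td><td>Status</td><td>Type</td></tr>
  <tr><td><a href="">GOOGLE</a></td><td><a href=""></a></td><td><a href="">Expired</a></td><td><a href="">Reserved Name</a></td></tr>
  <tr><td><a href="">GOOGLE INC.</a></td><td><a href="">NV20161275322</a></td><td><a href="">Active</a></td><td><a href="">Foreign Corporation</a></td></tr>
</table>';

$statuses = columnValues(new Crawler($html), 2);
print_r($statuses); // Status, Expired, Active
```

Returning the values from each() rather than printing them keeps the column available as an ordinary array, and the count() check avoids an exception on rows that have fewer cells than expected.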

Answer 1 (score: 0)

You can use DOMDocument to extract the data from the HTML.

PHP code demo

<?php
ini_set("display_errors", 1);
$html = '<table cellspacing="0" cellpadding="0" align="Center" rules="all" border="1">
    <tbody>
    <tr>
        <td>Entity Name</td>
        <td>NV Business ID</td>
        <td>Status</td>
        <td>Type</td>
    </tr>
    <tr>
        <td><a href="">GOOGLE</a></td>
        <td><a href=""></a></td>
        <td><a href="">Expired</a></td>
        <td><a href="">Reserved Name</a></td>
    </tr>
    <tr>
        <td><a href="">GOOGLE INC.</a></td>
        <td><a href="">NV20161275322</a></td>
        <td><a href="">Active</a></td>
        <td><a href="">Foreign Corporation</a>
        </td>
    </tr>
    </tbody>
</table>';
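$result = array();
$object = new DOMDocument();
$object->loadHTML($html);
// Column to extract, 1-based; decrement it to match the 0-based counter below.
$requiredColumn = 3;
$requiredColumn--;
foreach ($object->getElementsByTagName("tr") as $value)
{
    $nodelistObject = $value->getElementsByTagName("td");
    $columnCounter = 0;
    foreach ($nodelistObject as $tdNode)
    {
        if ($columnCounter == $requiredColumn)
        {
            // Plain cell (e.g. the header row): take the cell text itself.
            if ($tdNode->getElementsByTagName("a")->length == 0)
            {
                $result[] = $tdNode->textContent;
            }
            // Cell containing links: take the text of each link.
            foreach ($tdNode->getElementsByTagName("a") as $aElement)
            {
                $result[] = $aElement->textContent;
            }
        }
        $columnCounter++;
    }
}
// Prints the third column: Status, Expired, Active.
print_r($result);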
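
The same column can also be selected with a single XPath expression instead of counting cells by hand. This is a hedged sketch that uses only PHP's bundled DOM extension. The extractColumn helper name and the shortened sample table are made up for illustration, and unlike the loop above it returns the trimmed text of each cell rather than one entry per link:

```php
<?php
/**
 * Hypothetical helper: returns the text of the given column (1-based)
 * from every table row in the HTML.
 */
function extractColumn(string $html, int $column): array
{
    $document = new DOMDocument();
    // Collect libxml warnings quietly; real-world markup is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    $values = [];
    // td[n] picks the n-th <td> child of each row (XPath positions start at 1).
    foreach ($xpath->query('//tr/td[' . $column . ']') as $cell) {
        $values[] = trim($cell->textContent);
    }

    return $values;
}

// Shortened copy of the table from the question.
$html = '<table>
  <tr><td>Entity Name</td><td>NV Business ID</td><td>Status</td><td>Type</td></tr>
  <tr><td><a href="">GOOGLE</a></td><td><a href=""></a></td><td><a href="">Expired</a></td><td><a href="">Reserved Name</a></td></tr>
  <tr><td><a href="">GOOGLE INC.</a></td><td><a href="">NV20161275322</a></td><td><a href="">Active</a></td><td><a href="">Foreign Corporation</a></td></tr>
</table>';

$statusColumn = extractColumn($html, 3);
print_r($statusColumn); // Status, Expired, Active
```

Note that XPath positions are 1-based, so the third column is td[3] here, whereas eq() in the Goutte answer takes the zero-based index 2.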

Answer 2 (score: 0)

Try the following code:

$content  = $crawler->filter( 'table' )->extract( array( '_text' ) );