PHP: scraping a link from a table

Date: 2015-05-26 04:25:29

Tags: php web-scraping

How can I get only one link per row from this table?

<table>
        <tr class="title">
        <td width="40%">a </td>
        <td width="40%">b</td>
        <td width="10%">c</td>
        <td width="10%">d</td>
        </tr>
        <tr>
        <td>abc.com</td>
        <td>123.123.526.12</td>
        <td><a class="update" href="fruit/grape"></a></td>
        <td><a class="delete" href="fruit/grape"></a></td>
        <td> </td>
        </tr>
        <tr>
        <td>bcd.com</td>
        <td>123.256.33.123</td>
        <td><a class="update" href="fruit/apple"></a></td>
        <td><a class="delete" href="fruit/apple"></a></td>
        <td> </td>
        </tr>
        </table>

My code:

$html_doc = new DOMDocument;
libxml_use_internal_errors(true);
$html_doc->loadHTML($html);
libxml_clear_errors();
$html_xpath = new DOMXPath($html_doc);

$link1 = $html_xpath->query('//table/tr[not(contains(@class,"title"))]');
foreach($link1 as $a)
{   
    $bac = $a->nodeValue;
    echo $bac."<br>";
    $rows = $a->getElementsByTagName("a");
    foreach ($rows as $row)
    {
        echo $row->getAttribute("href")."<br>";
    }
}

Output:

    abc.com 123.123.526.12
    fruit/grape
    fruit/grape
    bcd.com 123.256.33.123
    fruit/apple
    fruit/apple

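The code above gives me two href attributes for each row, because both the "update" and the "delete" anchor point to the same URL. I expect a single href attribute per row.

My expected output: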

        abc.com 123.123.526.12
        fruit/grape
        bcd.com 123.256.33.123
        fruit/apple

What should I change to get my expected output?

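1 answer:

Answer 0 (score: 1)

That happens because you echo every anchor in the row. Instead, collect the hrefs in an array and add each one only if you have not already collected it for that row: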

$all_links = array();

foreach($link1 as $a)
{   
    $bac = $a->nodeValue;
    $all_links[$bac] = array();
    $rows = $a->getElementsByTagName("a");

    foreach ($rows as $row)
    {
        $href = $row->getAttribute("href");
        if (!in_array($href, $all_links[$bac])) {
            $all_links[$bac][] = $href;
        }
    }
}

Link to the fiddle: http://phpfiddle.org/main/code/e9y3-23gw

My output:

Array
(
    [abc.com        123.123.526.12] => Array
        (
            [0] => fruit/grape
        )

    [bcd.com        123.256.33.123] => Array
        (
            [0] => fruit/apple
        )

)
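
For reference, the deduplication idea above can be put together as a self-contained sketch that prints the asker's expected output. The helper name `collect_unique_links` and the inline sample markup are assumptions made for illustration and are not part of the original answer. The sketch also swaps in `array_unique` for the `in_array` check and builds each row label from the non-empty `<td>` cells, so that the label does not depend on the whitespace between cells.

```php
<?php
// Sample markup modeled on the question's table (illustrative only).
$html = <<<HTML
<table>
  <tr class="title">
    <td>a</td>
    <td>b</td>
    <td>c</td>
    <td>d</td>
  </tr>
  <tr>
    <td>abc.com</td>
    <td>123.123.526.12</td>
    <td><a class="update" href="fruit/grape"></a></td>
    <td><a class="delete" href="fruit/grape"></a></td>
  </tr>
  <tr>
    <td>bcd.com</td>
    <td>123.256.33.123</td>
    <td><a class="update" href="fruit/apple"></a></td>
    <td><a class="delete" href="fruit/apple"></a></td>
  </tr>
</table>
HTML;

// Hypothetical helper: maps each data row's text to its distinct hrefs.
function collect_unique_links(string $html): array
{
    $doc = new DOMDocument;
    libxml_use_internal_errors(true);   // silence warnings on imperfect HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $result = array();
    // Same row selection as in the question: skip the header row.
    foreach ($xpath->query('//table/tr[not(contains(@class,"title"))]') as $tr) {
        // Build the row label from the cells that contain text.
        $cells = array();
        foreach ($tr->getElementsByTagName('td') as $td) {
            $text = trim($td->textContent);
            if ($text !== '') {
                $cells[] = $text;
            }
        }
        $label = implode(' ', $cells);

        // Gather every href in the row, then drop the duplicates.
        $hrefs = array();
        foreach ($tr->getElementsByTagName('a') as $a) {
            $hrefs[] = $a->getAttribute('href');
        }
        $result[$label] = array_values(array_unique($hrefs));
    }
    return $result;
}

// Print each row followed by its distinct links.
foreach (collect_unique_links($html) as $label => $hrefs) {
    echo $label . "<br>";
    foreach ($hrefs as $href) {
        echo $href . "<br>";
    }
}
```

If only the first anchor of each row is ever wanted, another option would be to query `(.//a)[1]` relative to the row, passing the row as the context node to `DOMXPath::query`. That avoids the deduplication step, but it silently ignores a second link that points somewhere else.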