How do I scrape these two tables using Simple HTML DOM?

Posted: 2013-02-08 09:00:09

Tags: php html dom scrape

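I've been trying to figure out how to use PHP Simple HTML DOM to scrape the `td class="job"` cells and their respective salaries. I can find and scrape divs by id or class without any problem, but I'm not sure how to approach tables like these. Any help would be greatly appreciated!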

<table cellpadding="0" cellspacing="0" border="0" class="table01">
<tr>
    <td class="head">Test</td>
    <td class="job">
    <a href="/Illustrator" id="UniqueID1">Illustrator</a><br/>
    $23,729 - $95,429
    </td>
</tr>
<tr>
    <td class="head">Test</td>
    <td class="job">
    <a href="/Small_Business_Owner_%2f_Operator" id="UniqueID2">Small Business Owner / Operator</a><br/>
    $24,369 - $174,991
    </td>
</tr>
<tr>
    <td class="head">Test</td>
    <td class="job">
    <a href="/Waiter%2fWaitress" id="UniqueID3">Waiter/Waitress</a><br/>
    $7,483 - $34,188
    </td>
</tr>
</table>

<table cellpadding="0" cellspacing="0" border="0" class="table02">
<tr>
    <td class="head">Test</td>
    <td class="job" style="padding-right: 20px">
    <a href="/Graphic_Artist_%2f_Designer" id="UniqueID1">Graphic Artist / Designer</a><br/>
    $23,789 - $55,409
    </td>
</tr>
<tr>
    <td class="head">Test</td>
    <td class="job" style="padding-right: 20px">
    <a href="/Illustrator" id="UniqueID2">Illustrator</a><br/>
    $23,729 - $95,429
    </td>
</tr>    
<tr>
    <td class="head">Test</td>
    <td class="job" style="padding-right: 20px">
    <a href="/Art_Director" id="UniqueID3">Art Director</a><br/>
    $34,160 - $85,943
    </td>
</tr>
</table>

1 answer:

Answer 0 (score: 2)

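    <?php
    $html = "your html data"; // replace with the HTML from the question
    $dom = new DOMDocument();
    // Suppress warnings about imperfect real-world markup, then load the HTML
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    // Select every td, in any table, whose class attribute contains "job"
    $my_xpath_query = "//table//td[contains(@class, 'job')]";
    $result_rows = $xpath->query($my_xpath_query);

    // Iterate over all matching td elements
    foreach ($result_rows as $result_object) {
        // nodeValue holds the job title and the salary range together
        echo trim($result_object->nodeValue), "\n";
    }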
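The loop above prints each job title and its salary run together, and it does not distinguish `table01` from `table02`. Below is a minimal sketch that keeps them apart, assuming each `td.job` holds one `<a>` with the job title followed by a bare text node with the salary range, as in the question's HTML. Like the answer, it uses PHP's built-in `DOMDocument`/`DOMXPath` rather than Simple HTML DOM; the function name `scrapeJobTables` and the shape of the returned array are hypothetical choices for illustration.

```php
<?php
// Hypothetical helper: group [title, salary] pairs by each table's class attribute.
// Assumes every td.job contains one <a> (the title) plus a text node (the salary).
function scrapeJobTables(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy HTML without warnings
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $results = [];
    foreach ($xpath->query('//table[@class]') as $table) {
        $tableClass = $table->getAttribute('class');

        // Match "job" as a whole class name, so e.g. "jobs" would not match
        $cells = $xpath->query(
            ".//td[contains(concat(' ', normalize-space(@class), ' '), ' job ')]",
            $table
        );

        foreach ($cells as $td) {
            // Job title: text of the first <a> inside the cell
            $link = $xpath->query('.//a', $td)->item(0);
            $title = $link ? trim($link->textContent) : '';

            // Salary: the td's own text nodes (outside the <a>), joined and trimmed
            $salary = '';
            foreach ($xpath->query('text()', $td) as $textNode) {
                $salary .= $textNode->nodeValue;
            }

            $results[$tableClass][] = ['title' => $title, 'salary' => trim($salary)];
        }
    }
    return $results;
}

// Usage with a shortened copy of the question's markup (nowdoc keeps "$" literal)
$html = <<<'HTML'
<table class="table01"><tr><td class="head">Test</td><td class="job">
<a href="/Illustrator" id="UniqueID1">Illustrator</a><br/>
$23,729 - $95,429
</td></tr></table>
<table class="table02"><tr><td class="head">Test</td><td class="job" style="padding-right: 20px">
<a href="/Art_Director" id="UniqueID3">Art Director</a><br/>
$34,160 - $85,943
</td></tr></table>
HTML;

print_r(scrapeJobTables($html));
```

Matching the class with `concat(' ', normalize-space(@class), ' ')` is the usual XPath 1.0 idiom for a whole-word class test, whereas `contains(@class, 'job')` matches any class that merely contains that substring. Grouping by the table's `class` attribute keeps the `table01` and `table02` results separate.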