How to find specific data using PHP Simple HTML DOM

Date: 2015-11-12 09:28:24

Tags: php html simple-html-dom

When I scrape the table, its tr and td rows vary from page to page. The original tables are shown below.

<table class="scoretable">
<tbody>
<tr><td class="jdhead">Name</td><td class="fullhead">John</td></tr>
<tr><td class="jdhead">Age</td><td class="fullhead">30</td></tr>
<tr><td class="jdhead">Phone</td><td class="fullhead">91234988788</td></tr>
<tr><td class="jdhead">Location</td><td class="fullhead">Madrid</td></tr>
<tr><td class="jdhead">Country</td><td class="fullhead">Spain</td></tr>
<tr><td class="jdhead">Role</td><td class="fullhead">Manager</td></tr>
</tbody>
</table>

<table class="scoretable">
<tbody>
<tr><td class="jdhead">Name</td><td class="fullhead">John</td></tr>
<tr><td class="jdhead">Age</td><td class="fullhead">30</td></tr>
<tr><td class="jdhead">Phone</td><td class="fullhead">91234988788</td></tr>
<tr><td class="jdhead">Role</td><td class="fullhead">Manager</td></tr>
</tbody>
</table>

The two tables above come from different pages. I need to scrape the name, phone, and role.

$url = "http://name.com/listings";
$html = file_get_html( $url );

$posts1 = $html->find('td[class=fullhead]',1);

foreach ( $posts1 as $post1 ) {
    $poster1 = $post1->outertext;
    echo $poster1;
}

2 answers:

Answer 0 (score: 1)

I would try using preg_match to extract the required values from the HTML, like this:

Update (see the comments below):

<?php
$url = 'http://name.com/listings';
$html = file_get_contents($url);

if (preg_match('~<tr><td class="jdhead">Name</td><td class="fullhead">([^<]*)</td></tr>~', $html, $matches)) {
    echo $matches[1]; // here is your name
}

if (preg_match('~<tr><td class="jdhead">Phone</td><td class="fullhead">([^<]*)</td></tr>~', $html, $matches)) {
    echo $matches[1]; // here is your phone
}

if (preg_match('~<tr><td class="jdhead">Role</td><td class="fullhead">([^<]*)</td></tr>~', $html, $matches)) {
    echo $matches[1]; // here is your role
}
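
The three near-identical calls above could be folded into one helper. The following is only a sketch and not part of the original answer: the function name `extract_field` and the inline `$sample` markup are hypothetical, nullable return types assume PHP 7.1 or later, and the pattern also tolerates whitespace between tags, which is an assumption about how the real pages might differ.

```php
<?php
/**
 * Hypothetical helper: return the value cell that follows a given label cell,
 * or null when the label is not present in the markup.
 */
function extract_field(string $html, string $label): ?string
{
    // preg_quote() escapes the label so it is matched literally;
    // \s* allows optional whitespace around the label and between the cells.
    $pattern = '~<td class="jdhead">\s*' . preg_quote($label, '~')
             . '\s*</td>\s*<td class="fullhead">([^<]*)</td>~';

    return preg_match($pattern, $html, $matches) === 1 ? trim($matches[1]) : null;
}

// Inline sample standing in for the downloaded page (assumption: same markup
// as the second table in the question).
$sample = '<table class="scoretable"><tbody>'
        . '<tr><td class="jdhead">Name</td><td class="fullhead">John</td></tr>'
        . '<tr><td class="jdhead">Age</td><td class="fullhead">30</td></tr>'
        . '<tr><td class="jdhead">Phone</td><td class="fullhead">91234988788</td></tr>'
        . '<tr><td class="jdhead">Role</td><td class="fullhead">Manager</td></tr>'
        . '</tbody></table>';

$result = [];
foreach (['Name', 'Phone', 'Role'] as $label) {
    $result[$label] = extract_field($sample, $label);
}

print_r($result);
```

Because each field is looked up by its label rather than by row position, rows such as Location or Country can appear or disappear without affecting the result. A label that is missing from the page yields null instead of a stale match.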

Answer 1 (score: 0)

I have this solution, which works with your example:

<?php
// load
$doc = new DOMDocument();
$doc->loadHTMLFile("tabledata.html");

// required nodes
$required_data = ['Name', 'Phone', 'Role'];

$tbody_elements = $doc->getElementsByTagName('tbody');

// xpath object
$xpath = new DOMXPath($doc);

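// array for final data
$finaldata = [];
// each tbody (one table) holds one user
foreach ($tbody_elements as $key => $tbody)
{
    // iterate through the required data
    foreach ($required_data as $data)
    {
        // select the row whose label cell matches, relative to this tbody
        $return = $xpath->query("tr[td[text()='$data']]", $tbody);

        foreach ($return as $node)
        {
            // textContent of the tr is the label followed by the value
            $finaldata[$key][$data] = $node->textContent;
        }
    }
}

var_dump($finaldata);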

Outputs:

array(2) {
  [0]=>
  array(3) {
    ["Name"]=>
    string(8) "NameJohn"
    ["Phone"]=>
    string(16) "Phone91234988788"
    ["Role"]=>
    string(11) "RoleManager"
  }
  [1]=>
  array(3) {
    ["Name"]=>
    string(8) "NameJohn"
    ["Phone"]=>
    string(16) "Phone91234988788"
    ["Role"]=>
    string(11) "RoleManager"
  }
}
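
Because the query above selects the whole `tr`, each value comes back with its label attached (for example `NameJohn`). If only the value is wanted, one option is to select the cell that follows the label cell. This is a sketch rather than the answerer's code: it calls `loadHTML` on an inline string (the `$sample` variable is hypothetical) instead of reading `tabledata.html`, and it assumes the value always sits in the `td` immediately after the label.

```php
<?php
// Inline sample standing in for tabledata.html (hypothetical; mirrors the
// two tables from the question).
$sample = <<<HTML
<table class="scoretable"><tbody>
<tr><td class="jdhead">Name</td><td class="fullhead">John</td></tr>
<tr><td class="jdhead">Age</td><td class="fullhead">30</td></tr>
<tr><td class="jdhead">Phone</td><td class="fullhead">91234988788</td></tr>
<tr><td class="jdhead">Location</td><td class="fullhead">Madrid</td></tr>
<tr><td class="jdhead">Country</td><td class="fullhead">Spain</td></tr>
<tr><td class="jdhead">Role</td><td class="fullhead">Manager</td></tr>
</tbody></table>
<table class="scoretable"><tbody>
<tr><td class="jdhead">Name</td><td class="fullhead">John</td></tr>
<tr><td class="jdhead">Age</td><td class="fullhead">30</td></tr>
<tr><td class="jdhead">Phone</td><td class="fullhead">91234988788</td></tr>
<tr><td class="jdhead">Role</td><td class="fullhead">Manager</td></tr>
</tbody></table>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($sample);
$xpath = new DOMXPath($doc);

$required_data = ['Name', 'Phone', 'Role'];
$finaldata = [];

foreach ($doc->getElementsByTagName('tbody') as $key => $tbody) {
    foreach ($required_data as $data) {
        // First td after the label cell, evaluated relative to this tbody.
        $cells = $xpath->query("tr/td[text()='$data']/following-sibling::td[1]", $tbody);
        if ($cells->length > 0) {
            $finaldata[$key][$data] = trim($cells->item(0)->textContent);
        }
    }
}

var_dump($finaldata);
```

The only change to the lookup is the `following-sibling::td[1]` step, so the loop structure and the `$finaldata` layout stay the same as in the answer.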