Parsing and extracting data from HTML text

Time: 2014-12-24 19:10:12

Tags: php html

I have the following HTML text stored in the variable $domText:

<TR class="tableclass">
  <TD>Veteran Job Information</TD>
  <TD>9.00</TD>
  <TD>1.2</TD>
  <TD><INPUT type = "text" name = "notes"></TD>
</TR>

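I want to check whether the text content of the first <TD> (here "Veteran Job ...") equals "Benefit Job", and if it does, store the values of the second and third <TD> elements (here 9.00 and 1.2) in PHP variables.

Below is what I tried, but I get an error and my code does not work at all: "Invalid argument supplied for foreach()"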

        $dom_ChangeResults = new DOMDocument();
        $dom_ChangeResults->loadHTML($domText); //Load the current changes as HTML String
        $dom_TableTags = $dom_ChangeResults->getElementsByTagName("TR"); //Check table data tags for Full time to PartTime Change
        $rows = $dom_TableTags->item(0)->getElementsByTagName('TD');

        /*** loop over the table rows ***/
        foreach ($rows as $row)
        {
            /*** get each column by tag name ***/
            $cols = $row->getElementsByTagName('td');
            /*** echo the values ***/
            echo $cols->item(0)->nodeValue.'<br />';
            echo $cols->item(1)->nodeValue.'<br />';
            echo $cols->item(2)->nodeValue;
            echo '<hr />';
        }

2 answers:

Answer 0 (score: 0)

You should iterate over the <tr> elements, not the <td> elements:

$dom_ChangeResults = new DOMDocument();
$dom_ChangeResults->loadHTML($domText); //Load the current changes as HTML String
$rows = $dom_ChangeResults->getElementsByTagName("tr");

/*** loop over the table rows ***/
foreach ($rows as $row) {
    /*** get each column by tag name ***/
    $cols = $row->getElementsByTagName('td');
    /*** echo the values ***/
    echo $cols->item(0)->nodeValue.'<br />';
    echo $cols->item(1)->nodeValue.'<br />';
    echo $cols->item(2)->nodeValue;
    echo '<hr />';
}
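
The loop above only echoes the cells. To cover the part of the question about comparing the first cell and keeping the other two, a minimal sketch could look like the following. The helper name `findRowValues`, the variable names `$secondValue` and `$thirdValue`, and the sample row labelled "Benefit Job" are assumptions made for illustration; they are not from the original post. It assumes PHP 7.1 or later.

```php
<?php
// Hypothetical sample input: the label is "Benefit Job" so that the match succeeds.
$domText = <<<HTML
<table>
  <tr class="tableclass">
    <td>Benefit Job</td>
    <td>9.00</td>
    <td>1.2</td>
    <td><input type="text" name="notes"></td>
  </tr>
</table>
HTML;

/**
 * Hypothetical helper: returns the text of the second and third cells of the
 * first row whose first cell equals $label, or null when no row matches.
 */
function findRowValues(string $html, string $label): ?array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('tr') as $row) {
        $cols = $row->getElementsByTagName('td');
        if ($cols->length < 3) {
            continue; // skip rows that do not have at least three cells
        }
        if (trim($cols->item(0)->nodeValue) === $label) {
            return [
                trim($cols->item(1)->nodeValue),
                trim($cols->item(2)->nodeValue),
            ];
        }
    }

    return null;
}

$values = findRowValues($domText, 'Benefit Job');
if ($values !== null) {
    // Hypothetical variable names for the second and third cells.
    [$secondValue, $thirdValue] = $values;
}
```

Checking `$cols->length` before calling `item()` avoids reading `nodeValue` from null when a row has fewer cells than expected.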

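Answer 1 (score: 0)

Don't use uppercase tag names with the PHP DOM methods. I don't know whether this is all of the code you have, but the uppercase tag names in your PHP script are the main problem: getElementsByTagName('TD') will return an empty list, whereas getElementsByTagName('td') will return a populated list.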

    $dom_TableTags = $dom_ChangeResults->getElementsByTagName("TR"); //Check table data tags for Full time to PartTime Change
    $rows = $dom_TableTags->item(0)->getElementsByTagName('TD');

Shouldn't that be the following (or do you only have one row?):

    $dom_TableTags = $dom_ChangeResults->getElementsByTagName("table"); //Check table data tags for Full time to PartTime Change
    $rows = $dom_TableTags->item(0)->getElementsByTagName('tr');
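
The case-sensitivity point above can be checked with a short sketch. It assumes the classic `DOMDocument` class with `loadHTML()`, whose underlying parser stores HTML element names in lowercase; the one-row sample string and the variable names are made up for illustration.

```php
<?php
// Hypothetical one-row sample written with uppercase tags, as in the question.
$html = '<TABLE><TR><TD>Veteran Job Information</TD><TD>9.00</TD><TD>1.2</TD></TR></TABLE>';

// Collect parser warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Compare the two lookups that the answer contrasts.
$upperCount = $dom->getElementsByTagName('TD')->length;
$lowerCount = $dom->getElementsByTagName('td')->length;

// The name of the first cell as stored in the parsed tree.
$firstTagName = $dom->getElementsByTagName('td')->item(0)->nodeName;

echo "TD: $upperCount, td: $lowerCount, stored name: $firstTagName\n";
```

If the uppercase count is zero on your PHP version, that matches the behaviour described in this answer, and lowercase names are the safe choice regardless of how the source HTML is written.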

Here is a working example:

$domText = <<<DOM
        <TABLE>
        <TR class="tableclass">
          <TD>Veteran Job Information</TD>
          <TD>9.00</TD>
          <TD>1.2</TD>
          <TD><INPUT type = "text" name = "notes"></TD>
        </TR>
        <TR class="tableclass">
          <TD>Veteran Job Information</TD>
          <TD>9.00</TD>
          <TD>1.2</TD>
          <TD><INPUT type = "text" name = "notes"></TD>
        </TR>
        <TR class="tableclass">
          <TD>Veteran Job Information</TD>
          <TD>9.00</TD>
          <TD>1.2</TD>
          <TD><INPUT type = "text" name = "notes"></TD>
        </TR>
        </TABLE>
DOM;

    $dom_ChangeResults = new DOMDocument();
    $dom_ChangeResults->loadHTML($domText); //Load the current changes as HTML String
    $dom_TableTags = $dom_ChangeResults->getElementsByTagName("table"); //Check table data tags for Full time to PartTime Change
    $rows = $dom_TableTags->item(0)->getElementsByTagName('tr');

    /*** loop over the table rows ***/
    foreach ($rows as $row)
    {
        /*** get each column by tag name ***/
        $cols = $row->getElementsByTagName('td');
        /*** echo the values ***/
        echo $cols->item(0)->nodeValue.'<br />';
        echo $cols->item(1)->nodeValue.'<br />';
        echo $cols->item(2)->nodeValue;
        echo '<hr />';
    }
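
As an alternative to looping over every row, the matching row could be selected directly with `DOMXPath`. This is a different technique from the one used in this answer, shown only as a sketch. The second sample row ("Benefit Job", 7.50, 0.8) and the variable names `$secondValue` and `$thirdValue` are invented for illustration.

```php
<?php
// Hypothetical sample data: the second row is made up so that a match exists.
$domText = <<<HTML
<TABLE>
  <TR class="tableclass"><TD>Veteran Job Information</TD><TD>9.00</TD><TD>1.2</TD></TR>
  <TR class="tableclass"><TD>Benefit Job</TD><TD>7.50</TD><TD>0.8</TD></TR>
</TABLE>
HTML;

// Collect parser warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($domText);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);

// Element names are lowercase in the parsed tree, so the query uses "tr" and "td".
$matches = $xpath->query('//tr[normalize-space(td[1]) = "Benefit Job"]');

$secondValue = null;
$thirdValue = null;
if ($matches !== false && $matches->length > 0) {
    $row = $matches->item(0);
    // Evaluate relative to the matched row to read its second and third cells.
    $secondValue = trim($xpath->evaluate('string(td[2])', $row));
    $thirdValue = trim($xpath->evaluate('string(td[3])', $row));
}
```

The label is hardcoded in the query here. If it came from user input, it would need careful quoting, because XPath 1.0 string literals have no escape mechanism.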

Edit

Handling the data when there is only a single <tr> element:

$dom_ChangeResults = new DOMDocument();
$dom_ChangeResults->loadHTML($domText); //Load the current changes as HTML String
//$dom_TableTags = $dom_ChangeResults->getElementsByTagName("tr"); //Check table data tags for Full time to PartTime Change
$rows = $dom_ChangeResults->getElementsByTagName('tr');