How to loop through DOM elements and store them as an array?

Posted: 2017-07-28 05:18:38

Tags: php arrays web-scraping domdocument

I'm getting data by scraping. The data source is a table, and I need to extract the data from each row (tr).

Each row has 3 cells (td):

  • Title
  • Date
  • Link

Here is the code I'm using:

$data = array();
$counter = 1;
$index = 0;

foreach($html->find('#middle table tr td') as $source){

    $dont_include = array(
        '<td>CONTAINS TEXT THAT I DONT WANT TO INCLUDE HERE</td>'
    );

    if (!in_array($source->outertext, $dont_include)) {

        // If the cell contains a link, extract its href.
        // The source markup for a link cell looks like:
        // <td><a href="">xx</a></td>
        if (strstr($source->innertext, 'http://')) {

            $a = new SimpleXMLElement($source->innertext);

            $the_link = (string) $a['href'][0];
            $data[$index] = array('link' => $the_link);
        }else{
            if ($counter==2) {
                $data[$index] = array('title' => $source->innertext);
            }else{
                $data[$index] = array('date' => $source->innertext);
                $counter = 0;
                $index++;
            }
        }
    }
    $counter++;
}

print_r($data);

Question: how can I store these values in an array with the following structure?

Array (
    [0] => Array (
        [title] => ""
        [date] => ""
        [link] => ""
    )
    [1] => Array (
        [title] => ""
        [date] => ""
        [link] => ""
    )
    ...
)
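The loop above loses data because every branch assigns a brand-new array to `$data[$index]`, replacing whatever was already stored for that row. A minimal sketch of one fix that keeps the cell-by-cell approach: write into individual keys of the current row and advance the index only after the third cell. To keep it self-contained, the scraped cells are replaced by a hypothetical plain array `$cells` of inner-HTML strings in title, date, link order; in the real script they would come from `$html->find(...)`, after the unwanted cells have been filtered out.

```php
<?php
// Hypothetical sample: inner HTML of each <td>, three cells per row
// (title, date, link), in the order the question's loop visits them.
$cells = array(
    'First title', '2017-07-01', '<a href="http://example.com/a.pdf">a.pdf</a>',
    'Second title', '2017-07-02', '<a href="http://example.com/b.pdf">b.pdf</a>',
);

$data = array();
$index = 0;
$position = 0; // 0 = title, 1 = date, 2 = link

foreach ($cells as $cell) {
    if ($position === 0) {
        // Add a key to the current row instead of replacing the row
        $data[$index]['title'] = $cell;
    } elseif ($position === 1) {
        $data[$index]['date'] = $cell;
    } else {
        // Parse the anchor and keep only its href attribute
        $a = new SimpleXMLElement($cell);
        $data[$index]['link'] = (string) $a['href'];
        // The row is complete: move on to the next one
        $index++;
    }
    $position = ($position + 1) % 3;
}

print_r($data);
```

This relies on the cells arriving strictly in groups of three. Note also that `SimpleXMLElement` only accepts well-formed XML, so an href containing a raw `&` would make the constructor throw.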

Update: here is the source structure:

    <!-- THIS IS THE SOURCE; AT THE TOP THERE ARE TDs THAT I DON'T WANT -->
    <td>title</td>
    <td class="ac">date</td>
    <td width="190"><a href="I need this link" target="_blank">filename, I don't need the file name</a>
    </td>
    <td>title</td>
    <td class="ac">date</td>
    <td width="190"><a href="I need this link" target="_blank">filename, I don't need the file name</a>
    </td>
    <td>title</td>
    <td class="ac">date</td>
    <td width="190"><a href="I need this link" target="_blank">filename, I don't need the file name</a>
    </td>
    <td>title</td>
    <td class="ac">date</td>
    <td width="190"><a href="I need this link" target="_blank">filename, I don't need the file name</a>
    </td>
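Since the question is tagged domdocument, here is a sketch of the same row-by-row extraction using PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`) instead of the Simple HTML DOM library. It assumes each title/date/link group sits in its own `<tr>` inside `#middle table` (the snippet above omits the `<tr>` tags), and it skips any row that does not have exactly three cells, such as the unwanted ones at the top. The sample markup and URLs are made up.

```php
<?php
// Made-up sample markup mirroring the structure in the question,
// with each title/date/link group wrapped in its own <tr>.
$source = '<div id="middle"><table>
  <tr><td colspan="3">Text I do not want</td></tr>
  <tr><td>First title</td><td class="ac">2017-07-01</td>
      <td width="190"><a href="http://example.com/a.pdf" target="_blank">a.pdf</a></td></tr>
  <tr><td>Second title</td><td class="ac">2017-07-02</td>
      <td width="190"><a href="http://example.com/b.pdf" target="_blank">b.pdf</a></td></tr>
</table></div>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them
// (real-world HTML is rarely valid)
libxml_use_internal_errors(true);
$doc->loadHTML($source);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$rows = array();

foreach ($xpath->query('//*[@id="middle"]//table//tr') as $tr) {
    $cells = $xpath->query('./td', $tr);
    // Skip rows that are not title/date/link triples
    if ($cells->length !== 3) {
        continue;
    }
    // First <a> inside the third cell, or null if there is none
    $anchor = $xpath->query('.//a', $cells->item(2))->item(0);

    $rows[] = array(
        'title' => trim($cells->item(0)->textContent),
        'date'  => trim($cells->item(1)->textContent),
        'link'  => $anchor ? $anchor->getAttribute('href') : '',
    );
}

print_r($rows);
```

Reading the `href` with `getAttribute()` gives just the URL, which avoids both the `strstr($..., 'http://')` check and the need to parse the cell as XML.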

1 answer:

Answer 0 (score: 1)

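Instead of looping through the td elements, I suggest looping through the tr elements, so that you build one array per row. Try this: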

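// $html is a Simple HTML DOM object, e.g. from file_get_html() or str_get_html()
$rowData = array();

foreach ($html->find('#middle table tr') as $row) {
    // Skip rows without the three expected cells (e.g. the unwanted ones at the top)
    if (count($row->children()) < 3) {
        continue;
    }

    // The third cell holds the <a>; we want its href, not its inner HTML
    $anchor = $row->children(2)->find('a', 0);

    $rowData[] = array(
        'title' => trim($row->children(0)->plaintext),
        'date'  => trim($row->children(1)->plaintext),
        'link'  => $anchor ? $anchor->href : '',
    );
}
print_r($rowData);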