How to scrape with Simple HTML DOM when the HTML has no hierarchy

Date: 2012-10-26 11:59:56

Tags: php simple-html-dom

  <td class="cinetime">
    <div>Screen: 3</div>
    <br clear="all">
    <span>1:00</span>
    <span>11:00</span>
    <span>13:00</span>
    <span>15:00</span>
    <br clear="all">
    <div>Screen: 4</div>
    <br clear="all">
    <span>12:05</span>
    <span>14:05</span>
    <span>16:05</span>
    <span>18:05</span>
    <span>20:05</span>
    <div>Screen: 3 (3D)</div>
  </td>

Above is the HTML I am scraping.

I want to get the data in the <span> elements separately for Screen 3 and for Screen 4.

This is my code, but it collects the data from all of the <span> elements together:
foreach($cinema->find("td.cinetime span") as $times) {
    echo $times->plaintext;
}

2 Answers:

Answer 0 (score: 0)

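You need to iterate over all the elements in td.cinetime and check whether the text is the one you need (e.g. Screen: 3). Then check whether the following elements are spans and put them into an array until you reach another div, which means that screen's block has ended. Repeat until you have everything you need.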

Pseudocode:

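// Collect the <span> times that follow each "Screen: N" <div>, keyed by N.
$screensData = array();
$current = null; // screen number of the block we are inside, or null
foreach ($cinema->find("td.cinetime *") as $elem) {
    if ($elem->tag == 'div') {
        // Any <div> ends the previous screen block.
        $current = null;
        // Only plain "Screen: N" labels start a block ("Screen: 3 (3D)" does not).
        if (preg_match('/^Screen: (\d+)$/', trim($elem->plaintext), $m)) {
            $current = (int) $m[1];
            $screensData[$current] = array();
        }
    } elseif ($current !== null && $elem->tag == 'span') {
        $screensData[$current][] = trim($elem->plaintext);
    }
}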

Answer 1 (score: 0)

Something like this:

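require_once 'simple_html_dom.php'; // assumes the library file is named simple_html_dom.php

// $source holds the HTML of the page being scraped
$dom = new simple_html_dom();
$dom->load($source);

// First element matching .cinetime (the <td>)
$cinetime = $dom->find('.cinetime', 0);

$screens = array();
$screenName = '';

// children() returns only the direct element children, in document order
foreach ($cinetime->children() as $elem) {
    if ($elem->tag == 'div') {
        // A <div> starts a new screen block; remember its label
        $screenName = $elem->innertext;
    } elseif ($elem->tag == 'span') {
        // A <span> is a showtime belonging to the current screen
        $screens[$screenName][] = $elem->innertext;
    }
}
print_r($screens);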

This produces:

Array
(
    [Screen: 3] => Array
        (
            [0] => 1:00
            [1] => 11:00
            [2] => 13:00
            [3] => 15:00
        )

    [Screen: 4] => Array
        (
            [0] => 12:05
            [1] => 14:05
            [2] => 16:05
            [3] => 18:05
            [4] => 20:05
        )

)
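For comparison, the same grouping can also be done without simple_html_dom, using PHP's built-in DOM extension (DOMDocument and DOMXPath). This is a minimal sketch, not taken from either answer: the function name groupShowtimes is hypothetical, and it assumes PHP 7+ and that the page is parsed with DOMDocument::loadHTML. Like Answer 1, it walks the element children of td.cinetime in order and treats each <div> as the start of a new screen block.

```php
<?php
// Hypothetical helper: group the <span> showtimes under the preceding <div> label,
// using the built-in DOM extension instead of simple_html_dom.
function groupShowtimes(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match <td> elements whose class list contains "cinetime".
    $cells = $xpath->query('//td[contains(concat(" ", normalize-space(@class), " "), " cinetime ")]');

    $screens = [];
    foreach ($cells as $cell) {
        $current = null; // label of the screen block we are inside
        foreach ($cell->childNodes as $node) {
            if (!$node instanceof DOMElement) {
                continue; // skip whitespace text nodes between elements
            }
            if ($node->tagName === 'div') {
                // Each <div> starts a new screen block.
                $current = trim($node->textContent);
                if (!isset($screens[$current])) {
                    $screens[$current] = [];
                }
            } elseif ($node->tagName === 'span' && $current !== null) {
                // A <span> belongs to the most recent <div> label.
                $screens[$current][] = trim($node->textContent);
            }
            // Other elements, such as <br>, are ignored.
        }
    }
    return $screens;
}
```

Because the array for a screen is created as soon as its <div> is seen, a trailing label with no times after it (Screen: 3 (3D) in the question's markup) shows up here with an empty array, whereas Answer 1's output omits it.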