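Scraping content between two tables and td rows

Time: 2014-12-08 17:06:48

Tags: php html scrape

Hi, someone helped me with this last week, and now I have another variation. I am trying to capture information that sits between two tables, where it is laid out as a series of td rows.

Here is the HTML: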

<table cellspacing="4" cellpadding="0" width="100%">
  <tr>
    <td><b>DATE</b></td>
    <td>November 15 - January 4, 2015</td>
</tr>
  <tr>
    <td><b>DIRECTIONS</b></td>
    <td>161 Museum Drive, Hershey, PA<br />
        <a href="http://maps.google.com/maps?q=161+Museum+Drive+Hershey+PA" title="Locate Cars and Christmas" target="_blank">
        <img src="/img/usa-motorcycle-rallies.png" alt="Locate Cars and Christmas" border="0" align="left"/></a>
        <font size="2">
        Get a <a href="http://maps.google.com/maps?q=161+Museum+Drive+Hershey+PA" title="Locate Cars and Christmas" target="_blank"><b>Google Map</b></a> of the Area.
        </font>
    </td>
  </tr>

  <tr>
    <td><b>CITY / STATE</b></td>
    <td>Hershey, Pennsylvania (PA)
</td>
  </tr>
  <tr>
    <td><b>DESCRIPTION</b></td>
    <td><p>The&nbsp;Cars and Christmas will be held at the&nbsp;Antique Automobile Club of America (AACA) Museum in Hershey, Pennsylvania from November 15th, 2014 to January 4th, 2015.<br /><br /><b>Location:<br /></b>-Antique Automobile Club of America (AACA) Museum in Hershey, Pennsylvania<br />(161 Museum Drive, Hershey, PA)<br /><br />It’s that time of year again, and this year the Antique Automobile Club of America (AACA) Museum will be festively prepared for the holidays during the Cars and Christmas exhibit, starting November 15 and running until January 4, 2015. There will be a variety of special automobiles on display, including Mr. Beep, the 1959 Pontiac Catalina Safari, our Hess Mobile Museum celebrating 50 years of Hess, the Model Trains, and much more! Enjoy the Pontiac Catalina Safari, this unique rescued and restored answer to the El Camino, the one and only of its kind. Also come explore our new Tucker Exhibit, the world’s largest collection of Tucker vehicles and other Tucker automobilia. Really feeling in the holiday spirit? Help those in need by donating non-perishable food items and toys to our Food and Toy Drive, all located here at the AACA Museum.<br /><br /><b>Please Contact For More Information:<br /></b>-(717) 566-7100<br /><br /><b>We hope to see you there!&nbsp;</b></p><br /></p>            <p class="nou">For all your <a href="http://www.motorcyclemonster.com/events.html">Motorcycle Event</a> information check out the <a href="http://www.motorcyclemonster.com">Motorcycle Monster</a>.</p>
        <p>For more information about this event, Please see below.</p>
    </td>
  </tr>


  <tr>
    <td><b>WEBSITE</b></td>
    <td><a href="http://www.aacamuseum.org/cars-christmas-2014/" title="cars and christmas" target="_self">http://www.aacamuseum.org/cars-christmas-2014/</a>
    </td>
  </tr>
  <tr>
    <td><b>EMAIL</b></td>
    <td>            <a href="mailto:ngates@aacamuseum.org">ngates@aacamuseum.org</a>
    </td>
  </tr>
  <tr>
    <td><b>CONTACT</b></td>
    <td>Nancy Gates</td>
  </tr>
  <tr>
    <td><b>PHONE</b></td>
    <td>717-566-7100
    </td>
  </tr>

This is my code to extract it:

<?php

include('simple_html_dom.php');
$html = file_get_html('http://www.motorcyclemonster.com/events/cars-and-christmas-2014-11-15-Hershey-PA.html');

//For each table row
$events = array();
foreach($html->find('table',1)->find('tr') as $h){
    $temp = array();
    //get date

    if($date = $h->find('td', 1)) {

        $temp['date'] = $h->find('td', 1)->plaintext; //Inner contents of first cell
        $temp['town'] = $h->find('td', 2)->plaintext;
    }

$events[] = $temp;
        }


print_r($events);
?>

My result is:

    Array ( [0] => Array ( [date] => November 15 - January 4, 2015 [town] => ) [1] => Array ( [date] => 161 Museum Drive, Hershey, PA Get a Google Map of the Area. [town] => ) [2] => Array ( [date] => Hershey, Pennsylvania (PA) [town] => ) [3] => Array ( [date] => The Cars and Christmas will be held at the Antique Automobile Club of America (AACA) Museum in Hershey, Pennsylvania from November 15th, 2014 to January 4th, 2015.Location:-Antique Automobile Club of America (AACA) Museum in Hershey, Pennsylvania(161 Museum Drive, Hershey, PA)It’s that time of year again, and this year the Antique Automobile Club of America (AACA) Museum will be festively prepared for the holidays during the Cars and Christmas exhibit, starting November 15 and running until January 4, 2015. There will be a variety of special automobiles on display, including Mr. Beep, the 1959 Pontiac Catalina Safari, our Hess Mobile Museum celebrating 50 years of Hess, the Model Trains, and much more! Enjoy the Pontiac Catalina Safari, this unique rescued and restored answer to the El Camino, the one and only of its kind. Also come explore our new Tucker Exhibit, the world’s largest collection of Tucker vehicles and other Tucker automobilia. Really feeling in the holiday spirit? Help those in need by donating non-perishable food items and toys to our Food and Toy Drive, all located here at the AACA Museum.Please Contact For More Information:-(717) 566-7100We hope to see you there! 
    For all your Motorcycle Event information check out the Motorcycle Monster. For more information about this event, Please see below. [town] => ) [4] => Array ( [date] => http://www.aacamuseum.org/cars-christmas-2014/ [town] => ) [5] => Array ( [date] => ngates@aacamuseum.org [town] => ) [6] => Array ( [date] => Nancy Gates [town] => ) [7] => Array ( [date] => 717-566-7100 [town] => ) [8] => Array ( ) )

Any help?

So what I am looking for is a result like this:

Array
(
    [0] => Array
        (
            [date] => November 15 - January 4, 2015
            [directions] => 161 Museum Drive, Hershey, PA
            [city] => Hershey
            [state] => Pennsylvania (PA)
            [description] => The&nbsp;Cars and Christmas will be held at the&nbsp;Antique Automobile Club of America (AACA) Museum in Hershey, Pennsylvania from November 15th, 2014 to January 4th, 2015.<br /><br /><b>Location:<br /></b>-Antique Automobile Club of America (AACA) Museum in Hershey, Pennsylvania<br />(161 Museum Drive, Hershey, PA)<br /><br />It’s that time of year again, and this year the Antique Automobile Club of America (AACA) Museum will be festively prepared for the holidays during the Cars and Christmas exhibit, starting November 15 and running until January 4, 2015. There will be a variety of special automobiles on display, including Mr. Beep, the 1959 Pontiac Catalina Safari, our Hess Mobile Museum celebrating 50 years of Hess, the Model Trains, and much more! Enjoy the Pontiac Catalina Safari, this unique rescued and restored answer to the El Camino, the one and only of its kind. Also come explore our new Tucker Exhibit, the world’s largest collection of Tucker vehicles and other Tucker automobilia. Really feeling in the holiday spirit? Help those in need by donating non-perishable food items and toys to our Food and Toy Drive, all located here at the AACA Museum.<br /><br /><b>Please Contact For More Information:<br /></b>-(717) 566-7100<br /><br /><b>We hope to see you there!&nbsp;</b></p><br /></p>           <p class="nou">For all your <a href="http://www.motorcyclemonster.com/events.html">Motorcycle Event</a> information check out the <a href="http://www.motorcyclemonster.com">Motorcycle Monster</a>.</p>
            <p>For more information about this event, Please see below.
            [website] => http://www.aacamuseum.org/cars-christmas-2014/
            [email] => ngates@aacamuseum.org
            [contact] => Nancy Gates
            [phone] => 717-566-7100

        )

)

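1 answer:

Answer 0 (score: 0)

OK, you need to add a check for each field (date, directions, city, and so on) based on the label in the first cell. Like this: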

// Do not reinitialize $temp inside the loop (create it once before the foreach);
// $events is not needed.

//$temp = array(); remove this line

if ($h->find('td', 1)) {
    // Cell 0 holds the label (DATE, DIRECTIONS, ...), cell 1 holds the value.
    // There is no third cell, so find('td', 2) would return nothing.
    $label = trim($h->find('td', 0)->plaintext);
    $value = trim($h->find('td', 1)->plaintext);

    // strstr() is case-sensitive, so match the labels in upper case as they appear in the page.
    if (strstr($label, 'DATE')) {
        $temp['date'] = $value;
    } else if (strstr($label, 'DIRECTIONS')) {
        $temp['directions'] = $value;
    } else if (strstr($label, 'CITY')) {
        $temp['city'] = $value;
    }
    // for other fields.......
    else if (strstr($label, 'CONTACT')) {
        $temp['contact'] = $value;
    }

}

Now you can read the result from $temp.

print_r($temp);
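
For reference, the same label-to-key idea can be sketched as a self-contained script. This is only an illustration under a few assumptions: it swaps simple_html_dom for PHP's built-in DOM extension so it runs without the include, it parses a trimmed inline copy of the table rather than the live page, and the function name extractEventFields is hypothetical. It also splits the "CITY / STATE" cell on the first comma to get separate city and state entries, which assumes the cell always has the form "City, State".

```php
<?php
// Trimmed inline copy of the table from the question (assumption: not the live page).
$html = <<<'HTML'
<table cellspacing="4" cellpadding="0" width="100%">
  <tr><td><b>DATE</b></td><td>November 15 - January 4, 2015</td></tr>
  <tr><td><b>CITY / STATE</b></td><td>Hershey, Pennsylvania (PA)
</td></tr>
  <tr><td><b>CONTACT</b></td><td>Nancy Gates</td></tr>
  <tr><td><b>PHONE</b></td><td>717-566-7100
    </td></tr>
</table>
HTML;

// Hypothetical helper: returns one associative array keyed by the lowercased label.
function extractEventFields(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $fields = array();
    foreach ($doc->getElementsByTagName('tr') as $row) {
        $cells = $row->getElementsByTagName('td');
        if ($cells->length < 2) {
            continue; // skip rows that are not a label/value pair
        }
        // Cell 0 is the label, cell 1 is the value.
        $label = strtolower(trim($cells->item(0)->textContent));
        $value = trim($cells->item(1)->textContent);

        if ($label === 'city / state') {
            // "Hershey, Pennsylvania (PA)" becomes separate city and state entries.
            $parts = explode(',', $value, 2);
            $fields['city'] = trim($parts[0]);
            $fields['state'] = isset($parts[1]) ? trim($parts[1]) : '';
        } else {
            $fields[$label] = $value;
        }
    }
    return $fields;
}

$event = extractEventFields($html);
print_r($event);
```

Using the label text as the array key avoids one if/else branch per field, at the cost of keys that follow whatever labels the page uses. Note that textContent drops the markup, so a description field would come back as plain text rather than HTML.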