Zend_Dom_Query: getting an array

Time: 2012-05-21 12:11:18

Tags: php zend-framework dom screen-scraping zend-dom-query

I am trying to scrape HTML from a website into an associative array. I tried it with Zend_Dom_Query.

Example:

<div class="job">
   <div class="jobTitle">
    <a href="http://website.com/Job-Title-1">Job-Title-1</a>
   </div>
   <div class="company">
   <a href="http://website.com/Company-1">Company-1</a>
   </div>
   <div class="city">
   <a href="http://website.com/City-1">City-1</a>
   </div>
</div>
<div class="job">
    <div class="jobTitle">
    <a href="http://website.com/Job-Title-2">Job-Title-2</a>
    </div>
    <div class="company">
       <a href="http://website.com/Company-2">Company-2</a>
   </div>
   <div class="city">
      <a href="http://website.com/City-2">City-2</a>
   </div>
</div>

How can I get an associative array from the HTML above?

    $dom = new Zend_Dom_Query($html);
    $links = $dom->query('div.jobTitle a');
    $companies = $dom->query('div.company');
    $cities = $dom->query('div.city');

    // result needed
    $result_array = array(
        array(
            'link'    => 'http://website.com/Job-Title-1',
            'Company' => 'Company-1',
            'City'    => 'City-1'
        ),
        array(
            'link'    => 'http://website.com/Job-Title-2',
            'Company' => 'Company-2',
            'City'    => 'City-2'
        )
    );

1 answer:

Answer 0 (score: 0)

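    $dom = new Zend_Dom_Query($html);
    $links = $dom->query('div.jobTitle a');
    $companies = $dom->query('div.company');
    $cities = $dom->query('div.city');

    // The three result sets are assumed to be in the same order and of equal
    // length, so they can be walked in lockstep.
    $data = [];
    foreach ($links as $link) {
        $data[] = [
            'link'    => $link->getAttribute('href'),
            'Company' => trim($companies->current()->textContent),
            'City'    => trim($cities->current()->textContent),
        ];
        // Advance the company and city iterators alongside the links.
        $companies->next();
        $cities->next();
    }
    var_dump($data);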
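
The lockstep approach above relies on every job block containing exactly one title, one company and one city. If a block is missing a field, the three lists fall out of alignment. A minimal sketch of an alternative, assuming PHP's built-in DOMDocument and DOMXPath are acceptable in place of Zend_Dom_Query: it iterates over each `div.job` block and looks up the fields relative to that block. The shortened `$html` sample and the variable names are illustrative, not part of the original question.

```php
<?php
// Sample markup modelled on the question (whitespace shortened).
$html = <<<'HTML'
<div class="job">
  <div class="jobTitle"><a href="http://website.com/Job-Title-1">Job-Title-1</a></div>
  <div class="company"><a href="http://website.com/Company-1">Company-1</a></div>
  <div class="city"><a href="http://website.com/City-1">City-1</a></div>
</div>
<div class="job">
  <div class="jobTitle"><a href="http://website.com/Job-Title-2">Job-Title-2</a></div>
  <div class="company"><a href="http://website.com/Company-2">Company-2</a></div>
  <div class="city"><a href="http://website.com/City-2">City-2</a></div>
</div>
HTML;

// Parse the fragment; libxml warnings about imperfect HTML are collected and discarded.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$result = array();

// One iteration per job block, so fields from different jobs cannot be mixed up.
foreach ($xpath->query('//div[@class="job"]') as $job) {
    // The leading "." makes each query relative to the current job block.
    $link    = $xpath->query('.//div[@class="jobTitle"]/a', $job)->item(0);
    $company = $xpath->query('.//div[@class="company"]', $job)->item(0);
    $city    = $xpath->query('.//div[@class="city"]', $job)->item(0);

    $result[] = array(
        // item(0) returns null when the field is missing from this block.
        'link'    => $link ? $link->getAttribute('href') : null,
        'Company' => $company ? trim($company->textContent) : null,
        'City'    => $city ? trim($city->textContent) : null,
    );
}

print_r($result);
```

The `@class="job"` test matches the class attribute exactly, which is enough for the markup in the question. Elements carrying several classes would need a looser XPath expression.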