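I am trying to scrape data from a website using PHP and cURL. Its source code has similar tags that share the same class

Time: 2016-07-01 07:19:26

Tags: php curl

This is the source code of the website: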

<li>
  <div class="headingBox">Name of the Test: </div>
  <div class="detailBox">ALLERGY SCREEN</div>
</li>
<li style='display:block'>
  <div class="headingBox">Pre Test Information: </div>
  <div class="detailBox">No special preparation required.</div>
</li>
<li style='display:none'>
  <div class="headingBox">Home Collection Number:</div>
  <div class="detailBox"></div>
</li>

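When I run the PHP script below, I get only one value, ALLERGY SCREEN. How can I get the data contained in all of the div tags with class="detailBox"?

Please provide a cURL-only solution.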

<?php

function curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    $data = curl_exec($ch);
    curl_close($ch);
    $i = 0;
    $regex = '/<div class="detailBox">(.*?)<\/div>/';
    if (preg_match($regex, $data, $list)) {
        print_r($list);
    }
}

curl("https://www.lalpathlabs.com/pathology-test/allergy-screen-total-ige-and-phadia-top");
?>

1 Answer:

Answer 0: (score: 0)

Don't reinvent the wheel; use this library:

http://simplehtmldom.sourceforge.net/

Use it like this:

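// Assumes simple_html_dom.php (from the library linked above) sits next to this script.
include 'simple_html_dom.php';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$data = curl_exec($ch);
curl_close($ch);

// curl_exec() returns a plain string; parse it into a DOM object before calling find().
$html = str_get_html($data);

foreach ($html->find('div[class=detailBox]') as $element)
    echo $element->innertext . '<br>';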

Done!
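
If you would rather not pull in a library, here is a rough sketch of two dependency-free alternatives. The script in the question returns only one value because preg_match stops at the first match; preg_match_all collects every match. PHP's built-in DOMDocument and DOMXPath classes can do the same job without a regex. The function names extractWithRegex and extractWithDom are made up for this illustration, and the sample markup is copied from the question (wrapped in a ul) so the snippet runs without a network call. In practice you would pass in the string returned by curl_exec().

```php
<?php

// Hypothetical helper: regex approach, same pattern as the question,
// but preg_match_all returns every match instead of only the first.
function extractWithRegex(string $html): array
{
    // The "s" modifier lets "." span newlines inside a div.
    preg_match_all('/<div class="detailBox">(.*?)<\/div>/s', $html, $matches);
    // $matches[1] holds the captured group for each match.
    return array_map('trim', $matches[1]);
}

// Hypothetical helper: DOM approach using PHP's bundled DOM extension.
function extractWithDom(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them;
    // real-world HTML is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $values = [];
    // Matches divs whose class attribute is exactly "detailBox".
    foreach ($xpath->query('//div[@class="detailBox"]') as $node) {
        $values[] = trim($node->textContent);
    }
    return $values;
}

// Sample markup from the question; replace with the curl_exec() result.
$sample = <<<'HTML'
<ul>
<li>
  <div class="headingBox">Name of the Test: </div>
  <div class="detailBox">ALLERGY SCREEN</div>
</li>
<li style='display:block'>
  <div class="headingBox">Pre Test Information: </div>
  <div class="detailBox">No special preparation required.</div>
</li>
<li style='display:none'>
  <div class="headingBox">Home Collection Number:</div>
  <div class="detailBox"></div>
</li>
</ul>
HTML;

print_r(extractWithRegex($sample));
print_r(extractWithDom($sample));
```

The regex version is the smallest change to the original script, but it depends on the class attribute being written exactly as class="detailBox". The DOM version tolerates differences in quoting and attribute order, so it is probably the safer choice if the page markup changes.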