I have a function that scrapes images from an HTML page. This is the HTML source I want to scrape:
<div class="single-post-thumb">
<img width="448" height="298" src="http://www.website.com/wp-content/uploads/2015/02/DSC_2803.jpg" class="attachment-660x330" alt="Description image" title="Description title" /> </div>
This is my scraping function:
public function process_individual_links($news_coll)
{
    echo "Fetching Content - " . $news["news_url"]."". $news["news_images"] . "";
    $news_coll = array_reverse($news_coll);
    //print_r($news_coll);
    foreach($news_coll as $news)
    {
        $news_url = $news["news_url"];
        $preview = $this->_http->request($news_url);
        $preview = $this->stripNewLine($preview);
        $expr = '#<div class="single-post-thumb"><img .*? src="(.*?)".*?/></div>.*?<div class="entry">(.*?)</div>#';
        preg_match_all($expr, $preview, $matches);
        $count = count($matches[0]) ;
        if($count == 0)
        {
            $expr = '#<div class="entry">(.*?)</div><!-- .entry /-->#';
            $news["news_images"] = str_replace('"', "", $match[1][0]);
            preg_match_all($expr, $preview, $matches);
            $news["news_content"] = $matches[1][0];
        }
        else
        {
            $news["news_images"] = str_replace('"', "", $match[1][0]);
            $news["news_content"] = $matches[2][0];
            echo" $news[news_images] ";
        }
        $imager = str_replace('"', "", $match[1][0]);
        $news["news_content"] = $news["news_content"] . "<p><a href='" . $news_url . "'>Sumber Berita</a></p>".$imager;
        if($this->insertIntoWordpress($news, "TNI") == "-1")
            echo " ";
        else
            echo "Fetching Content - " . $news["news_url"]."". $news["news_images"] . "";
    }
}
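One detail in the function above may contribute to the empty image value: `preg_match_all` fills `$matches`, but the image is read from `$match`, which is never assigned. Below is a minimal sketch of reading a capture group defensively. The helper name `first_capture` is hypothetical and not part of the original class.

```php
<?php
// Hypothetical helper: returns capture group $group of the first match,
// or null when the pattern does not match at all.
function first_capture(string $pattern, string $subject, int $group = 1): ?string
{
    // preg_match returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $subject, $matches) !== 1) {
        return null;
    }
    // Read from the same array that preg_match filled.
    return $matches[$group] ?? null;
}

// Illustrative input shaped like the fallback pattern in the function above.
$html = '<div class="entry">Some text</div><!-- .entry /-->';
$content = first_capture('#<div class="entry">(.*?)</div><!-- .entry /-->#', $html);
echo $content === null ? 'no match' : $content, "\n";
```

Returning null on a failed match makes the "nothing found" case explicit, instead of indexing into an empty result.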
I tried it on other websites with markup like <img src="">, and it works there, but on this page no src is captured.
This is the expression I use for scraping:
$expr = '#<div class="single-post-thumb"><img .*? src="(.*?)".*?/></div>.*?<div class="entry">(.*?)</div>#';
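Comparing that expression with the HTML at the top, two spots may prevent a match. The expression expects `<img` immediately after the opening `<div>` and `</div>` immediately after `/>`. The sample has whitespace in both places: a line break after the `<div>` (unless `stripNewLine` removes it) and a space before `</div>`. Below is a minimal sketch with `\s*` at those spots, limited to the image part of the pattern. The variable names are illustrative only.

```php
<?php
// Sample markup copied from the question.
$html = '<div class="single-post-thumb">' . "\n"
      . '<img width="448" height="298" src="http://www.website.com/wp-content/uploads/2015/02/DSC_2803.jpg" class="attachment-660x330" alt="Description image" title="Description title" /> </div>';

// \s* tolerates optional whitespace between tags;
// [^>]* keeps the match inside the <img> tag instead of running across the page.
$expr = '#<div class="single-post-thumb">\s*<img\b[^>]*?\ssrc="([^"]*)"[^>]*>\s*</div>#s';

// $image is the captured src value, or null when the pattern does not match.
$image = preg_match($expr, $html, $matches) === 1 ? $matches[1] : null;
echo $image, "\n";
```

Because `[^>]*?` can match nothing, the same pattern should also accept a bare `<img src="...">` tag, which the original `<img .*? src=` cannot, since it needs two separate spaces before `src`.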