如何从url到JSON获取图像

时间:2014-05-07 15:17:27

标签: php regex json

我想从网址(www.xxxxx.co.uk/bar.html)获取所有图片并将它们放到JSON

{"images":http://www.xxxxx.co.uk/foo.jpg}

这就是我的尝试:

<?php 
$html = file_get_contents('www.xxxxx.co.uk/bar.html'); 

function linkExtractor($html){ 
$linkArray = array(); 
if(preg_match_all('/<img\s+.*?src=[\"\']?([^\"\' >]*)[\"\']?[^>]*>/i',$html,$matches,PREG_SET_ORDER)){ 
foreach($matches as $match){ 
$arr = array('images' => $match); 
} 
} 
echo json_encode($arr); 
} 

echo json_encode($arr); 
?>

修改

所以我尝试了这个:

$page = file_get_contents('www.xxxxx.co.uk/bar.html');
$doc = new DOMDocument(); 
$doc->loadHTML($page);
$images = $doc->getElementsByTagName('img'); 
foreach($images as $image) {
    $src = $image->getAttribute('src');
    $arr = array('images' => $src );
    echo json_encode($arr);
}

我收到了这些错误:

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3188 in /home/content/57/9770557/html/untitled folder/json.php on line 5

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3207 in /home/content/57/9770557/html/untitled folder/json.php on line 5

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3226 in /home/content/57/9770557/html/untitled folder/json.php on line 5

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3245 in /home/content/57/9770557/html/untitled folder/json.php on line 5

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : iframe in Entity, line: 3287 in /home/content/57/9770557/html/untitled folder/json.php on line 5

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : iframe in Entity, line: 3330 in /home/content/57/9770557/html/untitled folder/json.php on line 5

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3351 in /home/content/57/9770557/html/untitled folder/json.php on line 5

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3370 in /home/content/57/9770557/html/untitled folder/json.php on line 5

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3389 in /home/content/57/9770557/html/untitled folder/json.php on line 5

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3389 in /home/content/57/9770557/html/untitled folder/json.php on line 5

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3408 in /home/content/57/9770557/html/untitled folder/json.php on line 5

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity, line: 3408 in /home/content/57/9770557/html/untitled folder/json.php on line 5

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3427 in /home/content/57/9770557/html/untitled folder/json.php on line 5

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3446 in /home/content/57/9770557/html/untitled folder/json.php on line 5

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3466 in /home/content/57/9770557/html/untitled folder/json.php on line 5

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3485 in /home/content/57/9770557/html/untitled folder/json.php on line 5
{"images":"loader.gif"}{"images":"logo.png"}{"images":"facebook.png"}{"images":"Yotube.png"}{"images":"twitter.png"}{"images":"Soundcloud.png"}{"images":"1355334348_br_down.png"}{"images":"video images\/ONYX sofa.jpg"}{"images":"video images\/aaron duran.jpg"}{"images":"video images\/littledragon.jpg"}{"images":"video images\/cantalivering house.jpg"}{"images":"video images\/Chef.jpg"}{"images":"video images\/monument valley.jpg"}{"images":"video images\/set a drift t shirts.jpg"}{"images":"video images\/Leica camera.jpg"}{"images":"video images\/Bubbledogs restuarant.jpg"}{"images":"video images\/Architectural density.jpg"}{"images":"video images\/Seven Automatic Landscapes.jpg"}{"images":"video images\/alphabet.jpg"}{"images":"video images\/offices in the forest.jpg"}{"images":"video images\/Environmental Street Art by ROA.jpg"}{"images":"video images\/ Camille Seaman.jpg"}{"images":"video images\/Klaus Pitchler.jpg"}{"images":"video images\/Lowdi.jpg"}{"images":"video images\/Mary OMalley.jpg"}{"images":"video images\/Patricia Piccinini.jpg"}{"images":"video images\/Santa Cruz.jpg"}{"images":"video images\/Sonia Rentsch.jpg"}{"images":"video images\/Studio Natural.jpg"}{"images":"video images\/The Tea Calender.jpg"}{"images":"video images\/Watch Dogs.jpg"}{"images":"video images\/wes21.jpg"}{"images":"video images\/Act Romegialli Architects.jpg"}{"images":"video images\/Romain Jacquet-Lagreze.jpg"}{"images":"video images\/Nicholas Hance McElroy.jpg"}{"images":"video images\/Insa.gif"}{"images":"video images\/Tsatsas bag.jpg"}{"images":"video images\/st pancras.jpg"}{"images":"video images\/anthillfilms.jpg"}{"images":"video images\/mt wolf.jpg"}{"images":"video images\/die.jpg"}{"images":"video images\/jazz that nobody asked for.jpg"}{"images":"video images\/oscilate.jpg"}{"images":"video images\/ghostpoet.jpg"}{"images":"video images\/oak hanger.jpg"}{"images":"video images\/iceball.jpg"}{"images":"video images\/fabian oefner.jpg"}{"images":"video images\/yago portal.jpg"}{"images":"video images\/illustrations on bike wheels.jpg"}{"images":"video images\/symmetrees.jpg"}{"images":"video images\/undercity.jpg"}{"images":"video images\/IFHY.jpg"}{"images":"video images\/the abc of architects.jpg"}{"images":"video images\/chum.jpg"}{"images":"video images\/crankworx.jpg"}{"images":"video images\/romare.jpg"}{"images":"video images\/White noise.jpg"}{"images":"video images\/silvestre architects.jpg"}{"images":"video images\/airport.jpg"}{"images":"video images\/feather.jpg"}{"images":"video images\/Nico Van Der Meulen.jpg"}{"images":"video images\/51m trampoline.jpg"}{"images":"video images\/lets talk about soil.jpg"}{"images":"video images\/alberto seveso.jpg"}{"images":"video images\/ibike.jpg"}{"images":"video images\/robs wood grain bike.jpg"}{"images":"video images\/smokehouse.jpg"}{"images":"video images\/laurent chehere.jpg"}{"images":"video images\/SOHN.jpg"}{"images":"video images\/the employment.jpg"}{"images":"video images\/little printer.jpg"}{"images":"video images\/procrastination.jpg"}{"images":"video images\/touchwood commercial.jpg"}{"images":"video images\/fusefones.jpg"}{"images":"video images\/allandale house.jpg"}{"images":"video images\/Spherikal.jpg"}{"images":"video images\/power.jpg"}{"images":"video images\/reykjavik house.jpg"}{"images":"video images\/click&grow.jpg"}{"images":"video images\/sfelt table.jpg"}{"images":"video images\/gopro.jpg"}
  1. 为什么链接有/不仅仅是??
  2. 为什么要做多个“{”图片“:”video images / gopro.jpg“}”
  3. Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3107 in /home/content/57/9770557/html/untitled folder/json.php on line 5错误是什么?

2 个答案:

答案 0 :(得分:1)

你几乎没错,在你的网址前面有一些像http://这样的东西丢失了,并从你的函数中返回一个值而不是回显它。 试试这个:

$html = file_get_contents('http://www.setours.com');

function linkExtractor($html){
    $imageArr = array();
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $images = $doc->getElementsByTagName('img');
    foreach($images as $image) {
        array_push($imageArr, $image->getAttribute('src'));
    }
    return $imageArr;
}

echo json_encode(array("images" => linkExtractor($html)));

使用loadHTML函数的@ infront来抑制未知HTML元素的警告。

答案 1 :(得分:0)

使用DOM解析器从HTML文档中提取信息:

function extractImgages($url) {

    // Prepare result
    $result = array('images' => array());

    // Create a document object out of the HTML
    $doc = new DOMDocument();
    if(!@$doc->loadHTML($url)) {
        throw new Exception('Bad HTML');
    }

    // Iterate through '<img>' elements and store urls
    foreach($doc->getElementsByTagName('img') as $img) {
        $result['images'][]= $img->getAttribute('src');
    }

    return json_encode($result);
}

我在解析HTML时使用静默运算符@,因为当HTML来自不受信任的来源时,如果HTML源无效,则会产生警告。 @会抑制它们。