我想从网址(www.xxxxx.co.uk/bar.html)获取所有图片并将它们放到JSON
像
{"images":http://www.xxxxx.co.uk/foo.jpg}
这就是我的尝试:
<?php
$html = file_get_contents('www.xxxxx.co.uk/bar.html');
function linkExtractor($html){
$linkArray = array();
if(preg_match_all('/<img\s+.*?src=[\"\']?([^\"\' >]*)[\"\']?[^>]*>/i',$html,$matches,PREG_SET_ORDER)){
foreach($matches as $match){
$arr = array('images' => $match);
}
}
echo json_encode($arr);
}
echo json_encode($arr);
?>
修改
所以我尝试了这个:
$page = file_get_contents('www.xxxxx.co.uk/bar.html');
$doc = new DOMDocument();
$doc->loadHTML($page);
$images = $doc->getElementsByTagName('img');
foreach($images as $image) {
$src = $image->getAttribute('src');
$arr = array('images' => $src );
echo json_encode($arr);
}
我收到了这些错误:
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3188 in /home/content/57/9770557/html/untitled folder/json.php on line 5
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3207 in /home/content/57/9770557/html/untitled folder/json.php on line 5
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3226 in /home/content/57/9770557/html/untitled folder/json.php on line 5
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3245 in /home/content/57/9770557/html/untitled folder/json.php on line 5
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : iframe in Entity, line: 3287 in /home/content/57/9770557/html/untitled folder/json.php on line 5
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : iframe in Entity, line: 3330 in /home/content/57/9770557/html/untitled folder/json.php on line 5
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3351 in /home/content/57/9770557/html/untitled folder/json.php on line 5
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3370 in /home/content/57/9770557/html/untitled folder/json.php on line 5
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3389 in /home/content/57/9770557/html/untitled folder/json.php on line 5
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3389 in /home/content/57/9770557/html/untitled folder/json.php on line 5
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3408 in /home/content/57/9770557/html/untitled folder/json.php on line 5
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity, line: 3408 in /home/content/57/9770557/html/untitled folder/json.php on line 5
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3427 in /home/content/57/9770557/html/untitled folder/json.php on line 5
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3446 in /home/content/57/9770557/html/untitled folder/json.php on line 5
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3466 in /home/content/57/9770557/html/untitled folder/json.php on line 5
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3485 in /home/content/57/9770557/html/untitled folder/json.php on line 5
{"images":"loader.gif"}{"images":"logo.png"}{"images":"facebook.png"}{"images":"Yotube.png"}{"images":"twitter.png"}{"images":"Soundcloud.png"}{"images":"1355334348_br_down.png"}{"images":"video images\/ONYX sofa.jpg"}{"images":"video images\/aaron duran.jpg"}{"images":"video images\/littledragon.jpg"}{"images":"video images\/cantalivering house.jpg"}{"images":"video images\/Chef.jpg"}{"images":"video images\/monument valley.jpg"}{"images":"video images\/set a drift t shirts.jpg"}{"images":"video images\/Leica camera.jpg"}{"images":"video images\/Bubbledogs restuarant.jpg"}{"images":"video images\/Architectural density.jpg"}{"images":"video images\/Seven Automatic Landscapes.jpg"}{"images":"video images\/alphabet.jpg"}{"images":"video images\/offices in the forest.jpg"}{"images":"video images\/Environmental Street Art by ROA.jpg"}{"images":"video images\/ Camille Seaman.jpg"}{"images":"video images\/Klaus Pitchler.jpg"}{"images":"video images\/Lowdi.jpg"}{"images":"video images\/Mary OMalley.jpg"}{"images":"video images\/Patricia Piccinini.jpg"}{"images":"video images\/Santa Cruz.jpg"}{"images":"video images\/Sonia Rentsch.jpg"}{"images":"video images\/Studio Natural.jpg"}{"images":"video images\/The Tea Calender.jpg"}{"images":"video images\/Watch Dogs.jpg"}{"images":"video images\/wes21.jpg"}{"images":"video images\/Act Romegialli Architects.jpg"}{"images":"video images\/Romain Jacquet-Lagreze.jpg"}{"images":"video images\/Nicholas Hance McElroy.jpg"}{"images":"video images\/Insa.gif"}{"images":"video images\/Tsatsas bag.jpg"}{"images":"video images\/st pancras.jpg"}{"images":"video images\/anthillfilms.jpg"}{"images":"video images\/mt wolf.jpg"}{"images":"video images\/die.jpg"}{"images":"video images\/jazz that nobody asked for.jpg"}{"images":"video images\/oscilate.jpg"}{"images":"video images\/ghostpoet.jpg"}{"images":"video images\/oak hanger.jpg"}{"images":"video images\/iceball.jpg"}{"images":"video images\/fabian oefner.jpg"}{"images":"video images\/yago portal.jpg"}{"images":"video images\/illustrations on bike wheels.jpg"}{"images":"video images\/symmetrees.jpg"}{"images":"video images\/undercity.jpg"}{"images":"video images\/IFHY.jpg"}{"images":"video images\/the abc of architects.jpg"}{"images":"video images\/chum.jpg"}{"images":"video images\/crankworx.jpg"}{"images":"video images\/romare.jpg"}{"images":"video images\/White noise.jpg"}{"images":"video images\/silvestre architects.jpg"}{"images":"video images\/airport.jpg"}{"images":"video images\/feather.jpg"}{"images":"video images\/Nico Van Der Meulen.jpg"}{"images":"video images\/51m trampoline.jpg"}{"images":"video images\/lets talk about soil.jpg"}{"images":"video images\/alberto seveso.jpg"}{"images":"video images\/ibike.jpg"}{"images":"video images\/robs wood grain bike.jpg"}{"images":"video images\/smokehouse.jpg"}{"images":"video images\/laurent chehere.jpg"}{"images":"video images\/SOHN.jpg"}{"images":"video images\/the employment.jpg"}{"images":"video images\/little printer.jpg"}{"images":"video images\/procrastination.jpg"}{"images":"video images\/touchwood commercial.jpg"}{"images":"video images\/fusefones.jpg"}{"images":"video images\/allandale house.jpg"}{"images":"video images\/Spherikal.jpg"}{"images":"video images\/power.jpg"}{"images":"video images\/reykjavik house.jpg"}{"images":"video images\/click&grow.jpg"}{"images":"video images\/sfelt table.jpg"}{"images":"video images\/gopro.jpg"}
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3107 in /home/content/57/9770557/html/untitled folder/json.php on line 5
错误是什么?答案 0 :(得分:1)
你几乎没错,在你的网址前面有一些像http://
这样的东西丢失了,并从你的函数中返回一个值而不是回显它。
试试这个:
$html = file_get_contents('http://www.setours.com');
function linkExtractor($html){
$imageArr = array();
$doc = new DOMDocument();
@$doc->loadHTML($html);
$images = $doc->getElementsByTagName('img');
foreach($images as $image) {
array_push($imageArr, $image->getAttribute('src'));
}
return $imageArr;
}
echo json_encode(array("images" => linkExtractor($html)));
使用loadHTML
函数的@ infront来抑制未知HTML元素的警告。
答案 1 :(得分:0)
使用DOM解析器从HTML文档中提取信息:
function extractImgages($url) {
// Prepare result
$result = array('images' => array());
// Create a document object out of the HTML
$doc = new DOMDocument();
if(!@$doc->loadHTML($url)) {
throw new Exception('Bad HTML');
}
// Iterate through '<img>' elements and store urls
foreach($doc->getElementsByTagName('img') as $img) {
$result['images'][]= $img->getAttribute('src');
}
return json_encode($result);
}
我在解析HTML时使用静默运算符@
,因为当HTML来自不受信任的来源时,如果HTML源无效,则会产生警告。 @
会抑制它们。