I'm trying to get the image and other data from a page's meta tags. Could you show me how to get the image for a specific URL?
Example URL:
Code:
function getUrlData($url) {
    $result = false;
    $contents = getUrlContents($url);
    if (isset($contents) && is_string($contents)) {
        $title = null;
        $metaTags = null;
        preg_match('/<title>([^>]*)<\/title>/si', $contents, $match);
        if (isset($match) && is_array($match) && count($match) > 0) {
            $title = strip_tags($match[1]);
        }
        preg_match_all('/<[\s]*meta[\s]*name="?' . '([^>"]*)"?[\s]*' . 'content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match);
        if (isset($match) && is_array($match) && count($match) == 3) {
            $originals = $match[0];
            $names = $match[1];
            $values = $match[2];
            if (count($originals) == count($names) && count($names) == count($values)) {
                $metaTags = array();
                for ($i = 0, $limiti = count($names); $i < $limiti; $i++) {
                    $metaTags[$names[$i]] = array(
                        'html' => htmlentities($originals[$i]),
                        'value' => $values[$i]
                    );
                }
            }
        }
        $result = array(
            'title' => $title,
            'metaTags' => $metaTags
        );
    }
    return $result;
}
function getUrlContents($url, $maximumRedirections = null, $currentRedirection = 0) {
    $result = false;
    $contents = @file_get_contents($url);
    // Check if we need to go somewhere else (meta refresh redirect)
    if (isset($contents) && is_string($contents)) {
        preg_match_all('/<[\s]*meta[\s]*http-equiv="?REFRESH"?' . '[\s]*content="?[0-9]*;[\s]*URL[\s]*=[\s]*([^>"]*)"?' . '[\s]*[\/]?[\s]*>/si', $contents, $match);
        if (isset($match) && is_array($match) && count($match) == 2 && count($match[1]) == 1) {
            if (!isset($maximumRedirections) || $currentRedirection < $maximumRedirections) {
                return getUrlContents($match[1][0], $maximumRedirections, ++$currentRedirection);
            }
            // Redirect limit reached: give up
            $result = false;
        } else {
            $result = $contents;
        }
    }
    // Return $result rather than $contents, so a failed fetch or an exceeded redirect limit yields false
    return $result;
}
$test = getUrlData('https://www.amazon.in/Redmi-Pro-Black-32GB-Storage/dp/B07DJL15QT/ref=lp_16113280031_1_1?srs=16113280031&ie=UTF8&qid=1553411505&sr=8-1'); // Replace with your URL here
echo '<pre>';
print_r($test);
I can't find the image data for this URL or for the first URL.
Answer 0 (score: 0)
Use DOMDocument and DOMXPath to parse the HTML retrieved from the given URL:
function outputMetaTags($url) {
    // $url = 'https://www.myntra.com/casual-shoes/kook-n-keech/kook-n-keech-men-white-sneakers/2154180/buy';
    // Send a browser-like User-Agent in case the server refuses requests from scripts
    $streamContext = stream_context_create(array(
        'http' => array(
            'header' => 'User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
            'follow_location' => false
        )
    ));
    $htmlData = file_get_contents($url, false, $streamContext); // fetch the HTML from the given URL
    if ($htmlData === false || $htmlData === '') {
        return; // nothing to parse: the request failed or returned an empty body
    }
    // libxml_use_internal_errors(true); // optionally suppress libxml warnings about malformed HTML
    $doc = new DOMDocument();
    $doc->loadHTML($htmlData); // parse the HTML with DOMDocument
    $xpath = new DOMXPath($doc); // create a DOMXPath object to query the parsed DOM
    $query = '//*/meta';
    $metaData = $xpath->query($query);
    foreach ($metaData as $singleMeta) {
        // For og:image, check whether $singleMeta->getAttribute('property') === 'og:image'; the same goes for og:url.
        // Not every meta tag has a property or name attribute.
        if (!empty($singleMeta->getAttribute('property'))) {
            echo $singleMeta->getAttribute('property') . "\n";
        } elseif (!empty($singleMeta->getAttribute('name'))) {
            echo $singleMeta->getAttribute('name') . "\n";
        }
        // Print the content of the meta tag
        echo $singleMeta->getAttribute('content') . "\n";
    }
}
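If you only need the image, the same approach can collect the tags into an array instead of echoing them. Open Graph tags such as og:image carry their key in the property attribute rather than name, so a pattern that only looks at name= will skip them. The following is a minimal sketch, not a definitive implementation: extractMetaTags() and $sampleHtml are hypothetical names made up for illustration, the markup is invented, and it assumes PHP 7 or later with the DOM extension enabled. It works on an HTML string, so you would pass it whatever your fetch step returns.

```php
<?php
// Hypothetical helper (not part of the code above): collects <meta> tags
// from an HTML string into an array keyed by their property or name.
function extractMetaTags(string $html): array
{
    // Real-world HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tags = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//meta[@content]') as $meta) {
        // Open Graph tags use "property"; classic tags use "name".
        $key = $meta->getAttribute('property') ?: $meta->getAttribute('name');
        if ($key !== '') {
            $tags[$key] = $meta->getAttribute('content');
        }
    }
    return $tags;
}

// Made-up sample markup, for illustration only.
$sampleHtml = '<html><head><title>Demo</title>'
    . '<meta name="description" content="A demo page">'
    . '<meta property="og:image" content="https://example.com/shoe.jpg">'
    . '</head><body></body></html>';

$tags = extractMetaTags($sampleHtml);
echo $tags['og:image'] ?? 'no og:image found', "\n";
```

Keep in mind that some sites fill in these tags with JavaScript or serve different markup to scripts, in which case og:image may simply be absent from the HTML you fetch.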
Read more about DOMDocument and DOMXPath:
http://php.net/manual/en/class.domdocument.php
http://php.net/manual/en/class.domxpath.php
About meta tags:
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta