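I want to create an API that returns the title and description of a given URL.
I tried the solution provided here: https://stackoverflow.com/a/3711554/5618358, and my current, improved version is below.
Unfortunately, it does not work when I pass the Yahoo and Google URLs. It does work for other URLs, such as Github.com.
I dumped the variables step by step, and I can see that the messy markup Yahoo returns cannot be handled, and that Google's HTML has no description meta tag.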
Is there a similar approach that works for these sites as well?
Please help me solve this problem. I apologize for my poor English.
//In routes/api.php
use Illuminate\Http\Request;

Route::get('/links/helper/meta-tag-extractor', function (Request $request) {
    $url = $request->get('url');
    $result = [];

    function file_get_contents_curl($url)
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
    }

    $html = file_get_contents_curl($url);

    //parsing begins here:
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $nodes = $doc->getElementsByTagName('title');

    //get and display what you need:
    //This line throws an error for Yahoo.com:
    $result['title'] = $nodes->item(0)->nodeValue;

    $metas = $doc->getElementsByTagName('meta');
    for ($i = 0; $i < $metas->length; $i++) {
        $meta = $metas->item($i);
        if ($meta->getAttribute('name') == 'description') {
            $result['description'] = $meta->getAttribute('content');
        }
        //property="og:description"
        //<meta property="og:description"
        //      content="Sean Connery found fame and fortune as the
        //               suave, sophisticated British agent, James Bond." />
        if ($meta->getAttribute('property') == 'og:description') {
            // Bug: this stores the attribute name; the value is in getAttribute('content').
            $result['og:description'] = $meta->getAttribute('property');
        }
    }

    // For Google.com, the result has neither 'description' nor 'og:description'.
    // For Github.com, however, it works like a charm and returns:
    // {
    //     "title": "The world’s leading software development platform · GitHub",
    //     "description": "GitHub brings together the world’s largest community of developers to discover, share, and build better software. From open source projects to private team repositories, we’re your all-in-one platform for collaborative development.",
    //     "og:description": "og:description"
    // }
    return $result;
});
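The `Yahoo.com` error comes from `$nodes->item(0)->nodeValue`: when the parsed document contains no `<title>` element, `item(0)` returns `null`, and reading `nodeValue` on `null` fails. Below is a minimal sketch of a safer title lookup, assuming PHP 7.1 or later. `extractTitle()` is a hypothetical helper, not part of the linked answer; it collects libxml's warnings about malformed markup with `libxml_use_internal_errors()` instead of hiding them with `@`.

```php
<?php

/**
 * Hypothetical helper: return the trimmed text of the first <title>
 * element, or null when the HTML is empty or has no <title>.
 */
function extractTitle(string $html): ?string
{
    if (trim($html) === '') {
        // DOMDocument::loadHTML() rejects an empty string (ValueError in PHP 8).
        return null;
    }

    // Collect libxml warnings about malformed markup instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected warnings and restore the caller's setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = $doc->getElementsByTagName('title');
    if ($nodes->length === 0) {
        return null; // no <title>: item(0) would be null here
    }

    return trim($nodes->item(0)->textContent);
}
```

Restoring the previous `libxml_use_internal_errors()` setting keeps the helper from changing error handling for the rest of the request.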
Answer (score: -1)
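After a lot of effort, I solved part of the problem.
I fixed the Yahoo issue, so I can now get its information.
However, it still does not work for the Google URL.
When I fetch Google from my server, the source contains neither a description nor an og:description meta tag.
The result for Yahoo is: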
{
"title": "Yahoo",
"description": "News, email and search are just the beginning. Discover more every day. Find your yodel.",
"og:title": "Yahoo",
"og:type": "website",
"og:url": "http://www.yahoo.com",
"og:description": "News, email and search are just the beginning. Discover more every day. Find your yodel.",
"og:image": "https://s.yimg.com/dh/ap/default/130909/y_200_a.png",
"og:site_name": "Yahoo"
}
But the result for Google is:
{
"title": "Google"
}
Please help me get this information for Google...
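The description lookup can try `name="description"` first and fall back to `og:description`, reading the value from the `content` attribute. (The question's loop stores `getAttribute('property')` for `og:description`, which is why the GitHub output shows `"og:description": "og:description"`.) Below is a sketch using `DOMXPath`; `extractDescription()` is a hypothetical helper. It cannot produce a description the server never sent: for HTML like the Google result above it returns `null`, which the API can report explicitly.

```php
<?php

/**
 * Hypothetical helper: return the page description from
 * <meta name="description">, else <meta property="og:description">,
 * else null.
 */
function extractDescription(string $html): ?string
{
    if (trim($html) === '') {
        return null;
    }

    // Parse leniently, keeping libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Candidate attributes in order of preference. translate() lowercases
    // the name attribute so "Description" also matches.
    $queries = [
        '//meta[translate(@name, "DESCRIPTION", "description")="description"]/@content',
        '//meta[@property="og:description"]/@content',
    ];

    foreach ($queries as $query) {
        $nodes = $xpath->query($query);
        if ($nodes !== false && $nodes->length > 0) {
            $value = trim($nodes->item(0)->nodeValue);
            if ($value !== '') {
                return $value;
            }
        }
    }

    return null; // neither tag present, e.g. the Google page fetched above
}
```

Matching `name` case-insensitively is a choice made in this sketch; the original code compares it exactly.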
The new source code is:
//In routes/api.php
use Illuminate\Http\Request;

Route::get('/links/helper/meta-tag-extractor', function (Request $request) {
    $url = $request->get('url');
    $result = [];

    function file_get_contents_curl($url)
    {
        $ch = curl_init();
        $timeout = 10;
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)");
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        // Disables TLS certificate checks: convenient for testing, insecure in production.
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        // Both options set the same connect timeout; the one set last (3000 ms) wins.
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT_MS, 3000);
        // Send Accept-Encoding: gzip and let curl decompress the response.
        curl_setopt($ch, CURLOPT_ENCODING, "gzip");
        curl_setopt($ch, CURLOPT_HEADER, 0);
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
    }

    $html = file_get_contents_curl($url);
    if ($html === false || $html === '') {
        // curl failed or returned nothing; loadHTML('') throws a ValueError in PHP 8.
        return $result;
    }

    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    // echo $html;

    $nodes = $doc->getElementsByTagName('title');
    if ($nodes->count()) {
        $result['title'] = $nodes->item(0)->nodeValue;
    }

    $metas = $doc->getElementsByTagName('meta');
    for ($i = 0; $i < $metas->length; $i++) {
        $meta = $metas->item($i);
        if ($meta->getAttribute('name') == 'description') {
            $result['description'] = $meta->getAttribute('content');
        }
        // Copy every Open Graph property (og:title, og:url, ...) with its content.
        if (substr($meta->getAttribute('property'), 0, 3) === 'og:') {
            $result[$meta->getAttribute('property')] = $meta->getAttribute('content');
        }
    }

    return $result;
});
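Two further points about this version. First, `function file_get_contents_curl()` is declared inside the route closure, so if the closure runs twice in one PHP process (for example, two requests in one Laravel feature test), PHP stops with a "Cannot redeclare" fatal error; defining the helper once outside the route avoids this. Second, `curl_exec()` returns `false` on failure, and in PHP 8 `loadHTML()` throws a `ValueError` for an empty string, which `@` does not suppress, so the fetched HTML should be checked before parsing. The sketch below moves the parsing into a standalone function, assuming PHP 8; `parseMetaTags()` is a hypothetical name, and otherwise it mirrors the answer's loop (title, `description`, every `og:*` property).

```php
<?php

/**
 * Hypothetical helper: extract title, description and og:* meta tags.
 * Accepts false (what curl_exec() returns on failure) and returns [] for it.
 */
function parseMetaTags(string|false $html): array
{
    $result = [];
    if ($html === false || trim($html) === '') {
        return $result; // nothing fetched, nothing to parse
    }

    // Parse leniently without emitting libxml warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $result['title'] = trim($titles->item(0)->textContent);
    }

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        $property = $meta->getAttribute('property');

        if ($name === 'description') {
            $result['description'] = $meta->getAttribute('content');
        }
        // str_starts_with() is PHP 8; the answer's substr() check works on older versions.
        if (str_starts_with($property, 'og:')) {
            $result[$property] = $meta->getAttribute('content');
        }
    }

    return $result;
}
```

With the fetch helper also defined once outside the route, the route body could reduce to something like `return parseMetaTags(file_get_contents_curl($url));`.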