I'm getting these two errors when retrieving metadata from a remote web page. Is this an escaping problem or a cURL problem?
Warning: get_meta_tags(<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://...@import url( "http://www.zymic.com/forum/style_images/v6/folder_editor_images/css_rte.css" ); </style> </head> <body> <div id="ipbwrapper"> <!--ipb.javascript.start--> <script type="text/javascript"> //<![CDATA[ var ipb_var_st = "0"; var ipb_lang_tpl_q1 = "Please enter a page number to jump to between 1 and"; var ipb_var_s = "f2e0d2b492f248ec27ef34ae291a1db4"; var ipb_var_phpext = "php"; var ipb_var_base_url = "http://www.zymic.com/forum/index.php?s=f2e0d2b492f248ec27ef34ae291a1db4&"; var ipb_var_image_url = "style_images/v6"; var ipb_input_f = "34"; var ipb_input_t = "5188"; var ipb_input_p = ""; var ipb_var_cookieid = ""; var ipb_var_cookie_ in public_html/list/main/output.php on line 22
retrieve pagetitle
Warning: file_get_contents(<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://...@import url( "http://www.zymic.com/forum/style_images/v6/folder_editor_images/css_rte.css" ); </style> </head> <body> <div id="ipbwrapper"> <!--ipb.javascript.start--> <script type="text/javascript"> //<![CDATA[ var ipb_var_st = "0"; var ipb_lang_tpl_q1 = "Please enter a page number to jump to between 1 and"; var ipb_var_s = "f2e0d2b492f248ec27ef34ae291a1db4"; var ipb_var_phpext = "php"; var ipb_var_base_url = "http://www.zymic.com/forum/index.php?s=f2e0d2b492f248ec27ef34ae291a1db4&"; var ipb_var_image_url = "style_images/v6"; var ipb_input_f = "34"; var ipb_input_t = "5188"; var ipb_input_p = ""; var ipb_var_cookieid = ""; var ipb_var_coo in /public_html/list/main/output.php on line 27
Here is the code:
////Use Curl Library to get page content for security
$url = 'http://en.wikipedia.org/wiki/Category:Lists_of_lists';
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($curl, CURLOPT_USERAGENT, 'ListBot 1.0: Used for compiling a DB of lists across the internet.');
$str = curl_exec($curl);
curl_close($curl);
//get metadata
$tags = get_meta_tags($str);
//Get page title
function get_page_title($str){
    if( !($data = file_get_contents($str)) ) return false;
    if( preg_match("#<title>(.+)<\/title>#iU", $data, $t)) {
        return trim($t[1]);
    } else {
        return false;
    }
}
///////////
echo('retrieve pagetitle');
$tags['title'] = get_page_title($str);
Answer 0 (score: 1)
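get_meta_tags() expects a file location (usually a URL), not the HTML itself. Your code passes it $str, the page content returned by curl_exec(), and get_page_title() does the same with file_get_contents(), which is why both warnings print the page's HTML.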
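To illustrate the difference, here is a small sketch that writes some made-up sample HTML to a temporary local file and hands get_meta_tags() the path, the same way you would hand it $url. The sample markup and the temporary file are assumptions for the demo only; with a real page you would simply call get_meta_tags($url).

```php
<?php
// Sketch: get_meta_tags() reads from a location (local path or URL),
// so pass it a path, never the HTML string itself.
// The sample HTML below is invented for illustration.
$html = '<html><head><title>Demo</title>'
      . '<meta name="description" content="A demo page">'
      . '<meta name="keywords" content="php, meta">'
      . '</head><body></body></html>';

// Write the sample markup to a temporary local file.
$path = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($path, $html);

// Pass the location, just as you would with get_meta_tags($url).
$tags = get_meta_tags($path);
unlink($path); // clean up the temporary file

print_r($tags);
```

The keys come back lower-cased (here description and keywords), which is how get_meta_tags() normalizes the name attribute.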
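You could request the URL directly and parse the title, but you will probably get better results by running regex matches on the string you already retrieved with cURL.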
You already have some good code for grabbing the title. Just modify it to grab all the meta tags as well.
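As a starting point, the title function can work on the string cURL returned instead of calling file_get_contents() on it. The sketch below is a hypothetical variant named get_page_title_from_html(); the sample string is made up, and the pattern adds the s modifier (so a title spanning several lines still matches) in place of the original U flag, using a lazy .*? instead.

```php
<?php
// Hypothetical variant of get_page_title() that takes the HTML already
// fetched with cURL, rather than treating it as a file name.
function get_page_title_from_html($html)
{
    // i: case-insensitive; s: '.' also matches newlines;
    // '.*?' stops at the first closing </title>.
    if (preg_match('#<title>(.*?)</title>#is', $html, $t)) {
        return trim($t[1]);
    }
    return false; // no <title> element found
}

// Usage with a made-up page string (in the question this would be $str
// from curl_exec()).
$str = "<html><head><title>\n  Lists of lists  \n</title></head></html>";
$title = get_page_title_from_html($str);
var_dump($title);
```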
On the php.net page describing get_meta_tags(), jstel at 126 dot com contributed this nice function call:
preg_match_all("/<meta[^>]+(http-equiv|name)=\"([^\"]*)\"[^>]" . "+content=\"([^\"]*)\"[^>]*>/i", $v, $split_content[], PREG_PATTERN_ORDER);
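It searches the string $v for meta tags and dumps the matches into $split_content. His example then runs a bunch of loops that don't seem necessary, but I'd suggest looking at his code to see whether you can adapt it.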
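One possible adaptation, sketched under some assumptions: the helper name meta_tags_from_html() and the sample markup are hypothetical, and it uses PREG_SET_ORDER instead of PREG_PATTERN_ORDER so each match can be turned directly into a name => content pair without extra loops. Like jstel's pattern, it only matches tags where name or http-equiv appears before content, with double-quoted values.

```php
<?php
// Hypothetical helper adapting jstel's pattern: scan a string of HTML
// (e.g. the result of curl_exec()) and return name => content pairs.
function meta_tags_from_html($html)
{
    // Group 2 = name/http-equiv value, group 3 = content value.
    $pattern = '/<meta[^>]+(http-equiv|name)="([^"]*)"[^>]+content="([^"]*)"[^>]*>/i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $tags = array();
    foreach ($matches as $m) {
        // Lower-case the key, similar to how get_meta_tags() normalizes names.
        $tags[strtolower($m[2])] = $m[3];
    }
    return $tags;
}

// Usage with made-up markup.
$str = '<head><meta name="Description" content="A demo page">'
     . '<meta http-equiv="refresh" content="30"></head>';
print_r(meta_tags_from_html($str));
```

Because it works on the string, this can replace the get_meta_tags($str) call in the question without a second HTTP request.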