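I am trying to extract the first occurrence of a link that begins like this
https://encrypted-tbn3.gstatic.com/images?...
from a page's source code. The link begins and ends with a double quote ("). This is what I have so far: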
$search_query = $array[0]['Name'];
$search_query = urlencode($search_query);
$context = stream_context_create(array('http' => array('header' => 'User-Agent: Mozilla compatible')));
$response = file_get_contents( "https://www.google.com/search?q=$search_query&tbm=isch", false, $context);
$html = str_get_html($response);
$url = explode('"',strstr($html, 'https://encrypted-tbn3.gstatic.com/images?'[0]))
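However, the output of $url is not the link I am trying to extract but something quite different. I have added a picture.
Can anyone explain the output to me and show me how to get the desired link? Thanks.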
Answer 0 (score: 1)
It looks like you are using PHP Simple HTML DOM Parser.
I usually use DOMDocument, which is one of PHP's built-in classes.
Here is a working example of what you need:
$search_query = $array[0]['Name'];
$search_query = urlencode($search_query);
$context = stream_context_create(array('http' => array('header' => 'User-Agent: Mozilla compatible')));
$response = file_get_contents( "https://www.google.com/search?q=$search_query&tbm=isch", false, $context);
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($response);
foreach ($dom->getElementsByTagName('img') as $item) {
    $img_src = $item->getAttribute('src');
    if (strpos($img_src, 'https://encrypted') !== false) {
        print $img_src."\n";
    }
}
Output:
https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcSumjp6e37O_86nc36mlktuWpbFuCI4nkkkocoBCYW3qCOicqdu_KEK-MY
https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcR_ttK8NlBgui_JndBj349UxZx0kHn0Z-Essswci-_5UQCmUOruY1PNl3M
https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcSydaTpSDw2mvU2JRBGEYUOstTUl4R1VhRevv1Sdinf0fxRvU26l3pTuqo
...
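Since the question asks for only the first occurrence, the loop can return as soon as it finds a match. Below is a minimal sketch of that idea, not a definitive implementation: the helper name first_image_src and the sample markup are made up for illustration, and it assumes PHP 7.1 or later (for the nullable return type) and non-empty input.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the src of
// the first <img> whose src starts with $prefix, or null if there is none.
function first_image_src(string $html, string $prefix): ?string
{
    libxml_use_internal_errors(true);   // collect markup warnings instead of printing them
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if (strpos($src, $prefix) === 0) {
            return $src;                // stop at the first hit
        }
    }
    return null;
}

// Made-up sample markup standing in for the real response body.
$sample = '<html><body>'
    . '<img src="https://example.com/logo.png">'
    . '<img src="https://encrypted-tbn2.gstatic.com/images?q=tbn:AAA">'
    . '<img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:BBB">'
    . '</body></html>';

$first = first_image_src($sample, 'https://encrypted');
echo $first === null ? "no match\n" : $first . "\n";
```

Passing the full prefix 'https://encrypted-tbn3.gstatic.com/images?' instead would skip the tbn2 image and return the tbn3 one.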
Answer 1 (score: 0)
$url_beginning = 'https://encrypted-tbn3.gstatic.com/images?';
if(preg_match('/\"(https\:\/\/encrypted\-tbn3\.gstatic\.com\/images\?.+?)\"/ui',$html, $matches))
$url = $matches[1];
else
$url = '';
Try using preg_match; it is better suited to this kind of extraction.
This example assumes that the URL is enclosed in double quotes in your HTML.
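As for the unexpected output of the strstr() attempt in the question: the [0] there is applied to the string literal, so the needle becomes the single character "h", and strstr() returns everything from the first "h" in the page. The sketch below reproduces this on a made-up one-line $html string (an assumption, not the real page), using a $needle variable in place of the literal, which behaves the same way.

```php
<?php
// Made-up stand-in for the page source.
$html = '<a href="/x"><img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:BBB"></a>';
$needle = 'https://encrypted-tbn3.gstatic.com/images?';

// As in the question: [0] is applied to the needle, so strstr() searches for "h".
$wrong = explode('"', strstr($html, $needle[0]));

// With [0] moved outside, strstr() searches for the whole prefix and
// [0] picks the first piece of the exploded result.
$tail  = strstr($html, $needle);
$right = $tail === false ? '' : explode('"', $tail)[0];

echo $wrong[0], "\n";   // text from the first "h" up to the next double quote
echo $right, "\n";      // the URL up to its closing double quote
```

Note that $wrong is an array, so the original $url was an array as well, whose first element started at the first "h" on the page.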
UPD: A slightly tweaked version that works with any URL beginning:
$url_beginning = 'https://encrypted-tbn3.gstatic.com/images?';
$url_beginning = preg_replace('/([^а-яА-Яa-zA-Z0-9_@%\s])/ui', '\\\\$1', $url_beginning);
if(preg_match('/\"('.$url_beginning.'.+?)\"/ui',$html, $matches))
$url = $matches[1];
else
$url = '';
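PHP also ships preg_quote(), which can take the place of the hand-written escaping above. A minimal sketch under a few assumptions: the $html string is made up, and the lazy .+? is swapped for [^"]+ so that the match cannot run past the closing quote.

```php
<?php
// Made-up stand-in for the page source.
$html = '<img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:BBB" alt="">';
$url_beginning = 'https://encrypted-tbn3.gstatic.com/images?';

// preg_quote() escapes regex metacharacters; passing '/' as the second
// argument also escapes the delimiter used in the pattern below.
$pattern = '/"(' . preg_quote($url_beginning, '/') . '[^"]+)"/i';

// $matches[1] holds the URL without the surrounding quotes.
$url = preg_match($pattern, $html, $matches) ? $matches[1] : '';
echo $url, "\n";
```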