Question

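I have the following problem with fetching images as an array. In this code I am trying to check whether any images exist for the search Test 1. If they do, I display them; if not, I try again with Test 2, and that's it. The current code does this, but it is very slow.

The check if (sizeof($matches[1]) > 3) { is there because the scraped site sometimes includes up to 3 advertisement images, so this is my safe way of skipping them.

My question is: how can I speed up the code below so that if (sizeof($matches[1]) > 3) { runs faster? I believe this is what makes the code so slow, because the array may contain up to 1000 images.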

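$get_search = 'Test 1';
$ch_foreach = 0; // set to 1 once a search returns enough images
$tmp = 0;        // counts how many matches have been looked at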

$html = file_get_contents('https://www.everypixel.com/search?q='.$get_search.'&is_id=1&st=free');
preg_match_all('|<img.*?src=[\'"](.*?)[\'"].*?>|i', $html, $matches);

if (sizeof($matches[1]) > 3) {
  $ch_foreach = 1;
}

if ($ch_foreach == 0) {

    $get_search = 'Test 2';

  $html = file_get_contents('https://www.everypixel.com/search?q='.$get_search.'&is_id=1&st=free');
  preg_match_all('|<img.*?src=[\'"](.*?)[\'"].*?>|i', $html, $matches);

  if (sizeof($matches[1]) > 3) {
     $ch_foreach = 1;
  }

}

foreach ($matches[1] as $match) if ($tmp++ < 20) {

  if (@getimagesize($match)) {

    // display image
    echo $match;

  }

}

Answer 1

$html = file_get_contents('https://www.everypixel.com/search?q='.$get_search.'&is_id=1&st=free');

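Unless the www.everypixel.com server is on the same LAN (in which case the compression overhead may be slower than transferring the page uncompressed), curl with CURLOPT_ENCODING should fetch this faster than file_get_contents. Even if it is on the same LAN, curl should still be faster, because file_get_contents keeps reading until the server closes the connection, whereas curl stops as soon as Content-Length bytes have been read, which is faster than waiting for the server to close the socket. So do this instead: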

$ch=curl_init('https://www.everypixel.com/search?q='.$get_search.'&is_id=1&st=free');
curl_setopt_array($ch,array(CURLOPT_ENCODING=>'',CURLOPT_RETURNTRANSFER=>1));
$html=curl_exec($ch);
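
One more thing worth checking: the search term in the question ('Test 1') contains a space, and a raw space is not valid in a URL, so it is probably safer to encode the query string before passing it to curl. A minimal sketch, assuming two hypothetical helpers named build_search_url and fetch_html (neither name comes from the original code), and assuming a 15-second timeout is acceptable:

```php
<?php
// Hypothetical helper: builds the search URL with an encoded query string.
function build_search_url(string $query): string
{
    $params = ['q' => $query, 'is_id' => 1, 'st' => 'free'];
    // http_build_query() URL-encodes each value; '&' is passed explicitly
    // so the result does not depend on the arg_separator.output ini setting.
    return 'https://www.everypixel.com/search?' . http_build_query($params, '', '&');
}

// Hypothetical helper: fetches a page with curl and returns null on failure.
function fetch_html(string $url, int $timeoutSeconds = 15): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_ENCODING       => '',   // accept any compression curl supports
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds, // assumed limit, tune as needed
    ]);
    $html = curl_exec($ch);
    return is_string($html) ? $html : null; // curl_exec() returns false on error
}
```

With these in place, the fetch would look like $html = fetch_html(build_search_url($get_search)); followed by a null check before parsing.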

About your regex:

preg_match_all('|<img.*?src=[\'"](.*?)[\'"].*?>|i', $html, $matches);

DOMDocument with getElementsByTagName("img") and getAttribute("src") should be faster than your regex, so do this instead:

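// loadHTML() must be called on an instance; the static call is an error in PHP 8
$domd=new DOMDocument();
@$domd->loadHTML($html); // @ silences warnings about malformed markup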
$urls=[];
foreach($domd->getElementsByTagName("img") as $img){
    $url=$img->getAttribute("src");
    if(!empty($url)){
        $urls[]=$url;
    }
}
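
The DOMDocument loop above can also be combined with the "try Test 1, then fall back to Test 2" logic from the question, so the fetch-and-parse code is not written twice. This is only a rough sketch: extract_image_urls and first_search_with_images are hypothetical names, and the fetcher is passed in as a callable so the example runs without a network connection.

```php
<?php
// Hypothetical helper: returns every non-empty <img src> value in an HTML string.
function extract_image_urls(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() does not accept an empty string
    }
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ silences warnings about malformed markup
    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src'); // "" when the attribute is missing
        if ($src !== '') {
            $urls[] = $src;
        }
    }
    return $urls;
}

// Hypothetical helper: tries each search term in order and returns the image
// URLs of the first page that has more than $minCount images.
// $fetch is any callable that maps a search term to an HTML string.
function first_search_with_images(array $terms, callable $fetch, int $minCount = 3): array
{
    foreach ($terms as $term) {
        $urls = extract_image_urls((string) $fetch($term));
        if (count($urls) > $minCount) {
            return $urls;
        }
    }
    return []; // no term produced enough images
}

// Usage with a stub fetcher standing in for the real HTTP request.
$pages = [
    'Test 1' => '<p>no images here</p>',
    'Test 2' => str_repeat('<img src="a.png">', 4),
];
$found = first_search_with_images(['Test 1', 'Test 2'], function (string $term) use ($pages) {
    return $pages[$term] ?? '';
});
echo count($found), "\n";
```

In real use the callable would wrap the curl request, and the threshold of 3 mirrors the advertisement check from the question.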

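And probably the slowest part of your entire code is the @getimagesize($match) call in a loop that may cover over 1000 URLs. Every call to getimagesize() with a URL makes PHP download the image, and it does so the same way file_get_contents does, which means it suffers from the same Content-Length issue that makes file_get_contents slow. In addition, all the images are downloaded sequentially; downloading them in parallel should be much faster. That can be done with the curl_multi API, but it is a complex task and I can't be bothered to write an example for you. I can point you to one, though: https://stackoverflow.com/a/54717579/1067003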
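
For a rough idea of what the curl_multi approach could look like, here is a minimal sketch. It is not the code from the linked answer: filter_valid_images_parallel is a hypothetical name, the 10-second timeout is an assumption, and it downloads each image fully and validates it with getimagesizefromstring() instead of calling getimagesize() on the URL.

```php
<?php
// Hypothetical helper: downloads all URLs in parallel and returns the ones
// whose response body is recognised as an image.
function filter_valid_images_parallel(array $urls, int $timeoutSeconds = 10): array
{
    $urls = array_values(array_unique($urls));
    if ($urls === []) {
        return [];
    }

    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_ENCODING       => '',   // accept compressed responses
            CURLOPT_RETURNTRANSFER => true, // keep the body in memory
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_TIMEOUT        => $timeoutSeconds, // assumed limit per transfer
        ]);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity instead of spinning
        }
    } while ($running > 0 && $status === CURLM_OK);

    $valid = [];
    foreach ($handles as $url => $ch) {
        $body = curl_multi_getcontent($ch);
        // getimagesizefromstring() returns false when the data is not an image.
        if (is_string($body) && $body !== '' && @getimagesizefromstring($body) !== false) {
            $valid[] = $url;
        }
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);
    return $valid;
}
```

To keep the question's limit of 20 images, the call could be filter_valid_images_parallel(array_slice($urls, 0, 20)). Relative src values would need to be turned into absolute URLs first, since curl cannot fetch them as they are.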
