如何使用PHP获取网站的favicon?

时间:2011-04-18 10:32:42

标签: php regex favicon

我想通过PHP请求网站的favicon。我被推荐使用谷歌的favicon服务,但它不起作用。我想自己做点什么,但不知道正则表达式的用法。

我在Google上发现了一个适用于大多数情况的课程,但它的错误率却令人无法接受。您可以在这里查看:http://www.controlstyle.com/articles/programming/text/php-favicon/

有人可以帮我解决使用正则表达式获取图标的问题吗?

13 个答案:

答案 0 :(得分:42)

使用谷歌提供的S2 service。它就像这个一样简单

http://www.google.com/s2/favicons?domain=www.yourdomain.com

刮痧会更容易,试图自己动手。

答案 1 :(得分:29)

又快又脏:

<?php 
$url = 'http://example.com/';
$doc = new DOMDocument();
$doc->strictErrorChecking = FALSE;
$doc->loadHTML(file_get_contents($url));
$xml = simplexml_import_dom($doc);
$arr = $xml->xpath('//link[@rel="shortcut icon"]');
echo $arr[0]['href'];

答案 2 :(得分:4)

看起来http://www.getfavicon.org/?url=domain.comFAQ)可靠地抓取网站的图标。我意识到这是第三方服务,但我认为它是Google favicon服务的一个有价值的替代品。

答案 3 :(得分:3)

根据Wikipedia,网站可以使用两种主要方法来浏览浏览器。第一个是史蒂夫提到的,在web服务器的根目录中将图标存储为favicon.ico。第二种是通过HTML链接标记引用favicon。

要涵盖所有这些情况,最好的方法是首先测试favicon.ico文件是否存在,如果不存在,请搜索<link rel="icon"<link rel="shortcut icon"部分在源(限于HTML头节点),直到找到favicon。您是否选择使用正则表达式或某些other string search option(更不用说内置的PHP版本)取决于您。最后,this question可能对您有所帮助。

答案 4 :(得分:3)

我一直在做类似的事情,我用一堆URL检查了这一切,似乎一切正常。 网址不一定是基本网址

function getFavicon($url){
    # make the URL simpler
    $elems = parse_url($url);
    $url = $elems['scheme'].'://'.$elems['host'];

    # load site
    $output = file_get_contents($url);

    # look for the shortcut icon inside the loaded page
    $regex_pattern = "/rel=\"shortcut icon\" (?:href=[\'\"]([^\'\"]+)[\'\"])?/";
    preg_match_all($regex_pattern, $output, $matches);

    if(isset($matches[1][0])){
        $favicon = $matches[1][0];

        # check if absolute url or relative path
        $favicon_elems = parse_url($favicon);

        # if relative
        if(!isset($favicon_elems['host'])){
            $favicon = $url . '/' . $favicon;
        }

        return $favicon;
    }

    return false;
}

答案 5 :(得分:2)

我已经实现了我自己的favicon抓取器,我在此处详细介绍了另一个StackOverflow帖子中的用法:Get website's favicon with JS

谢谢,让我知道它是否对您有所帮助。此外,非常感谢任何反馈。

答案 6 :(得分:2)

第一种方法,我们可以从favicon.ico搜索它,如果找到它然后它会显示它不是

<?php
        $userPath=$_POST["url"];
        $path="http://www.".$userPath."/favicon.ico";
        $header=  get_headers($path);
        if(preg_match("|200|", $header[0]))
        {
            echo '<img src="'.$path.'">';
        }
        else
        {
            echo "<span class=error>Not found</span>";
        }
    ?>

在其他方法中,您可以搜索图标并获取该图标文件

    <?php
$website=$_POST["url"];
$fevicon= getFavicon($website);
echo '<img src="http://www.'.$website.'/'.$fevicon.'">';
function getFavicon($site)
{
            $html=file_get_contents("http://www.".$site);
            $dom=new DOMDocument();
            @$dom->loadHTML($html);
            $links=$dom->getElementsByTagName('link');
            $fevicon='';

            for($i=0;$i < $links->length;$i++ )
            {
                $link=$links->item($i);
                if($link->getAttribute('rel')=='icon'||$link->getAttribute('rel')=="Shortcut Icon"||$link->getAttribute('rel')=="shortcut icon")
                {
                    $fevicon=$link->getAttribute('href');
                }
            }
            return  $fevicon;
}
?>

答案 7 :(得分:1)

请参阅此答案:https://stackoverflow.com/a/22771267。这是一个易于使用的PHP类来获取favicon URL并下载它,它还提供了一些关于favicon文件类型或者如何找到favicon的信息(默认URL,<link>标签...):

<?php
require 'FaviconDownloader.class.php';
$favicon = new FaviconDownloader('https://code.google.com/p/chromium/issues/detail?id=236848');

if($favicon->icoExists){
    echo "Favicon found : ".$favicon->icoUrl."\n";

    // Saving favicon to file
    $filename = 'favicon-'.time().'.'.$favicon->icoType;
    file_put_contents($filename, $favicon->icoData);
    echo "Saved to ".$filename."\n\n";
} else {
    echo "No favicon for ".$favicon->url."\n\n";
}

$favicon->debug();
/*
FaviconDownloader Object
(
    [url] => https://code.google.com/p/chromium/issues/detail?id=236848
    [pageUrl] => https://code.google.com/p/chromium/issues/detail?id=236848
    [siteUrl] => https://code.google.com/
    [icoUrl] => https://ssl.gstatic.com/codesite/ph/images/phosting.ico
    [icoType] => ico
    [findMethod] => head absolue_full
    [error] => 
    [icoExists] => 1
    [icoMd5] => a6cd47e00e3acbddd2e8a760dfe64cdc
)
*/
?>

答案 8 :(得分:1)

PHP Grab Favicon

这是一种很舒适的方法,具有许多参数,可从页面URL获取图标。

工作原理

  1. 检查收藏夹图标是否已存在于本地或是否希望保存,如果是,则返回路径和文件名
  2. 否则加载URL并尝试将图标图标的位置与正则表达式匹配
  3. 如果我们有一个匹配项,则将使favicon链接变为绝对
  4. 如果我们没有站点图标,则尝试在域根目录中获取一个
  5. 如果仍然没有图标,我们会随机尝试使用google,faviconkit和favicongrabber API
  6. 如果应该保存收藏夹图标,请尝试加载收藏夹网址
  7. 如果希望下次保存Favicon并返回路径和文件名

因此,它结合了两种方式:尝试从页面获取Favicon,如果不起作用,请使用“ API”服务,该服务会退还Favicon;-)

<?php
/*

PHP Grab Favicon
================

> This `PHP Favicon Grabber` use a given url, save a copy (if wished) and return the image path.

How it Works
------------

1. Check if the favicon already exists local or no save is wished, if so return path & filename
2. Else load URL and try to match the favicon location with regex
3. If we have a match the favicon link will be made absolute
4. If we have no favicon we try to get one in domain root
5. If there is still no favicon we randomly try google, faviconkit & favicongrabber API
6. If favicon should be saved try to load the favicon URL
7. If wished save the Favicon for the next time and return the path & filename

How to Use
----------

```PHP
$url = 'example.com';

$grap_favicon = array(
'URL' => $url,   // URL of the Page we like to get the Favicon from
'SAVE'=> true,   // Save Favicon copy local (true) or return only favicon url (false)
'DIR' => './',   // Local Dir the copy of the Favicon should be saved
'TRY' => true,   // Try to get the Favicon frome the page (true) or only use the APIs (false)
'DEV' => null,   // Give all Debug-Messages ('debug') or only make the work (null)
);

echo '<img src="'.grap_favicon($grap_favicon).'">';
```

Todo
----
Optional split the download dir into several sub-dirs (MD5 segment of filename e.g. /af/cd/example.com.png) if there are a lot of favicons.

Infos about Favicon
-------------------
https://github.com/audreyr/favicon-cheat-sheet

###### Copyright 2019 Igor Gaffling

*/ 

$testURLs = array(
  'http://aws.amazon.com',
  'http://www.apple.com',
  'http://www.dribbble.com',
  'http://www.github.com',
  'http://www.intercom.com',
  'http://www.indiehackers.com',
  'http://www.medium.com',
  'http://www.mailchimp.com',
  'http://www.netflix.com',
  'http://www.producthunt.com',
  'http://www.reddit.com',
  'http://www.slack.com',
  'http://www.soundcloud.com',
  'http://www.stackoverflow.com',
  'http://www.techcrunch.com',
  'http://www.trello.com',
  'http://www.vimeo.com',
  'https://www.whatsapp.com/',
  'https://www.gaffling.com/',
);

foreach ($testURLs as $url) {
  $grap_favicon = array(
    'URL' => $url,   // URL of the Page we like to get the Favicon from
    'SAVE'=> true,   // Save Favicon copy local (true) or return only favicon url (false)
    'DIR' => './',   // Local Dir the copy of the Favicon should be saved
    'TRY' => true,   // Try to get the Favicon frome the page (true) or only use the APIs (false)
    'DEV' => null,   // Give all Debug-Messages ('debug') or only make the work (null)
  );
  $favicons[] = grap_favicon($grap_favicon);
}
foreach ($favicons as $favicon) {
  echo '<img title="'.$favicon.'" style="width:32px;padding-right:32px;" src="'.$favicon.'">';
}
echo '<br><br><tt>Runtime: '.round((microtime(true)-$_SERVER["REQUEST_TIME_FLOAT"]),2).' Sec.';

function grap_favicon( $options=array() ) {

  // Ini Vars
  $url       = (isset($options['URL']))?$options['URL']:'gaffling.com';
  $save      = (isset($options['SAVE']))?$options['SAVE']:true;
  $directory = (isset($options['DIR']))?$options['DIR']:'./';
  $trySelf   = (isset($options['TRY']))?$options['TRY']:true;
  $DEBUG     = (isset($options['DEV']))?$options['DEV']:null;

  // URL to lower case
    $url = strtolower($url);

    // Get the Domain from the URL
  $domain = parse_url($url, PHP_URL_HOST);

  // Check Domain
  $domainParts = explode('.', $domain);
  if(count($domainParts) == 3 and $domainParts[0]!='www') {
    // With Subdomain (if not www)
    $domain = $domainParts[0].'.'.
              $domainParts[count($domainParts)-2].'.'.$domainParts[count($domainParts)-1];
  } else if (count($domainParts) >= 2) {
    // Without Subdomain
        $domain = $domainParts[count($domainParts)-2].'.'.$domainParts[count($domainParts)-1];
    } else {
      // Without http(s)
      $domain = $url;
    }

    // FOR DEBUG ONLY
    if($DEBUG=='debug')print('<b style="color:red;">Domain</b> #'.@$domain.'#<br>');

    // Make Path & Filename
    $filePath = preg_replace('#\/\/#', '/', $directory.'/'.$domain.'.png');

    // If Favicon not already exists local
  if ( !file_exists($filePath) or @filesize($filePath)==0 ) {

    // If $trySelf == TRUE ONLY USE APIs
    if ( isset($trySelf) and $trySelf == TRUE ) {  

      // Load Page
      $html = load($url, $DEBUG);

      // Find Favicon with RegEx
      $regExPattern = '/((<link[^>]+rel=.(icon|shortcut icon|alternate icon)[^>]+>))/i';
      if ( @preg_match($regExPattern, $html, $matchTag) ) {
        $regExPattern = '/href=(\'|\")(.*?)\1/i';
        if ( isset($matchTag[1]) and @preg_match($regExPattern, $matchTag[1], $matchUrl)) {
          if ( isset($matchUrl[2]) ) {

            // Build Favicon Link
            $favicon = rel2abs(trim($matchUrl[2]), 'http://'.$domain.'/');

            // FOR DEBUG ONLY
            if($DEBUG=='debug')print('<b style="color:red;">Match</b> #'.@$favicon.'#<br>');

          }
        }
      }

      // If there is no Match: Try if there is a Favicon in the Root of the Domain
        if ( empty($favicon) ) { 
        $favicon = 'http://'.$domain.'/favicon.ico';

        // Try to Load Favicon
        if ( !@getimagesize($favicon) ) {
          unset($favicon);
        }
        }

    } // END If $trySelf == TRUE ONLY USE APIs

    // If nothink works: Get the Favicon from API
    if ( !isset($favicon) or empty($favicon) ) {

      // Select API by Random
      $random = rand(1,3);

      // Faviconkit API
      if ($random == 1 or empty($favicon)) {
        $favicon = 'https://api.faviconkit.com/'.$domain.'/16';
      }

      // Favicongrabber API
      if ($random == 2 or empty($favicon)) {
        $echo = json_decode(load('http://favicongrabber.com/api/grab/'.$domain,FALSE),TRUE);

        // Get Favicon URL from Array out of json data (@ if something went wrong)
        $favicon = @$echo['icons']['0']['src'];

      }

      // Google API (check also md5() later)
      if ($random == 3) {
        $favicon = 'http://www.google.com/s2/favicons?domain='.$domain;
      } 

      // FOR DEBUG ONLY
      if($DEBUG=='debug')print('<b style="color:red;">'.$random.'. API</b> #'.@$favicon.'#<br>');

    } // END If nothink works: Get the Favicon from API

    // Write Favicon local
    $filePath = preg_replace('#\/\/#', '/', $directory.'/'.$domain.'.png');

    // If Favicon should be saved
    if ( isset($save) and $save == TRUE ) {

      //  Load Favicon
      $content = load($favicon, $DEBUG);

      // If Google API don't know and deliver a default Favicon (World)
      if ( isset($random) and $random == 3 and 
           md5($content) == '3ca64f83fdcf25135d87e08af65e68c9' ) {
        $domain = 'default'; // so we don't save a default icon for every domain again

        // FOR DEBUG ONLY
        if($DEBUG=='debug')print('<b style="color:red;">Google</b> #use default icon#<br>');

      }

      // Write 
      $fh = @fopen($filePath, 'wb');
      fwrite($fh, $content);
      fclose($fh);

      // FOR DEBUG ONLY
        if($DEBUG=='debug')print('<b style="color:red;">Write-File</b> #'.@$filePath.'#<br>');

    } else {

      // Don't save Favicon local, only return Favicon URL
      $filePath = $favicon;
    }

    } // END If Favicon not already exists local

    // FOR DEBUG ONLY
    if ($DEBUG=='debug') {

    // Load the Favicon from local file
      if ( !function_exists('file_get_contents') ) {
      $fh = @fopen($filePath, 'r');
      while (!feof($fh)) {
        $content .= fread($fh, 128); // Because filesize() will not work on URLS?
      }
      fclose($fh);
    } else {
      $content = file_get_contents($filePath);
    }
      print('<b style="color:red;">Image</b> <img style="width:32px;" 
             src="data:image/png;base64,'.base64_encode($content).'"><hr size="1">');
  }

  // Return Favicon Url
  return $filePath;

} // END MAIN Function

/* HELPER load use curl or file_get_contents (both with user_agent) and fopen/fread as fallback */
function load($url, $DEBUG) {
  if ( function_exists('curl_version') ) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_USERAGENT, 'FaviconBot/1.0 (+http://'.$_SERVER['SERVER_NAME'].'/');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    $content = curl_exec($ch);
    if ( $DEBUG=='debug' ) { // FOR DEBUG ONLY
      $http_code = curl_getinfo($ch);
      print('<b style="color:red;">cURL</b> #'.$http_code['http_code'].'#<br>');
    }
    curl_close($ch);
    unset($ch);
  } else {
    $context = array ( 'http' => array (
        'user_agent' => 'FaviconBot/1.0 (+http://'.$_SERVER['SERVER_NAME'].'/)'),
    );
    $context = stream_context_create($context);
      if ( !function_exists('file_get_contents') ) {
      $fh = fopen($url, 'r', FALSE, $context);
      $content = '';
      while (!feof($fh)) {
        $content .= fread($fh, 128); // Because filesize() will not work on URLS?
      }
      fclose($fh);
    } else {
      $content = file_get_contents($url, NULL, $context);
    }
  }
  return $content;
}

/* HELPER: Change URL from relative to absolute */
function rel2abs( $rel, $base ) {
    extract( parse_url( $base ) );
    if ( strpos( $rel,"//" ) === 0 ) return $scheme . ':' . $rel;
    if ( parse_url( $rel, PHP_URL_SCHEME ) != '' ) return $rel;
    if ( $rel[0] == '#' or $rel[0] == '?' ) return $base . $rel;
    $path = preg_replace( '#/[^/]*$#', '', $path);
    if ( $rel[0] ==  '/' ) $path = '';
    $abs = $host . $path . "/" . $rel;
    $abs = preg_replace( "/(\/\.?\/)/", "/", $abs);
    $abs = preg_replace( "/\/(?!\.\.)[^\/]+\/\.\.\//", "/", $abs);
    return $scheme . '://' . $abs;
}

来源:https://github.com/gaffling/PHP-Grab-Favicon

答案 9 :(得分:0)

如果您想从特定网站检索favicon,只需从其网站的根目录中获取favicon.ico即可。像这样:

$domain = "www.example.com";
$url = "http://".$domain."/favicon.ico";
$icondata = file_get_contents($url);

... you can now do what you like with the icon data

答案 10 :(得分:0)

找到这个帖子......我写了一个WordPress插件,其中包含了很多关于检索favicon的变体。由于GPL代码很多:http://plugins.svn.wordpress.org/wp-favicons/trunk/

它允许您运行服务器,您可以通过xml rpc请求请求图标,以便任何客户端都可以请求图标。它确实有一个插件结构,所以你可以尝试谷歌,getfavicon等...看看这些服务之一是否提供任何东西。如果没有,那么它会进入一个图标提取模式,考虑到所有的http状态(301/302/404),并最好在任何地方找到一个图标。在此之后,它使用图像库函数来检查文件内部是否真的是一个图像和什么样的图像(有时扩展名是错误的)并且它是可插入的,因此您可以在管道中添加图像转换或额外功能。

http获取文件围绕我在上面看到的内容做了一些逻辑:http://plugins.svn.wordpress.org/wp-favicons/trunk/includes/server/class-http.php

但它只是管道的一部分。

一旦你潜入它,

会变得非常复杂。

答案 11 :(得分:0)

$url = 'http://thamaraiselvam.strikingly.com/';
$doc = new DOMDocument();
$doc->strictErrorChecking = FALSE;
@$doc->loadHTML(file_get_contents($url));
$xml = simplexml_import_dom($doc);
$arr = $xml->xpath('//link[@rel="shortcut icon"]');
if (!empty($arr[0]['href'])) {
    echo "<img src=".$arr[0]['href'].">";
 }
else 
echo "<img src='".$url."/favicon.ico'>";

答案 12 :(得分:0)

我改变了一点Vivek second method并添加了this function,它看起来像这样:

<?php
        $website=$_GET['u'];
        $fevicon= getFavicon($website);
        echo '<img src="'.path_to_absolute($fevicon,$website).'"></img>';

            function getFavicon($site)
            {
            $html=file_get_contents($site);
            $dom=new DOMDocument();
            @$dom->loadHTML($html);
            $links=$dom->getElementsByTagName('link');
            $fevicon='';

            for($i=0;$i < $links->length;$i++ )
            {
                $link=$links->item($i);
                if($link->getAttribute('rel')=='icon'||$link->getAttribute('rel')=="Shortcut Icon"||$link->getAttribute('rel')=="shortcut icon")
                {
                    $fevicon=$link->getAttribute('href');
                }
            }
            return  $fevicon;
            }

    // transform to absolute path function... 
    function path_to_absolute($rel, $base)
    {
    /* return if already absolute URL */
    if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;
    /* queries and anchors */
    if ($rel[0]=='#' || $rel[0]=='?') return $base.$rel;
    /* parse base URL and convert to local variables:
       $scheme, $host, $path */
    extract(parse_url($base));
    /* remove non-directory element from path */
    $path = preg_replace('#/[^/]*$#', '', $path);
    /* destroy path if relative url points to root */
    if ($rel[0] == '/') $path = '';
    /* dirty absolute URL */
    $abs = "$host$path/$rel";
    /* replace '//' or '/./' or '/foo/../' with '/' */
    $re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
    for($n=1; $n>0; $abs=preg_replace($re, '/', $abs, -1, $n)) {}
    /* absolute URL is ready! */
    return $scheme.'://'.$abs;
    }

?>

当然你用https://www.domain.tld/favicon/this_script.php?u=http://www.example.com来称呼它 仍然无法捕捉所有选项,但现在绝对路径已经解决。希望它有所帮助。