Filtering and processing URLs from tweets

Time: 2014-01-06 07:44:37

Tags: php url twitter

I am processing tweets and collecting the URLs they contain.

  1. If the URL points to Twitter (i.e. it starts with t.co or twitter.com), skip it.
  2. If the URL in the tweet is a short URL, I expand it to the long URL.

Code:

    if (preg_match($reg_exUrl, $tweet, $url)) {
        preg_match_all($reg_exUrl, $tweet, $urls);
        foreach ($urls[0] as $url) {
            echo "Tiny url :  {$url}<br>";
            $full = MyURLDecode($url);
            echo "Full url : $full<br>";
            if (strpos($full, '//t.co') === true)
                continue;
            if (strpos($full, '//twitter.com') === true)
                continue;
            else if (strpos($full, '//bit.ly') !== true)
                $full = MyURLDecode($full);
            $url_count = get_twitter_url_count($full);
            echo "Url count: $url_count";
            //echo "Numbers of tweets containing this link : ", $code['count'];
            echo "<br>";
        }
    } else {
        echo "Mismatch<br>";
    }

    function MyURLDecode($url)
    {
        $ch = @curl_init($url);
        @curl_setopt($ch, CURLOPT_HEADER, TRUE);
        @curl_setopt($ch, CURLOPT_NOBODY, TRUE);
        @curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
        @curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
        $url_resp = @curl_exec($ch);
        preg_match('/Location:\s+(.*)\n/i', $url_resp, $i);
        if (!isset($i[1]))
        {
            return $url;
        }
        return $i[1];
    }

    function get_twitter_url_count($url) {
        $encoded_url = urlencode($url);
        $content = @file_get_contents('http://urls.api.twitter.com/1/urls/count.json?url=' . $encoded_url);
        return $content ? json_decode($content)->count : 0;
    }
    

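The problems are:

  1. Twitter URLs are not skipped.
  2. In some cases the expanded URL is itself a short URL and needs to be expanded again, but that fails.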

1 answer:

Answer 0 (score: 1)

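For #1, strpos returns the integer position at which the text was found, or false when it is not found. It never returns true, so a === true comparison can never succeed. You need to test against false instead, for example: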

    strpos($full, '//t.co') !== false
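
As a variation on the check above, here is a minimal sketch that compares the URL's host instead of searching the whole string. The name is_twitter_url is a hypothetical helper, not something from the original post, and the list of hosts is an assumption taken from the question's own checks. The motivation is that a substring test for //t.co would also match an unrelated host such as t.company.example.

```php
<?php
// Hypothetical helper (not from the original post): decide whether a URL
// points at Twitter by comparing its host rather than searching the string.
function is_twitter_url(string $url): bool
{
    // parse_url() returns null or false when no host can be extracted.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }

    // Normalise case and drop a leading "www." before comparing.
    $host = strtolower($host);
    if (strpos($host, 'www.') === 0) {
        $host = substr($host, 4);
    }

    // Assumed host list, mirroring the two checks in the question.
    return in_array($host, ['t.co', 'twitter.com'], true);
}
```

Inside the question's foreach loop, this could stand in for the two strpos tests, e.g. if (is_twitter_url($full)) continue;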

For #2, try calling MyURLDecode() in a while loop until the URL stops changing, for example:

    $previous = $full;
    while (($full = MyURLDecode($full)) != $previous) {
        $previous = $full;
    }
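
The loop above keeps running for as long as the URL keeps changing, so a redirect cycle (A to B and back to A) would never terminate. The following is a minimal sketch of a bounded version, under some assumptions: expand_url_fully, $resolveOnce and $maxHops are hypothetical names that do not appear in the original post, and the single-hop resolver is passed in as a callable so the loop can be exercised without network access. The trim() call is there because the Location regex in the question may capture a trailing carriage return.

```php
<?php
// Hypothetical wrapper (not from the original post): call a single-hop
// resolver repeatedly until the URL stops changing, a URL repeats,
// or the hop limit is reached.
function expand_url_fully(string $url, callable $resolveOnce, int $maxHops = 10): string
{
    $seen = [$url => true]; // URLs already visited, to detect redirect cycles

    for ($hop = 0; $hop < $maxHops; $hop++) {
        // Strip surrounding whitespace such as a stray "\r" from a header line.
        $next = trim((string) $resolveOnce($url));

        // Stop when nothing new comes back: resolved, failed, or a cycle.
        if ($next === '' || $next === $url || isset($seen[$next])) {
            break;
        }

        $seen[$next] = true;
        $url = $next;
    }

    return $url;
}

// Intended use with the question's function (performs real HTTP requests):
// $full = expand_url_fully($url, 'MyURLDecode');
```

The default limit of 10 hops is an arbitrary choice for illustration. As an alternative not used in the original post, cURL can report the final address itself: after a request made with CURLOPT_FOLLOWLOCATION, curl_getinfo($ch, CURLINFO_EFFECTIVE_URL) returns the last URL that was actually fetched.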