Question

I am processing tweets and collecting the URLs they contain.

If a URL points to Twitter (i.e. it starts with t.co or twitter.com), skip it.
If a URL in the tweet is a short URL, expand it to the full (long) URL.
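A minimal sketch of the skip rule, assuming the intent is to match on the URL's host rather than on a substring anywhere in the URL. This is a different technique from the strpos checks used in the code below. The function name is_twitter_url and the host list (t.co, twitter.com and its subdomains) are illustrative assumptions.

```php
<?php
// Hypothetical helper: decide whether a URL points at Twitter by
// comparing its host, not by searching for a substring.
// The accepted hosts are an assumption; extend the list as needed.
function is_twitter_url(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host component (or unparseable URL)
    }
    $host = strtolower($host);
    return $host === 't.co'
        || $host === 'twitter.com'
        // any subdomain such as mobile.twitter.com or www.twitter.com
        || substr($host, -strlen('.twitter.com')) === '.twitter.com';
}

var_dump(is_twitter_url('http://t.co/abc123'));           // bool(true)
var_dump(is_twitter_url('https://mobile.twitter.com/x')); // bool(true)
var_dump(is_twitter_url('http://bit.ly/xyz'));            // bool(false)
```

Comparing hosts avoids false positives such as a page whose path or query string merely contains "//t.co".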

Code:

    if (preg_match($reg_exUrl, $tweet, $url)) {
        preg_match_all($reg_exUrl, $tweet, $urls);
        foreach ($urls[0] as $url) {
            echo "Tiny url :  {$url}<br>";
            $full = MyURLDecode($url);
            echo "Full url : $full<br>";
            if (strpos($full, '//t.co') === true)
                continue;
            if (strpos($full, '//twitter.com') === true)
                continue;
            else if (strpos($full, '//bit.ly') !== true)
                $full = MyURLDecode($full);
            $url_count = get_twitter_url_count($full);
            echo "Url count: $url_count";
            //echo "Numbers of tweets containing this link : ", $code['count'];
            echo "<br>";
        }
    } else {
        echo "Mismatch<br>";
    }

    function MyURLDecode($url)
    {
        $ch = @curl_init($url);
        @curl_setopt($ch, CURLOPT_HEADER, TRUE);
        @curl_setopt($ch, CURLOPT_NOBODY, TRUE);
        @curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
        @curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
        $url_resp = @curl_exec($ch);
        preg_match('/Location:\s+(.*)\n/i', $url_resp, $i);
        if (!isset($i[1]))
        {
            return $url;
        }
        return $i[1];
    }

    function get_twitter_url_count($url) {
        $encoded_url = urlencode($url);
        $content = @file_get_contents('http://urls.api.twitter.com/1/urls/count.json?url=' . $encoded_url);
        return $content ? json_decode($content)->count : 0;
    }
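A possible contributing factor in MyURLDecode(): with both CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION enabled, curl returns the header block of every redirect hop, and preg_match() only captures the first "Location:" line, which may itself be another short URL. The pattern (.*)\n can also keep the trailing \r of the CRLF line ending in the captured URL. A minimal sketch of a helper that takes the last Location header and trims it; last_location_header is a hypothetical name, not part of the question's code.

```php
<?php
// Hypothetical helper: return the *last* Location header from a header
// dump that may contain one header block per redirect hop, or null if
// there is none. trim() strips the "\r" left over from CRLF endings.
function last_location_header(string $headers): ?string
{
    // 'm': ^ and $ match at each line; 'i': case-insensitive name.
    if (!preg_match_all('/^Location:[ \t]*(.+)$/mi', $headers, $m)) {
        return null; // no match (0) or regex error (false)
    }
    return trim(end($m[1]));
}

$headers = "HTTP/1.1 301 Moved Permanently\r\n"
    . "Location: http://bit.ly/abc\r\n\r\n"
    . "HTTP/1.1 301 Moved Permanently\r\n"
    . "Location: http://example.com/final\r\n\r\n"
    . "HTTP/1.1 200 OK\r\n\r\n";
var_dump(last_location_header($headers)); // string(23) "http://example.com/final"
```

Alternatively, since MyURLDecode() already follows redirects, calling curl_getinfo($ch, CURLINFO_EFFECTIVE_URL) after curl_exec() should report the final URL directly, without parsing headers at all.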

The problems are:

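1. Twitter URLs are not skipped.
2. In some cases the expanded URL is itself another short URL and needs to be expanded again, but that fails.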

Answer 1

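For #1: strpos returns the starting position of the found text (an integer) or false if it is not found. It never returns true, so a comparison with === true never succeeds. Test against false instead: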

strpos($full, '//t.co') !== false
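A small sketch of the corrected check applied to both needles from the question; should_skip is a hypothetical helper name introduced only for illustration.

```php
<?php
// strpos() returns the integer offset of the first match (0 is a valid
// offset) or false when the needle is absent. It never returns true,
// so the original "=== true" checks never trigger "continue".
function should_skip(string $full): bool
{
    return strpos($full, '//t.co') !== false
        || strpos($full, '//twitter.com') !== false;
}

var_dump(strpos('http://t.co/abc123', '//t.co')); // int(5)
var_dump(should_skip('http://t.co/abc123'));      // bool(true)
var_dump(should_skip('http://example.com/page')); // bool(false)

// Inside the question's foreach loop this would read:
// if (should_skip($full)) { continue; }
```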

For #2, try calling MyURLDecode() in a while loop until the URL stops changing, for example:

$previous = $full;
while (($full = MyURLDecode($full)) != $previous) {
    $previous = $full;
}
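The loop above stops only when MyURLDecode() returns its input unchanged, so if two URLs kept redirecting to each other it could in principle never terminate. A minimal sketch of the same fixed-point idea with a hop limit, assuming a resolver that behaves like MyURLDecode() (takes a URL string, returns the expanded one or the input). The name expand_fully and the default of 10 hops are illustrative assumptions.

```php
<?php
// Hypothetical helper: expand a URL repeatedly until it stops changing,
// giving up after $maxHops expansions to guard against redirect cycles.
// $resolve stands in for MyURLDecode().
function expand_fully(string $url, callable $resolve, int $maxHops = 10): string
{
    for ($hop = 0; $hop < $maxHops; $hop++) {
        $next = $resolve($url);
        if ($next === $url) {
            break; // fixed point: nothing more to expand
        }
        $url = $next;
    }
    return $url;
}

// Usage with the question's resolver:
// $full = expand_fully($url, 'MyURLDecode');
```

Passing the resolver as a callable also makes the loop testable with a fake lookup table instead of live HTTP requests.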
