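Twitter regex breaking, I think it's a syntax problem

Time: 2011-12-06 15:51:15

Tags: php regex json twitter

Something in the syntax of my regular expressions is breaking the `rel=` attribute of the generated <a> links.

Here is the code: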

<?php   
function parseTweet($text) {
   $pattern_url = '~(?>[a-z+]{2,}://|www\.)(?:[a-z0-9]+(?:\.[a-z0-9]+)?@)?(?:(?:[a-z](?:[a-z0-9]|(?<!-)-)*[a-z0-9])(?:\.[a-z](?:[a-z0-9]|(?<!-)-)*[a-z0-9])+|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))(?:/[^\\/:?*"|\n]*[a-z0-9])*/?(?:\?[a-z0-9_.%]+(?:=[a-z0-9_.%:/+-]*)?(?:&[a-z0-9_.%]+(?:=[a-z0-9_.%:/+-]*)?)*)?(?:#[a-z0-9_%.]+)?~i';
    '@([A-Za-z0-9_]+)';

   $tweet = preg_replace('/(^|\s)#(\w+)/', '\1#<a href="http://search.twitter.com/search?q=%23\2? rel="nofollow">\2</a>', $text);
   $tweet = preg_replace('/(^|\s)@(\w+)/', '\1@<a href="http://www.twitter.com/\2? rel="nofollow">\2</a>', $tweet);
   $tweet = preg_replace('#(^|[\n ])(([\w]+?://[\w\#$%&~.\-;:=,?@\[\]+]*)(/[\w\#$%&~/.\-;:=,?@\[\]+]*)?)#is', '\\1
                      <a href=\"\\2\" title=\"\\2\" rel=\"nofollow\">[link]</a>', $tweet);
   return $tweet;
}

$username='stephenfry'; // set user name
$format='json'; // set format
$tweet=json_decode(file_get_contents("http://api.twitter.com/1/statuses/user_timeline/{$username}.{$format}")); // get tweets and decode them into a variable

$theTweet = parseTweet($tweet[0]->text);

echo $theTweet; 
?>   

Parsed HTML for a link:

Great deal: Jot by Adonit, a precise capacitive touch stylus, today 15% off with coupon code: 'Jot' -
<a rel="\"nofollow\"" title="\"http://t.co/QvFi6CKK\"" href="\"http://t.co/QvFi6CKK\"">[link]</a>

Parsed HTML for a hashtag:

I'm so sorry - that last #
<a nofollow"="" href="http://search.twitter.com/search?q=%23GameOfShadowsUK? rel=">GameOfShadowsUK</a>
tweet should hav 3been sent at 2:21 - my f****d up arsing w**k-mess of a life disallowed it :-( 

I sorted out the dodgy code and switched to a better approach. See the answer.
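Judging from the two outputs, a likely cause is a pair of quoting slips in parseTweet. In the hashtag and mention replacements the href value ends with `?` where a closing `"` was presumably intended, so the browser reads on to the next quote and swallows `rel=`. In the URL replacement the quotes are written as `\"` inside a single-quoted PHP string, where the backslash is kept literally and ends up in the attribute values. A minimal sketch of the corrected replacements, assuming that reading is right; `parseTweetFixed` is an illustrative name, not something from the original post:

```php
<?php
// Hypothetical corrected variant of parseTweet(); the function name is illustrative.
function parseTweetFixed($text) {
    // Hashtags: the href is now closed with a double quote instead of "?".
    $tweet = preg_replace(
        '/(^|\s)#(\w+)/',
        '\1#<a href="http://search.twitter.com/search?q=%23\2" rel="nofollow">\2</a>',
        $text
    );
    // Mentions: same fix as for hashtags.
    $tweet = preg_replace(
        '/(^|\s)@(\w+)/',
        '\1@<a href="http://www.twitter.com/\2" rel="nofollow">\2</a>',
        $tweet
    );
    // URLs: plain double quotes, because \" is not an escape in a single-quoted string.
    $tweet = preg_replace(
        '#(^|[\n ])(([\w]+?://[\w\#$%&~.\-;:=,?@\[\]+]*)(/[\w\#$%&~/.\-;:=,?@\[\]+]*)?)#is',
        '\1<a href="\2" title="\2" rel="nofollow">[link]</a>',
        $tweet
    );
    return $tweet;
}

echo parseTweetFixed("that last #GameOfShadowsUK tweet - http://t.co/QvFi6CKK"), "\n";
```

The unused `$pattern_url` and the stray `'@([A-Za-z0-9_]+)';` statement from the original function are left out here because neither affects the output.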

1 answer:

Answer 0 (score: 0)

    <?php

    // Fetch the last $x statuses of $userid and print them as an HTML list.
    function getLastXTwitterStatus($userid, $x) {
        $url = "http://twitter.com/statuses/user_timeline/$userid.xml?count=$x";

        $xml = simplexml_load_file($url) or die('could not connect');
        echo '<ul>';
        foreach ($xml->status as $status) {
            $text = twitterify($status->text);
            echo '<li>' . utf8_decode($text) . '</li>';
        }
        echo '</ul>';
    }

    // Turn URLs, @mentions and #hashtags in $ret into HTML links.
    function twitterify($ret) {
        // scheme://... URLs
        $ret = preg_replace("#(^|[\n ])([\w]+?://[\w]+[^ \"\n\r\t< ]*)#", "\\1<a href=\"\\2\" >\\2</a>", $ret);
        // Bare www. / ftp. hosts get an http:// prefix in the href
        $ret = preg_replace("#(^|[\n ])((www|ftp)\.[^ \"\t\n\r< ]*)#", "\\1<a href=\"http://\\2\" >\\2</a>", $ret);
        // @mentions link to the user's profile
        $ret = preg_replace("/@(\w+)/", "<a href=\"http://www.twitter.com/\\1\" >@\\1</a>", $ret);
        // #hashtags link to a search; %23 is the URL-encoded "#"
        $ret = preg_replace("/#(\w+)/", "<a href=\"http://search.twitter.com/search?q=%23\\1\" >#\\1</a>", $ret);
        return $ret;
    }

    // my user id kenrick1991
    getLastXTwitterStatus('simonpegg', 1);

    ?>
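
The replacements in twitterify run one after another, so a later pattern can match inside markup produced by an earlier one. For a URL with a fragment such as http://t.co/abc#frag, for example, the hashtag pattern would also match `#frag` inside the href that was just generated. One way around this is a single `preg_replace_callback` pass, sketched here under assumptions: `linkifyTweet` and its pattern are hypothetical and not part of the answer, and the pattern is deliberately simple (it only handles http/https URLs and does not trim trailing punctuation).

```php
<?php
// Hypothetical helper, not part of the answer: links URLs, @mentions and
// #hashtags in one pass, so generated markup is never scanned again.
function linkifyTweet($text) {
    $pattern = '~(?P<url>https?://[^\s<>"]+)'  // http(s) URLs
             . '|(?<!\w)@(?P<user>\w+)'         // @mentions not glued to a word
             . '|(?<![\w&])#(?P<tag>\w+)~i';    // #hashtags, skipping entities like &#39;

    return preg_replace_callback($pattern, function ($m) {
        // Exactly one alternative matched; groups before it are empty strings.
        if ($m['url'] !== '') {
            return '<a href="' . $m['url'] . '" rel="nofollow">' . $m['url'] . '</a>';
        }
        if ($m['user'] !== '') {
            return '<a href="http://www.twitter.com/' . $m['user'] . '" rel="nofollow">@' . $m['user'] . '</a>';
        }
        return '<a href="http://search.twitter.com/search?q=%23' . $m['tag'] . '" rel="nofollow">#' . $m['tag'] . '</a>';
    }, $text);
}

echo linkifyTweet('Hi @bob see http://t.co/abc#frag #php'), "\n";
```

Because every match is consumed once and replaced in place, the `#frag` part stays inside the URL link and only the standalone `#php` becomes a hashtag link.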