PHP: How do I use Twitter API data to convert the URLs, mentions, and hashtags in a tweet into links?

Posted: 2012-07-18 01:48:07

Tags: php api twitter

I honestly don't know how Twitter expects users of its API to convert the plain-text tweets it sends into properly linked HTML.

Here's the deal: when you request detailed data for a tweet, Twitter's JSON API sends back this set of information:

{
    "created_at":"Wed Jul 18 01:03:31 +0000 2012",
    "id":225395341250412544,
    "id_str":"225395341250412544",
    "text":"This is a test tweet. #boring @nbc http://t.co/LUfDreY6 #skronk @crux http://t.co/VpuMlaDs @twitter",
    "source":"web",
    "truncated":false,
    "in_reply_to_status_id":null,
    "in_reply_to_status_id_str":null,
    "in_reply_to_user_id":null,
    "in_reply_to_user_id_str":null,
    "in_reply_to_screen_name":null,
    "user": <REDACTED>,
    "geo":null,
    "coordinates":null,
    "place":null,
    "contributors":null,
    "retweet_count":0,
    "entities":{
        "hashtags":[
            {
                "text":"boring",
                "indices":[22,29]
            },
            {
                "text":"skronk",
                "indices":[56,63]
            }
        ],
        "urls":[
            {
                "url":"http://t.co/LUfDreY6",
                "expanded_url":"http://www.twitter.com",
                "display_url":"twitter.com",
                "indices":[35,55]
            },
            {
                "url":"http://t.co/VpuMlaDs",
                "expanded_url":"http://www.example.com",
                "display_url":"example.com",
                "indices":[70,90]
            }
        ],
        "user_mentions":[
            {
                "screen_name":"nbc",
                "name":"NBC",
                "id":26585095,
                "id_str":"26585095",
                "indices":[30,34]
            },
            {
                "screen_name":"crux",
                "name":"Z. D. Smith",
                "id":407213,
                "id_str":"407213",
                "indices":[64,69]
            },
            {
                "screen_name":"twitter",
                "name":"Twitter",
                "id":783214,
                "id_str":"783214",
                "indices":[91,99]
            }
        ]
    },
    "favorited":false,
    "retweeted":false,
    "possibly_sensitive":false
}
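A quick way to see how the indices line up with text is to decode a trimmed copy of the sample above and slice each entity back out. This is a sketch that assumes each indices pair is a half-open [start, end) range counted in characters (code points) rather than bytes; entity_text() is a hypothetical helper, not part of any Twitter or PHP API.

```php
<?php
// Trimmed copy of the sample tweet above: "text" plus one hashtag and one URL entity.
$json = '{
    "text": "This is a test tweet. #boring @nbc http://t.co/LUfDreY6 #skronk @crux http://t.co/VpuMlaDs @twitter",
    "entities": {
        "hashtags": [{"text": "boring", "indices": [22, 29]}],
        "urls": [{"url": "http://t.co/LUfDreY6", "indices": [35, 55]}]
    }
}';
$tweet = json_decode($json);

// Hypothetical helper: each "indices" pair is [start, end), so end is one past
// the entity's last character and the entity spans end - start characters.
function entity_text(string $text, array $indices): string
{
    [$start, $end] = $indices;
    return mb_substr($text, $start, $end - $start, 'UTF-8');
}

$hashtag = entity_text($tweet->text, $tweet->entities->hashtags[0]->indices);
$url     = entity_text($tweet->text, $tweet->entities->urls[0]->indices);
```

Here $hashtag comes back as "#boring" and $url as "http://t.co/LUfDreY6", matching the text and url fields of those entities.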

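The interesting parts of this problem are the text element and the entries in the hashtags, user_mentions, and urls arrays. Through the indices arrays, Twitter tells us where in the text element the hashtags, mentions, and URLs appear... so here is the crux of the question: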

How do you use those indices arrays?

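You can't simply loop over the entities and apply something like substr_replace() directly, because replacing the first entity in text invalidates the index values of every entity after it. You also can't use substr_replace()'s array functionality, because it only works when you pass an array of strings as the first argument rather than a single string (I tested this; the results were... strange).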
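A minimal illustration of the index-invalidation problem, using a made-up two-entity string and a hypothetical helper apply_in_order() that calls substr_replace() once per edit. The text is pure ASCII, so byte offsets and character offsets coincide here.

```php
<?php
// Hypothetical helper: apply [start, end, replacement] edits with substr_replace()
// in exactly the order given. Offsets are bytes, which is fine for ASCII text.
function apply_in_order(string $text, array $edits): string
{
    foreach ($edits as [$start, $end, $replacement]) {
        $text = substr_replace($text, $replacement, $start, $end - $start);
    }
    return $text;
}

$text  = "Hi #a @b";
$edits = [
    [3, 5, '<a href="#a">#a</a>'],  // indices of "#a"
    [6, 8, '<a href="@b">@b</a>'],  // indices of "@b"
];

// Forward order: the first replacement lengthens the string, so [6, 8] now
// points into the middle of the first link's markup instead of at "@b".
$forward = apply_in_order($text, $edits);

// Reverse order: replacing later ranges first never moves the earlier ones.
$reverse = apply_in_order($text, array_reverse($edits));
```

$forward ends up as `Hi <a <a href="@b">@b</a>ef="#a">#a</a> @b`, while $reverse is the intended `Hi <a href="#a">#a</a> <a href="@b">@b</a>`. Applying the edits from last to first is the idea the first answer below builds on.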

Is there some function that can replace multiple index-delimited substrings with different replacement strings all at once?
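As far as I know PHP has no single built-in for this, but the same effect comes from walking the text once, left to right, and copying the gaps between entities into a new string. splice_ranges() below is a hypothetical helper sketched under two assumptions: the ranges do not overlap, and the offsets count code points (hence mb_substr() rather than substr()).

```php
<?php
// Hypothetical helper (not a PHP built-in): replace several non-overlapping
// [start, end) ranges in a single pass. Offsets are counted in code points.
function splice_ranges(string $text, array $edits): string
{
    // Sort by start offset so the text can be walked left to right.
    usort($edits, fn($a, $b) => $a['start'] <=> $b['start']);

    $out    = '';
    $cursor = 0; // first code point of $text not yet copied to $out
    foreach ($edits as $e) {
        // Copy the untouched text between the previous range and this one.
        $out .= mb_substr($text, $cursor, $e['start'] - $cursor, 'UTF-8');
        $out .= $e['replacement'];
        $cursor = $e['end'];
    }
    // Copy whatever follows the last range.
    return $out . mb_substr($text, $cursor, null, 'UTF-8');
}

$html = splice_ranges("Hi #a @b", [
    ['start' => 6, 'end' => 8, 'replacement' => '[@b]'],
    ['start' => 3, 'end' => 5, 'replacement' => '[#a]'],
]);
```

Because every offset is read against the original, unmodified $text, no replacement can shift another one, and the input order of the edits does not matter.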

6 answers:

Answer 0 (score: 16)

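To do simple replacements directly with the indices Twitter provides, all you need to do is collect the replacements you want to make and then sort them in reverse order. You could probably find a cleverer way to build $entities; I wanted each kind of entity to be optional anyway, so I kept it as simple as I could (KISS).

Either way, my point is just to show that you don't need to explode the string, count characters, and so on. However you do it, all you need is to start at the end and work toward the beginning of the string; that way, Twitter's indices stay valid.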

<?php 

function json_tweet_text_to_HTML($tweet, $links=true, $users=true, $hashtags=true)
{
    $return = $tweet->text;

    $entities = array();

    if($links && is_array($tweet->entities->urls))
    {
        foreach($tweet->entities->urls as $e)
        {
            $temp["start"] = $e->indices[0];
            $temp["end"] = $e->indices[1];
            $temp["replacement"] = "<a href='".$e->expanded_url."' target='_blank'>".$e->display_url."</a>";
            $entities[] = $temp;
        }
    }
    if($users && is_array($tweet->entities->user_mentions))
    {
        foreach($tweet->entities->user_mentions as $e)
        {
            $temp["start"] = $e->indices[0];
            $temp["end"] = $e->indices[1];
            $temp["replacement"] = "<a href='https://twitter.com/".$e->screen_name."' target='_blank'>@".$e->screen_name."</a>";
            $entities[] = $temp;
        }
    }
    if($hashtags && is_array($tweet->entities->hashtags))
    {
        foreach($tweet->entities->hashtags as $e)
        {
            $temp["start"] = $e->indices[0];
            $temp["end"] = $e->indices[1];
            $temp["replacement"] = "<a href='https://twitter.com/hashtag/".$e->text."?src=hash' target='_blank'>#".$e->text."</a>";
            $entities[] = $temp;
        }
    }

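    // Sort by start offset, descending: replacing from the end of the string
    // backwards means earlier offsets are never shifted by later replacements.
    usort($entities, function($a,$b){return($b["start"]-$a["start"]);});

    // Note: substr_replace() counts bytes, so this assumes single-byte (ASCII)
    // tweet text; see the UTF-8 variants in later answers.
    foreach($entities as $item)
    {
        $return = substr_replace($return, $item["replacement"], $item["start"], $item["end"] - $item["start"]);
    }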

    return($return);
}


?>

Answer 1 (score: 13)

OK, I needed to do exactly this, and I solved it. Here's the function I wrote: https://gist.github.com/3337428

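// Expects $tweet as an array at the top level (e.g. (array) json_decode($json)),
// while the individual entity items are read as objects ($item->...).
function parse_message( &$tweet ) {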
    if ( !empty($tweet['entities']) ) {
        $replace_index = array();
        $append = array();
        $text = $tweet['text'];
        foreach ($tweet['entities'] as $area => $items) {
            $prefix = false;
            $display = false;
            switch ( $area ) {
                case 'hashtags':
                    $find   = 'text';
                    $prefix = '#';
                    $url    = 'https://twitter.com/search/?src=hash&q=%23';
                    break;
                case 'user_mentions':
                    $find   = 'screen_name';
                    $prefix = '@';
                    $url    = 'https://twitter.com/';
                    break;
                case 'media':
                    $display = 'media_url_https';
                    $href    = 'media_url_https';
                    $size    = 'small';
                    break;
                case 'urls':
                    $find    = 'url';
                    $display = 'display_url';
                    $url     = "expanded_url";
                    break;
                default: break;
            }
            foreach ($items as $item) {
                if ( $area == 'media' ) {
                    // We can display images at the end of the tweet, but sizing needs to be added all the way at the top.
                    // $append[$item->$display] = "<img src=\"{$item->$href}:$size\" />";
                }else{
                    $msg     = $display ? $prefix.$item->$display : $prefix.$item->$find;
                    $replace = $prefix.$item->$find;
                    $href    = isset($item->$url) ? $item->$url : $url;
                    if (!(strpos($href, 'http') === 0)) $href = "http://".$href;
                    if ( $prefix ) $href .= $item->$find;
                    $with = "<a href=\"$href\">$msg</a>";
                    $replace_index[$replace] = $with;
                }
            }
        }
        foreach ($replace_index as $replace => $with) $tweet['text'] = str_replace($replace,$with,$tweet['text']);
        foreach ($append as $add) $tweet['text'] .= $add;
    }
}

Answer 2 (score: 7)

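This is an edge case, but using str_replace() as in Styledev's answer can cause problems when one entity is contained within another. For example, in "I'm a genius! #me #mensa", if the shorter entity is replaced first, the "#me" at the start of "#mensa" is also turned into a link to #me, leaving a bare "nsa" after it.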
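The edge case can be reproduced in a few lines. This sketch uses made-up link targets and passes the replacements to str_replace() shortest-first, the order that triggers the problem:

```php
<?php
$text = "I'm a genius! #me #mensa";

// Hypothetical link targets; "#me" is listed first, so it is replaced first.
$search  = ['#me', '#mensa'];
$replace = ['<a href="/tag/me">#me</a>', '<a href="/tag/mensa">#mensa</a>'];

// str_replace() applies each search/replace pair in turn to the whole string,
// so "#me" also matches the first three characters of "#mensa".
$html = str_replace($search, $replace, $text);
```

After every "#me" has been replaced, "#mensa" no longer occurs in the string, so the second pair never fires and the output ends in `<a href="/tag/me">#me</a>nsa`. Working from the indices, rather than searching for the entity text, avoids this class of collision.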

This solution avoids the problem:

<?php
/**
 * Hyperlinks hashtags, twitter names, and urls within the text of a tweet
 * 
 * @param object $apiResponseTweetObject A json_decoded() one of these: https://dev.twitter.com/docs/platform-objects/tweets
 * @return string The tweet's text with hyperlinks added
 */
function linkEntitiesWithinText($apiResponseTweetObject) {

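    // Convert tweet text to an array of one-character strings.
    // str_split() would split on bytes; the "u" modifier splits on UTF-8 characters,
    // which is what Twitter's indices count.
    // $characters = str_split($apiResponseTweetObject->text);
    $characters = preg_split('//u', $apiResponseTweetObject->text, -1, PREG_SPLIT_NO_EMPTY);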

    // Insert starting and closing link tags at indices...

    // ... for @user_mentions
    foreach ($apiResponseTweetObject->entities->user_mentions as $entity) {
        $link = "https://twitter.com/" . $entity->screen_name;          
        $characters[$entity->indices[0]] = "<a href=\"$link\">" . $characters[$entity->indices[0]];
        $characters[$entity->indices[1] - 1] .= "</a>";         
    }               

    // ... for #hashtags
    foreach ($apiResponseTweetObject->entities->hashtags as $entity) {
        $link = "https://twitter.com/search?q=%23" . $entity->text;         
        $characters[$entity->indices[0]] = "<a href=\"$link\">" . $characters[$entity->indices[0]];
        $characters[$entity->indices[1] - 1] .= "</a>";         
    }

    // ... for http://urls
    foreach ($apiResponseTweetObject->entities->urls as $entity) {
        $link = $entity->expanded_url;          
        $characters[$entity->indices[0]] = "<a href=\"$link\">" . $characters[$entity->indices[0]];
        $characters[$entity->indices[1] - 1] .= "</a>";         
    }

    // ... for media (the "media" key may be missing, as in the sample tweet in the question)
    foreach ($apiResponseTweetObject->entities->media ?? array() as $entity) {
        $link = $entity->expanded_url;          
        $characters[$entity->indices[0]] = "<a href=\"$link\">" . $characters[$entity->indices[0]];
        $characters[$entity->indices[1] - 1] .= "</a>";         
    }

    // Convert array back to string
    return implode('', $characters);

}
?>  
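The difference between the commented-out str_split() line and the preg_split('//u', ...) line matters as soon as a tweet contains a multibyte character. A small check with an accented letter (a sketch; the sample string is made up):

```php
<?php
// "é" is one character (code point) but two bytes in UTF-8.
$text = "é #a";

// Byte-based split: "é" becomes two elements, so every later element sits
// one position after its character index.
$bytes = str_split($text);

// Character-based split, as used by linkEntitiesWithinText().
$chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

// The hashtag "#a" starts at character index 2.
$hashFromChars = $chars[2];
$hashFromBytes = $bytes[2];
```

With the byte array, index 2 lands on the space instead of "#", so the opening link tag would be inserted one character too early. On PHP 7.4 and later, mb_str_split() returns the same character array as this preg_split() call.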

Answer 3 (score: 6)

Jeff's solution worked well with English text, but broke when the tweet contained non-ASCII characters. This solution avoids that problem:

mb_internal_encoding("UTF-8");

// Return hyperlinked tweet text from a json_decode()d status (decoded as an associative array):
function MakeStatusLinks($status)
{
    // Split the plain tweet into UTF-8 characters so the entity indices line up.
    $ChAr = array();
    $TextLength = mb_strlen($status['text']); // Number of UTF-8 characters in plain tweet.
    for ($i = 0; $i < $TextLength; $i++) {
        $ch = mb_substr($status['text'], $i, 1);
        $ChAr[] = ($ch !== "\n") ? $ch : "\n<br/>"; // Keep new lines in HTML tweet.
    }

    // Mentions and hashtags: wrap the existing characters in a link.
    if (isset($status['entities']['user_mentions'])) {
        foreach ($status['entities']['user_mentions'] as $entity) {
            $ChAr[$entity['indices'][0]] = "<a href='https://twitter.com/" . $entity['screen_name'] . "'>" . $ChAr[$entity['indices'][0]];
            $ChAr[$entity['indices'][1] - 1] .= "</a>";
        }
    }
    if (isset($status['entities']['hashtags'])) {
        foreach ($status['entities']['hashtags'] as $entity) {
            $ChAr[$entity['indices'][0]] = "<a href='https://twitter.com/search?q=%23" . $entity['text'] . "'>" . $ChAr[$entity['indices'][0]];
            $ChAr[$entity['indices'][1] - 1] .= "</a>";
        }
    }

    // URLs and media: put a link showing display_url in the first slot of the
    // t.co span and blank out the rest of that span.
    if (isset($status['entities']['urls'])) {
        foreach ($status['entities']['urls'] as $entity) {
            $ChAr[$entity['indices'][0]] = "<a href='" . $entity['expanded_url'] . "'>" . $entity['display_url'] . "</a>";
            for ($i = $entity['indices'][0] + 1; $i < $entity['indices'][1]; $i++) {
                $ChAr[$i] = '';
            }
        }
    }
    if (isset($status['entities']['media'])) {
        foreach ($status['entities']['media'] as $entity) {
            $ChAr[$entity['indices'][0]] = "<a href='" . $entity['expanded_url'] . "'>" . $entity['display_url'] . "</a>";
            for ($i = $entity['indices'][0] + 1; $i < $entity['indices'][1]; $i++) {
                $ChAr[$i] = '';
            }
        }
    }

    return implode('', $ChAr); // HTML tweet.
}

Answer 4 (score: 1)

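Here is an updated answer that works with Twitter's new extended mode. It combines @vita10gy's answer with @Hugo's comment (to make it UTF-8 compatible), plus a few small tweaks to handle the new API values.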

function utf8_substr_replace($original, $replacement, $position, $length) {
    $startString = mb_substr($original, 0, $position, "UTF-8");
    $endString = mb_substr($original, $position + $length, mb_strlen($original), "UTF-8");
    $out = $startString . $replacement . $endString;
    return $out;
}

function json_tweet_text_to_HTML($tweet, $links=true, $users=true, $hashtags=true) {
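    // Media urls can show up at the end of the full_text tweet, but Twitter doesn't index that url.
    // display_text_range gives the [start, end) character range of the displayed text.
    // Cut the string off at the end of that range to get rid of this unindexed url.
    // mb_substr() takes a length, not an end offset; starting at 0 also keeps the entity
    // indices (counted from the start of full_text) lined up with the string.
    $return = mb_substr($tweet->full_text, 0, $tweet->display_text_range[1], "UTF-8");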
    $entities = array();

    if($links && is_array($tweet->entities->urls))
    {
        foreach($tweet->entities->urls as $e)
        {
            $temp["start"] = $e->indices[0];
            $temp["end"] = $e->indices[1];
            $temp["replacement"] = " <a href='".$e->expanded_url."' target='_blank'>".$e->display_url."</a>";
            $entities[] = $temp;
        }
    }
    if($users && is_array($tweet->entities->user_mentions))
    {
        foreach($tweet->entities->user_mentions as $e)
        {
            $temp["start"] = $e->indices[0];
            $temp["end"] = $e->indices[1];
            $temp["replacement"] = " <a href='https://twitter.com/".$e->screen_name."' target='_blank'>@".$e->screen_name."</a>";
            $entities[] = $temp;
        }
    }
    if($hashtags && is_array($tweet->entities->hashtags))
    {
        foreach($tweet->entities->hashtags as $e)
        {
            $temp["start"] = $e->indices[0];
            $temp["end"] = $e->indices[1];
            $temp["replacement"] = " <a href='https://twitter.com/hashtag/".$e->text."?src=hash' target='_blank'>#".$e->text."</a>";
            $entities[] = $temp;
        }
    }

    usort($entities, function($a,$b){return($b["start"]-$a["start"]);});


    foreach($entities as $item)
    {
        $return =  utf8_substr_replace($return, $item["replacement"], $item["start"], $item["end"] - $item["start"]);
    }

    return($return);
}
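One detail worth double-checking when trimming with display_text_range: mb_substr()'s third argument is a length, not an end offset. A small sketch with made-up extended-mode values (a reply whose leading "@bob " and trailing media link both fall outside the display range):

```php
<?php
// Made-up values: "hello world" is the displayed part of the tweet.
$fullText = "@bob hello world https://t.co/xyz";
$range    = [5, 16]; // [start, end) in characters

// mb_substr() takes (start, length), so convert the end offset to a length.
$visible = mb_substr($fullText, $range[0], $range[1] - $range[0], 'UTF-8');

// Passing the end offset as the length overshoots by $range[0] characters.
$overshoot = mb_substr($fullText, $range[0], $range[1], 'UTF-8');
```

Here $visible is "hello world", while $overshoot picks up " http" from the media link. Also, if the entity indices are measured from the start of full_text, a slice that starts at a nonzero offset moves every entity left by that offset, so the indices need the same adjustment (and entities before the offset, such as the leading reply mention, fall outside the slice).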

Answer 5 (score: 0)

Regarding vita10gy's useful json_tweet_text_to_HTML(): I found a tweet that it does not format correctly, 626125868247552000.

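This tweet contains a non-breaking space (U+00A0, the two bytes \xC2\xA0 in UTF-8). Because the function applies Twitter's character-based indices with the byte-based substr_replace(), each such character shifts every offset after it by one. My fix was to replace the first line of the function with: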

$return = str_replace("\xC2\xA0", ' ', $tweet->text);

For more on using str_replace() with &nbsp;, see here.
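Why swapping out the non-breaking space helps: vita10gy's function slices with byte offsets, while the indices count characters. A small sketch with a made-up string shows the one-byte drift and how the workaround removes it for this particular character:

```php
<?php
// Made-up text: "a", a non-breaking space (U+00A0), then the hashtag "#b".
$text = "a\xC2\xA0#b";

// The hashtag starts at character index 2, but at byte index 3.
$bytes = strlen($text);              // byte length: what substr_replace() works in
$chars = mb_strlen($text, 'UTF-8');  // character length: what the indices count

// Byte-based slicing with the character index grabs the wrong two bytes.
$before = substr($text, 2, 2);

// The answer's workaround: turn U+00A0 into a plain one-byte space first.
$fixed = str_replace("\xC2\xA0", ' ', $text);
$after = substr($fixed, 2, 2);
```

Replacing U+00A0 with a plain space realigns bytes and characters for this one case only; other multibyte characters (accented letters, emoji) still shift the offsets, which is what the mb_*-based answers above handle.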