Regex - find all links in a tweet

Time: 2009-09-13 00:36:27

Tags: ruby-on-rails regex

My regex skills are poor and are letting me down, so some help here would be great.

All I want to do is return all the links that appear in a tweet (which is just a string), for example:

"Great summary http://mytest.com/blog/post.html (#test)"

"http://mytest.com/blog/post.html (#test)"

"post: http://mytest.com/blog/post.html"

It should also support multiple links, for example: "read http://mytest.com/blog/post.html and http://mytest.com/blog/post_two.html"

Any help would be great!

Thanks

4 answers:

Answer 0 (score: 2)

Try this:

/\bhttps?:\/\/\S+\b/
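Since the question is tagged Ruby, here is a minimal sketch of how this pattern could be applied with `String#scan`, which returns every match in the string. The constant `LINK` and the helper `links_in` are names made up for this illustration.

```ruby
# The pattern from the answer above, unchanged.
LINK = /\bhttps?:\/\/\S+\b/

# Hypothetical helper: returns every link found in the tweet text.
def links_in(tweet)
  # With no capture groups in the pattern, scan returns an array of matched strings.
  tweet.scan(LINK)
end

puts links_in("Great summary http://mytest.com/blog/post.html (#test)")
puts links_in("read http://mytest.com/blog/post.html and http://mytest.com/blog/post_two.html")
```

The trailing `\b` makes the match end on a word character, so punctuation after a link is dropped. The same rule also drops a trailing slash: "http://mytest.com/" is returned as "http://mytest.com".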

Update

To also catch links that start with "www." (without the "http://" prefix), you could try this:

/\b(?:https?:\/\/|www\.)\S+\b/
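A possible way to use the extended pattern in Ruby is sketched below. Adding "http://" to the scheme-less matches is an extra step assumed here for convenience, not something the answer asks for, and `LINK_OR_WWW` and `links_with_scheme` are made-up names.

```ruby
# The extended pattern from the answer above, unchanged.
LINK_OR_WWW = /\b(?:https?:\/\/|www\.)\S+\b/

# Hypothetical helper: finds links and gives "www." matches an explicit scheme.
def links_with_scheme(tweet)
  tweet.scan(LINK_OR_WWW).map do |link|
    # Assumption: a bare "www." link is meant to be reached over http.
    link.start_with?("www.") ? "http://#{link}" : link
  end
end

puts links_with_scheme("see www.mytest.com/blog and http://mytest.com/blog/post.html")
```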

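Answer 1 (score: 1)

Below is a snippet from a site I wrote that parses a tweet feed. It parses links, hashtags, and Twitter usernames. So far it has worked well. I know this isn't Ruby, but the regular expressions should help.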

// Skip empty slots in the feed.
if (tweetStream[i] != null)
{
    var str = tweetStream[i].Text;

    // Wrap every http:// or https:// URL in an anchor tag.
    var re = new Regex(@"http(s)?:\/\/\S+");
    MatchCollection mc = re.Matches(tweetStream[i].Text);
    foreach (Match m in mc)
    {
        str = str.Replace(m.Value, "<a href='" + m.Value + "' target='_blank'>" + m.Value + "</a>");
    }

    // Link each @username to its Twitter profile.
    re = new Regex(@"(@)(\w+)");
    mc = re.Matches(tweetStream[i].Text);
    foreach (Match m in mc)
    {
        str = str.Replace(m.Value, "<a href='http://twitter.com/" + m.Value.Replace("@", string.Empty) + "' target='_blank'>" + m.Value + "</a>");
    }

    // Link each #hashtag to a Twitter search ("%23" is the URL-encoded "#").
    re = new Regex(@"(#)(\w+)");
    mc = re.Matches(tweetStream[i].Text);
    foreach (Match m in mc)
    {
        str = str.Replace(m.Value, "<a href='http://twitter.com/#search?q=" + m.Value.Replace("#", "%23") + "' target='_blank'>" + m.Value + "</a>");
    }

    // string1 and string2 are not shown in this snippet.
    tweets += string1 + "<div>" + str + "</div>" + string2;
}
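For Ruby readers, the same idea could be sketched roughly as follows. This is not a line-for-line port: it uses one combined pattern and a single `gsub` pass, so a "#" inside a URL or a repeated link is not rewritten twice. The names `TOKEN` and `linkify` are made up, and the sketch assumes the tweet text is already safe to embed in HTML.

```ruby
# One alternation: a URL, an @username, or a #hashtag (named groups for clarity).
TOKEN = %r{(?<url>https?://\S+)|@(?<user>\w+)|#(?<tag>\w+)}

# Hypothetical helper: replaces each token with an anchor tag in a single pass.
# Assumption: text needs no HTML escaping.
def linkify(text)
  text.gsub(TOKEN) do
    m = Regexp.last_match
    if m[:url]
      "<a href='#{m[:url]}' target='_blank'>#{m[:url]}</a>"
    elsif m[:user]
      "<a href='http://twitter.com/#{m[:user]}' target='_blank'>@#{m[:user]}</a>"
    else
      # "%23" is the URL-encoded form of "#".
      "<a href='http://twitter.com/#search?q=%23#{m[:tag]}' target='_blank'>##{m[:tag]}</a>"
    end
  end
end

puts linkify("hi @bob #ruby http://a.com/x#frag")
```

Because the URL alternative is tried first and consumes the whole run of non-space characters, the "#frag" part of the link above stays inside the URL anchor instead of becoming a hashtag link.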

Answer 2 (score: 1)

Found this here:

^(?#Protocol)(?:(?:ht|f)tp(?:s?)\:\/\/|~/|/)?(?#Username:Password)(?:\w+:\w+@)?(?#Subdomains)(?:(?:[-\w]+\.)+(?#TopLevel Domains)(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2}))(?#Port)(?::[\d]{1,5})?(?#Directories)(?:(?:(?:/(?:[-\w~!$+|.,=]|%[a-f\d]{2})+)+|/)+|\?|#)?(?#Query)(?:(?:\?(?:[-\w~!$+|.,*:]|%[a-f\d]{2})+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)(?:&(?:[-\w~!$+|.,*:]|%[a-f\d]{2})+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)*)*(?#Anchor)(?:#(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)?$
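Note that this pattern is anchored with `^` and `$`, so it checks whether a whole string looks like a URL rather than finding URLs inside a longer text. One way to apply a validator of this kind to a tweet is to split the text on whitespace and keep the tokens that pass. The sketch below uses a much simpler stand-in, `URL_LIKE`, instead of the long pattern, and both it and `url_tokens` are made-up names. It uses `\A` and `\z` because in Ruby `^` and `$` match at line boundaries, not only at the ends of the string.

```ruby
# Simplified stand-in for a whole-string URL validator (assumption, not the pattern above).
URL_LIKE = %r{\Ahttps?://\S+\z}

# Hypothetical helper: keeps the whitespace-separated tokens that validate as URLs.
def url_tokens(tweet)
  tweet.split.select { |token| URL_LIKE.match?(token) }
end

puts url_tokens("read http://mytest.com/blog/post.html and http://mytest.com/blog/post_two.html")
```

A drawback of this approach is that punctuation glued to a link, such as a closing parenthesis, stays part of the token.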

Answer 3 (score: 0)

I realize this question is from 2009, but Twitter's API now returns the URLs (and expands t.co links).
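As a rough illustration, tweet objects returned by the API carry an `entities` section whose `urls` entries include the shortened `url` and an `expanded_url`. The sketch below reads them from a hand-written sample payload. The sample values are made up, `expanded_urls` is a hypothetical helper, and the exact response shape should be checked against the API version in use.

```ruby
require "json"

# Hand-written sample shaped like a tweet payload; all values are made up.
SAMPLE_JSON = <<~JSON
  {
    "text": "Great summary https://t.co/abc123 (#test)",
    "entities": {
      "urls": [
        { "url": "https://t.co/abc123",
          "expanded_url": "http://mytest.com/blog/post.html" }
      ]
    }
  }
JSON

# Hypothetical helper: returns the expanded links, falling back to the short URL.
def expanded_urls(tweet)
  # dig returns nil when a key is missing; to_a turns that into an empty list.
  tweet.dig("entities", "urls").to_a.map { |u| u["expanded_url"] || u["url"] }
end

puts expanded_urls(JSON.parse(SAMPLE_JSON))
```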