How do I get URLs from a text string?

Asked: 2013-04-27 08:07:52

Tags: php string text

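I have a string that contains URLs mixed with other text. I want to collect all of the URLs into the $matches array, but the following code does not put every URL into $matches: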

$matches = array();
$text = "soundfly.us schoollife.edu hello.net some random news.yahoo.com text http://tinyurl.com/9uxdwc some http://google.com random text http://tinyurl.com/787988 and others will en.wikipedia.org/wiki/Country_music URL";
preg_match_all('$\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]$i', $text, $matches);
print_r($matches);

The code above returns:

http://tinyurl.com/9uxdwc
http://google.com
http://tinyurl.com/787988

but it misses the following 4 URLs:

schoollife.edu 
hello.net 
news.yahoo.com
en.wikipedia.org/wiki/Country_music

Please show me an example of how to modify the code above so that it captures all of the URLs.

1 answer:

Answer 0: (score: 1)

Is this what you need?

$matches = array();
$text = "soundfly.us schoollife.edu hello.net some random news.yahoo.com text http://tinyurl.com/9uxdwc some http://google.com random text http://tinyurl.com/787988 and others will en.wikipedia.org/wiki/Country_music URL";
preg_match_all('$\b((https?|ftp|file)://)?[-A-Z0-9+&@#/%?=~_|!:,.;]*\.[-A-Z0-9+&@#/%=~_|]+$i', $text, $matches);
print_r($matches);

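I made the protocol part optional, added a literal dot that separates the domain from the TLD, and added a "+" so that the whole string after that dot (the TLD plus any extra information) is captured.

The result is: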

[0] => soundfly.us 
[1] => schoollife.edu 
[2] => hello.net 
[3] => news.yahoo.com 
[4] => http://tinyurl.com/9uxdwc 
[5] => http://google.com 
[6] => http://tinyurl.com/787988 
[7] => en.wikipedia.org/wiki/Country_music
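Note that print_r($matches) with the pattern above also prints the two capture groups as extra sub-arrays; the list shown corresponds to $matches[0]. As a minimal sketch (not part of the original answer), the same pattern can be written with non-capturing groups so that only the full matches are collected. The variable names $pattern and $urls are assumptions made for illustration:

```php
<?php
$text = "soundfly.us schoollife.edu hello.net some random news.yahoo.com text http://tinyurl.com/9uxdwc some http://google.com random text http://tinyurl.com/787988 and others will en.wikipedia.org/wiki/Country_music URL";

// Same pattern as in the answer, but the groups are non-capturing (?:...),
// so preg_match_all() fills only $matches[0] with the full matches.
$pattern = '$\b(?:(?:https?|ftp|file)://)?[-A-Z0-9+&@#/%?=~_|!:,.;]*\.[-A-Z0-9+&@#/%=~_|]+$i';

preg_match_all($pattern, $text, $matches);

// $urls is a flat list of the matched strings, in order of appearance.
$urls = $matches[0];
print_r($urls);
```

With the sample text this should yield the same eight entries as listed above.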

This also works with IP addresses, because the pattern only requires that a dot be present. Tested with the strings "192.168.0.1" and "192.168.0.1/test/index.php".
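The IP address check can be reproduced with a small sketch like the one below. The helper name extract_urls is hypothetical and simply wraps the pattern from the answer. Because the only structural requirement is a dot, the pattern will probably also pick up dotted tokens that are not URLs, such as decimal numbers, so the results may need further filtering:

```php
<?php
// Hypothetical helper (not from the original answer): returns every
// substring of $text that matches the answer's pattern.
function extract_urls(string $text): array
{
    $pattern = '$\b((https?|ftp|file)://)?[-A-Z0-9+&@#/%?=~_|!:,.;]*\.[-A-Z0-9+&@#/%=~_|]+$i';
    preg_match_all($pattern, $text, $matches);
    // Index 0 holds the full matches; indexes 1 and 2 hold the capture groups.
    return $matches[0];
}

// The two strings mentioned in the answer.
print_r(extract_urls('192.168.0.1'));
print_r(extract_urls('192.168.0.1/test/index.php'));

// A decimal number also contains a dot, so it is matched as well.
print_r(extract_urls('version 3.14'));
```

If only real host names should be accepted, one option is to validate each candidate afterwards, for example against a list of known TLDs.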