Extracting and removing a URL from a block of text

Posted: 2012-11-09 14:18:55

Tags: php regex preg-replace

I have this block of text:

$text = 'This just happened outside the store http://somedomain.com/2012/12/store there might be more text afterwards...';

It needs to be converted to:

$result['text_1'] = 'This just happened outside the store';
$result['text_2'] = 'there might be more text afterwards...';
$result['url'] = 'http://somedomain.com/2012/12/store';

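This is my current code. It does detect the URL, but all I can do is remove it from the text; I still need to store the URL itself as a separate value in the array: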

$string = preg_replace('/https?:\/\/[^\s"<>]+/', '', $text);
//returns "This just happened outside the store  there might be more text afterwards..."

Any ideas? Thanks!

Temporary solution (can this be optimized?)

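$text = 'This just happened outside the store http://somedomain.com/2012/12/store There might be more text afterwards...';
// Capture the first URL into $url[0]
preg_match('/https?:\/\/[^\s"<>]+/', $text, $url);
// Split the text into the pieces around the URL
$string = preg_split('/https?:\/\/[^\s"<>]+/', $text);
// Rejoin the pieces; the run of whitespace left where the URL was becomes ". "
$text = preg_replace('/\s\s+/', '. ', implode(' ', $string));
echo '<a href="' . $url[0] . '">' . $text . '</a>';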

2 answers:

Answer 0 (score: 2)

Do you need it stored in a variable, or only inside the a href? How about this:

<?php
$text = 'This just happened outside the store http://somedomain.com/2012/12/store There might be more text afterwards...';
$pattern = '@(.*?)(https?://.*?) (.*)@';
$ret = preg_replace( $pattern, '<a href="$2">$3</a>', $text );
var_dump( $ret );

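$1, $2 and $3 refer to the first, second and third parenthesized groups in the pattern. Note that the pattern relies on a space directly after the URL to tell where it ends.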

The output will be:

<a href="http://somedomain.com/2012/12/store">There might be more text afterwards...</a>
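If you need the $result array from the question rather than a link, the same three-group idea works with preg_match, which hands the groups back in an array. This is a minimal sketch, assuming the text contains a single URL and reusing the question's URL character class so no trailing space is required; $pattern and $m are illustrative names, not taken from either answer:

```php
<?php
$text = 'This just happened outside the store http://somedomain.com/2012/12/store there might be more text afterwards...';

// Group 1: text before the URL (lazy, so surrounding whitespace goes to \s*)
// Group 2: the URL itself, using the character class from the question
// Group 3: everything after the URL; the /s flag lets it span newlines
$pattern = '@^(.*?)\s*(https?://[^\s"<>]+)\s*(.*)$@s';

$result = [];
if (preg_match($pattern, $text, $m)) {
    $result['text_1'] = $m[1];
    $result['url']    = $m[2];
    $result['text_2'] = $m[3];
}
print_r($result);
```

If no URL is found, $result simply stays empty, so the caller can check for that before building any link.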

Answer 1 (score: 1)

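You can use preg_split to split the string on the regex and get an array back. Wrapping the pattern in parentheses and passing PREG_SPLIT_DELIM_CAPTURE keeps the URL itself in the result: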

$result = preg_split('/(https?:\/\/[^\s"<>]+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
// $result[0] = text before the URL
// $result[1] = the URL
// $result[2] = text after the URL, if any
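A minimal sketch of mapping that split onto the keys from the question, assuming PHP 7+ (for the ?? operator) and that only the first URL matters. A limit of 2 stops splitting after the first URL, while PREG_SPLIT_DELIM_CAPTURE still returns the captured URL as its own element; trim() removes the spaces left on either side of it. $parts is a hypothetical name:

```php
<?php
$text = 'This just happened outside the store http://somedomain.com/2012/12/store there might be more text afterwards...';

// Limit 2: split at the first URL only; the captured URL is returned in between
$parts = preg_split('/(https?:\/\/[^\s"<>]+)/', $text, 2, PREG_SPLIT_DELIM_CAPTURE);

$result = [
    'text_1' => trim($parts[0]),
    'url'    => $parts[1] ?? '',        // empty string if the text has no URL
    'text_2' => trim($parts[2] ?? ''),
];
print_r($result);
```

Any later URLs stay inside text_2 untouched; switch the limit back to -1 if every URL should be split out.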