Extract complex URLs from text

Date: 2015-10-16 20:45:28

Tags: php regex

The text contains the following URLs: https://www.yyyy.com/blablabla/https://www.foofoofoofoofoo/loremlorem/lorem/https:www.textext.net/

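They are directly adjacent, with no separator between them. Regular expressions haven't helped me so far. I would like to solve it like this: search for https://www and, if it matches, extract it (only the first 10 characters) into an array.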
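The approach described in the question (find each occurrence of a marker and cut the text there) can be sketched with strpos() and substr() alone. This is a minimal sketch under a few assumptions: the helper name split_on_marker() is hypothetical, and it searches for "https:" rather than "https://www", because the third URL in the sample ("https:www.textext.net/") has no slashes and would otherwise be missed. It keeps each whole segment; if you really only want the first 10 characters, apply substr($url, 0, 10) to each element.

```php
<?php
// Hypothetical helper: split $text at every occurrence of $marker.
// Each returned element starts with the marker; any text before the first marker is ignored.
function split_on_marker(string $text, string $marker = 'https:'): array
{
    // Collect the offset of every occurrence of the marker.
    $positions = [];
    $offset = 0;
    while (($pos = strpos($text, $marker, $offset)) !== false) {
        $positions[] = $pos;
        $offset = $pos + strlen($marker);
    }

    // Each URL runs from one marker up to the next one (or to the end of the text).
    $urls = [];
    foreach ($positions as $i => $start) {
        $end = $positions[$i + 1] ?? strlen($text);
        $urls[] = substr($text, $start, $end - $start);
    }
    return $urls;
}

$text = "https://www.yyyy.com/blablabla/https://www.foofoofoofoofoo/loremlorem/lorem/https:www.textext.net/";
print_r(split_on_marker($text));
```

Because it uses plain string functions, the marker never needs regex escaping.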

2 answers:

Answer 0 (score: 0)

A possible solution:

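<?php
$str = "https://www.yyyy.com/blablabla/https://www.foofoofoofoofoo/loremlorem/lorem/https:www.textext.net/";
// Put a space in front of every "https:" so the string can be exploded on spaces:
$my_str = preg_replace('/https:/', ' https:', $str);
// trim() removes the leading space; otherwise the first element would be "".
$values = explode(' ', trim($my_str));
var_dump($values);
?>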
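The space-and-explode trick assumes the text contains no spaces of its own. A minimal sketch of an alternative, assuming the same sample string: split on a zero-width lookahead, so nothing is inserted or removed and the "https:" prefix stays attached to each URL.

```php
<?php
$str = "https://www.yyyy.com/blablabla/https://www.foofoofoofoofoo/loremlorem/lorem/https:www.textext.net/";
// (?=https:) matches the empty position right before each "https:".
// PREG_SPLIT_NO_EMPTY drops the empty piece produced at the very start of the string.
$values = preg_split('/(?=https:)/', $str, -1, PREG_SPLIT_NO_EMPTY);
var_dump($values);
```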

EDIT

<?php
// First, separate the URL string:
$str = "https://www.yyyy.com/blablabla/https://www.foofoofoofoofoo/loremlorem/lorem/https:www.textext.net/https://youtube.com/channels/uniqueID/about/foofoofoo/foo";
$breakpoint = "https:";
// Replace every "https:" (plus up to two slashes) with a space so the string can be exploded.
// preg_quote() escapes the breakpoint in case it contains regex metacharacters.
$my_str = preg_replace('~' . preg_quote($breakpoint, '~') . '/{0,2}~', ' ', $str);
// trim() removes the leading space; otherwise the first element would be "".
$values = explode(' ', trim($my_str));
var_dump($values);

// Now, for each URL you can do whatever you want, e.g. keep only the part before "about/":
$end = "about/";
$a = array();
foreach ($values as $value) {
    $pos = strpos($value, $end);
    if ($pos !== false) {
        // Keep everything before the first occurrence of $end.
        $a[] = substr($value, 0, $pos);
    }
}

var_dump($a);
?>
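The two steps above (split at "https:", then cut each piece at $end) can also be done in a single preg_match_all() call. This is a sketch under the assumption that you only need the URLs that actually contain $end; the regex is an illustrative construction, not part of the answer. The "tempered" token (?:(?!https:).) consumes one character at a time but refuses to step onto the next "https:", so a match never spills over into the following URL.

```php
<?php
$str = "https://www.yyyy.com/blablabla/https://www.foofoofoofoofoo/loremlorem/lorem/https:www.textext.net/https://youtube.com/channels/uniqueID/about/foofoofoo/foo";
$end = "about/";

// "https:" plus up to two slashes, then (lazily) everything up to the first $end,
// without crossing the start of the next "https:".
$pattern = '~https:/{0,2}((?:(?!https:).)*?)' . preg_quote($end, '~') . '~';
preg_match_all($pattern, $str, $matches);

// Group 1 holds the part of each matching URL before $end.
$a = $matches[1];
var_dump($a);
```

The trade-off is readability: the two-step version is easier to adapt, while this one avoids building the intermediate array.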

Answer 1 (score: 0)

Based on the text you provided as a sample, I think preg_split is your best option:

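$text = "https://www.yyyy.com/blablabla/https://www.foofoofoofoofoo/loremlorem/lorem/https:www.textext.net/";
$urls = preg_split('~https?:(?://)?~i', $text, -1, PREG_SPLIT_NO_EMPTY); // NO_EMPTY drops the leading ""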

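$urls will be the array of split URLs you need. Note that the matched "https://" (or "https:") prefix is consumed by the split, so it is not part of the elements. Test it against a few complete texts and let us know how it goes.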
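If you want to keep the scheme that preg_split() removes, one option is PREG_SPLIT_DELIM_CAPTURE: capture the scheme name and re-attach it as "scheme://", which also turns the malformed "https:www.textext.net/" into "https://www.textext.net/". This is a sketch that assumes the text begins with a URL, so captured schemes and URL bodies strictly alternate; leading non-URL text would break the pairing.

```php
<?php
$text = "https://www.yyyy.com/blablabla/https://www.foofoofoofoofoo/loremlorem/lorem/https:www.textext.net/";

// The parenthesised (https?) is returned as its own element because of DELIM_CAPTURE,
// so $parts alternates: scheme, body, scheme, body, ...
$parts = preg_split('~(https?):(?://)?~i', $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

$urls = [];
foreach (array_chunk($parts, 2) as $pair) {
    // Rebuild each URL with a normalised "scheme://" prefix.
    $urls[] = strtolower($pair[0]) . '://' . ($pair[1] ?? '');
}
print_r($urls);
```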