Question

I have some text like this:

$text = "Some thing is there http://example.com/جميع-وظائف-فى-السليمانية 
         http://www.example.com/جميع-وظائف-فى-السليمانية nothing is there
         Check me http://example.com/test/for_me first
         testing http://www.example.com/test/for_me the url 
         Should be test http://www.example.com/翻译-英语教师-中文教师-外贸跟单
         simple text";

I need to preg_match the URLs, but they are in different languages, so I need to get the URL itself from each line.

This is what I am doing:

$text = preg_replace("/[\n]/", " <br>", $text);
$lines = explode("<br>", $text);
foreach($lines as $textLine){
   if (preg_match("/(http\:\/\/(.*))/", $textLine, $match )) {
     // some code
     // Here I need the url
   }
}

My current regex is /(http\:\/\/(.*))/. Please suggest how I can make it work with URLs in different languages.

Answer 1

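Would a regex like this work for you? In my tests it handled the sample text you provided, but it is not very sophisticated. It simply selects every character after http:// or https:// until a whitespace character (space, newline, tab, etc.) appears.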

/(https?\:\/\/(?:[^\s]+))/gi

Here is a visual example of what it matches in your sample string:
http://regex101.com/r/bR0yE9
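
The pattern above is written in regex101 notation. As a minimal sketch of how it might be used from PHP: the g flag is dropped, because PHP's preg functions do not accept it, and preg_match_all() finds every match instead. The sample text is shortened from the question, and the https:// line is an added assumption to exercise the optional "s".

```php
<?php
// Sample text adapted from the question; the https:// line is an assumption.
$text = "Some thing is there http://example.com/جميع-وظائف-فى-السليمانية
Check me http://example.com/test/for_me first
Secure one https://www.example.com/翻译-英语教师 here";

// Same pattern as above, minus the "g" flag, which PHP does not support.
$pattern = '/(https?\:\/\/(?:[^\s]+))/i';

// preg_match_all() returns the number of full matches and fills $matches.
$count = preg_match_all($pattern, $text, $matches);

// $matches[1] holds capture group 1, which here is each whole URL.
$urls = $matches[1];
print_r($urls);
```

Each entry of $urls runs from the scheme up to the next whitespace character, so the Arabic and Chinese path segments are kept whole.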

Answer 2

You don't need to work line by line; you can search the whole text directly:

if (preg_match_all('~\bhttp://\S+~', $text, $matches))
     print_r($matches);

\S means "any character that is not whitespace".
There is no special internationalization problem.
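
To illustrate why \S sidesteps the language issue, here is a small sketch comparing it with an ASCII-only character class. The ASCII-only pattern is a hypothetical alternative introduced purely for contrast; it does not appear in the question or the answers.

```php
<?php
// One line taken from the question's sample text.
$line = "Should be test http://www.example.com/翻译-英语教师-中文教师-外贸跟单 simple text";

// Hypothetical ASCII-only pattern: it stops at the first non-ASCII byte.
preg_match('~\bhttp://[A-Za-z0-9/._-]+~', $line, $asciiOnly);

// \S+ keeps going until whitespace, so the multibyte path stays intact.
preg_match('~\bhttp://\S+~', $line, $anyNonSpace);

echo $asciiOnly[0], "\n";
echo $anyNonSpace[0], "\n";
```

The ASCII-only class cuts the URL off right after the host, whereas \S+ only cares where the whitespace is and never needs to know which script the path is written in.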

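Note: if you want to replace all newlines with <br/>, I suggest using $text = preg_replace('~\R~', '<br/>', $text);, because \R handles several types of newline, whereas \n matches only the Unix newline.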
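
A short sketch of the difference, assuming an input string with mixed line endings (the string itself is made up for illustration):

```php
<?php
// Mixed line endings: Windows (\r\n), Unix (\n), and a lone carriage return (\r).
$text = "one\r\ntwo\nthree\rfour";

// \R treats \r\n as a single newline, so every break becomes exactly one <br/>.
$withR = preg_replace('~\R~', '<br/>', $text);

// Matching only \n leaves the carriage returns behind in the output.
$withN = preg_replace('~\n~', '<br/>', $text);

echo $withR, "\n";
// Make the leftover carriage returns visible when printing.
echo str_replace("\r", '\r', $withN), "\n";
```

With \R, all three breaks are replaced cleanly; with \n, the stray \r characters remain in the text.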

preg_match for URLs in different languages