Question

I receive strings from the Wikipedia API like the following:

{{Wikibooks|Wikijunior:Countries A-Z|France}} {{Sister project links|France}} * [http://www.bbc.co.uk/news/world-europe-17298730 France] from the [[BBC News]] * [http://ucblibraries.colorado.edu/govpubs/for/france.htm France] at ''UCB Libraries GovPubs'' *{{dmoz|Regional/Europe/France}} * [http://www.britannica.com/EBchecked/topic/215768/France France] ''Encyclopædia Britannica'' entry * [http://europa.eu/about-eu/countries/member-countries/france/index_en.htm France] at the [[European Union|EU]] *{{Wikiatlas|France}} *{{osmrelation-inline|1403916}} * [http://www.ifs.du.edu/ifs/frm_CountryProfile.aspx?Country=FR Key Development Forecasts for France] from [[International Futures]] ;Economy *{{INSEE|National Institute of Statistics and Economic Studies}} * [http://stats.oecd.org/Index.aspx?QueryId=14594 OECD France statistics]

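I need both the actual URL and the URL's description. For example, for [http://www.bbc.co.uk/news/world-europe-17298730 France] from the [[BBC News]] I need "http://www.bbc.co.uk/news/world-europe-17298730" and also "France] from the [[BBC News]]", but without the brackets, i.e. "France from the BBC News".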

I managed to get the first part by doing the following:

if (preg_match_all('/\[http(.*?)\s/', $result, $extmatch)) {
    // $extmatch[1] holds the text that follows "http", up to the first whitespace
    $mt = str_replace("[[", "", $extmatch[1]);
}

But I can't figure out how to get the second part (unfortunately, I'm very weak at regular expressions :-( ).

Any ideas?

Answer 1

PHP:

$input = "{{Wikibooks|Wikijunior:Countries A-Z|France}} {{Sister project links|France}} * [http://www.bbc.co.uk/news/world-europe-17298730 France] from the [[BBC News]] * [http://ucblibraries.colorado.edu/govpubs/for/france.htm France] at ''UCB Libraries GovPubs'' *{{dmoz|Regional/Europe/France}} * [http://www.britannica.com/EBchecked/topic/215768/France France] ''Encyclopædia Britannica'' entry * [http://europa.eu/about-eu/countries/member-countries/france/index_en.htm France] at the [[European Union|EU]] *{{Wikiatlas|France}} *{{osmrelation-inline|1403916}} * [http://www.ifs.du.edu/ifs/frm_CountryProfile.aspx?Country=FR Key Development Forecasts for France] from [[International Futures]] ;Economy *{{INSEE|National Institute of Statistics and Economic Studies}} * [http://stats.oecd.org/Index.aspx?QueryId=14594 OECD France statistics]";
$regex = '/\[(http\S+)\s+([^\]]+)\](?:\s+from(?:\s+the)?\s+\[\[(.*?)\]\])?/';

preg_match_all($regex, $input, $matches, PREG_SET_ORDER);
var_dump($matches);

Output:

array(6) {
  [0]=>
  array(4) {
    [0]=>
    string(78) "[http://www.bbc.co.uk/news/world-europe-17298730 France] from the [[BBC News]]"
    [1]=>
    string(47) "http://www.bbc.co.uk/news/world-europe-17298730"
    [2]=>
    string(6) "France"
    [3]=>
    string(8) "BBC News"
  }
  ...
  ...
  ...
  ...
  ...
}
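To turn the matches into the URL/description pairs the question asks for, one option is to loop over the sets returned with PREG_SET_ORDER. This is a sketch, not part of the original answer: `extractLinks` is a hypothetical helper name, and the pattern is the answer's regex with the "from"/"from the" connector moved into its own capture group (so the `[[...]]` text becomes group 4), which keeps the original wording in the description.

```php
<?php
// Hypothetical helper. Pattern = the answer's regex, except that the
// "from" / "from the" connector is captured as group 3 and the
// [[...]] link text is group 4.
function extractLinks(string $wikitext): array
{
    $regex = '/\[(http\S+)\s+([^\]]+)\](?:\s+(from(?:\s+the)?)\s+\[\[(.*?)\]\])?/';
    $links = [];
    if (preg_match_all($regex, $wikitext, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $description = $m[2];
            // Groups 3 and 4 are missing when no "from [the] [[...]]" suffix follows.
            if (isset($m[4])) {
                // Normalize the whitespace inside the connector ("from  the" -> "from the").
                $description .= ' ' . preg_replace('/\s+/', ' ', $m[3]) . ' ' . $m[4];
            }
            $links[] = ['url' => $m[1], 'description' => $description];
        }
    }
    return $links;
}

// A shortened excerpt of the wikitext from the question.
$input = "* [http://www.bbc.co.uk/news/world-europe-17298730 France] from the [[BBC News]] "
       . "* [http://www.ifs.du.edu/ifs/frm_CountryProfile.aspx?Country=FR Key Development Forecasts for France] from [[International Futures]] ;Economy "
       . "* [http://stats.oecd.org/Index.aspx?QueryId=14594 OECD France statistics]";

foreach (extractLinks($input) as $link) {
    echo $link['url'], ' => ', $link['description'], "\n";
}
```

Entries without a "from ... [[...]]" suffix keep just the link text; other connectors, such as "at the [[European Union|EU]]", are not covered by this pattern.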

Explanation:

\[       (?# match [ literally)
(        (?# start capture group)
  http   (?# match http literally)
  \S+    (?# match 1+ non-whitespace characters)
)        (?# end capture group)
\s+      (?# match 1+ whitespace characters)
(        (?# start capture group)
  [^\]]+ (?# match 1+ non-] characters)
)        (?# end capture group)
\]       (?# match ] literally)
(?:      (?# start non-capturing group)
  \s+    (?# match 1+ whitespace characters)
  from   (?# match from literally)
  (?:    (?# start non-capturing group)
    \s+  (?# match 1+ whitespace characters)
    the  (?# match the literally)
  )?     (?# end optional non-capturing group)
  \s+    (?# match 1+ whitespace characters)
  \[\[   (?# match [[ literally)
  (      (?# start capturing group)
    .*?  (?# lazily match 0+ characters)
  )      (?# end capturing group)
  \]\]   (?# match ]] literally)
)?       (?# end optional non-capturing group)
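The commented layout above can also be used directly in PHP through the PCRE "x" (extended) modifier, which ignores whitespace outside character classes and treats "#" as the start of a comment. A sketch, assuming the same pattern as the answer; the `~` delimiter is just a choice so the pattern needs no extra escaping.

```php
<?php
// The answer's pattern with the "x" modifier: whitespace outside character
// classes is ignored and "#" starts a comment that runs to the end of the line.
$regex = '~
    \[              # a literal "["
    (http\S+)       # group 1: the URL, "http" followed by non-whitespace
    \s+             # whitespace between the URL and the link text
    ([^\]]+)        # group 2: the link text, everything up to the next "]"
    \]              # a literal "]"
    (?:             # optional suffix such as: from the [[BBC News]]
        \s+ from
        (?: \s+ the )?
        \s+ \[\[
        (.*?)       # group 3: the wiki link text, e.g. BBC News
        \]\]
    )?
~x';

$sample = '[http://www.bbc.co.uk/news/world-europe-17298730 France] from the [[BBC News]]';
if (preg_match($regex, $sample, $m)) {
    // Group 3 may be absent if the suffix did not match, hence the ?? fallback.
    printf("url=%s text=%s source=%s\n", $m[1], $m[2], $m[3] ?? '');
}
```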

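Let me know if you need a more thorough explanation, but the comments above should help. If you have any specific questions, I'm more than happy to help. The link below will help you visualize what the expression is doing.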

Regex101

Answer 2

A solution without regular expressions:

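Explode on '*'
Discard the parts that start with '{' (the code keeps only parts that start with '[')
Remove all brackets
Explode on ' ' (space)
The first piece is the link
Glue the rest back together to get the description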

Code:

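// $str holds the wikitext returned by the API
$parts = explode('*', $str);
$links = array();
foreach ($parts as $k => $v) {
    $parts[$k] = ltrim($v);
    // Keep only the external links, i.e. the parts that start with '['
    if (substr($parts[$k], 0, 1) !== '[') {
        unset($parts[$k]);
        continue;
    }
    // Remove every '[' and ']'
    $parts[$k] = preg_replace('/\[|\]/', '', $parts[$k]);
    // Split on spaces: the first piece is the URL, the rest is the description
    $subparts = explode(' ', $parts[$k]);
    $links[$k][0] = $subparts[0];
    unset($subparts[0]);
    $links[$k][1] = implode(' ', $subparts);
}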

echo '<pre>'.print_r($links,true).'</pre>';

Result:

Array
(
    [1] => Array
        (
            [0] => http://www.bbc.co.uk/news/world-europe-17298730
            [1] => France from the BBC News 
        )

    [2] => Array
        (
            [0] => http://ucblibraries.colorado.edu/govpubs/for/france.htm
            [1] => France at ''UCB Libraries GovPubs'' 
        )

    [4] => Array
        (
            [0] => http://www.britannica.com/EBchecked/topic/215768/France
            [1] => France ''Encyclopædia Britannica'' entry 
        )

    [5] => Array
        (
            [0] => http://europa.eu/about-eu/countries/member-countries/france/index_en.htm
            [1] => France at the European Union|EU 
        )

    [8] => Array
        (
            [0] => http://www.ifs.du.edu/ifs/frm_CountryProfile.aspx?Country=FR
            [1] => Key Development Forecasts for France from International Futures ;Economy 
        )

    [10] => Array
        (
            [0] => http://stats.oecd.org/Index.aspx?QueryId=14594
            [1] => OECD France statistics 
        )

)
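The descriptions above still contain wiki markup, such as `European Union|EU` and `''UCB Libraries GovPubs''`. A possible refinement, sketched here and not part of the original answer, is to resolve piped links and drop the quote markup before splitting each part on its first space. `stripWikiMarkup` is a hypothetical name, and the sketch assumes only `[[Target|Label]]`, `[[Target]]`, single brackets and `''`/`'''` need handling.

```php
<?php
// Hypothetical helper: reduce one "* [...]" part of the wikitext to plain text.
function stripWikiMarkup(string $text): string
{
    // [[Target|Label]] -> Label, [[Target]] -> Target
    $text = preg_replace('/\[\[(?:[^\]|]*\|)?([^\]]*)\]\]/', '$1', $text);
    // Drop the remaining single brackets of the external link.
    $text = str_replace(['[', ']'], '', $text);
    // Remove wiki italic/bold markers ('' and ''').
    $text = preg_replace("/'{2,}/", '', $text);
    // Collapse runs of whitespace and trim both ends (this also drops the trailing space).
    return trim(preg_replace('/\s+/', ' ', $text));
}

$part = "[http://europa.eu/about-eu/countries/member-countries/france/index_en.htm France] at the [[European Union|EU]] ";
$clean = stripWikiMarkup($part);
// Split on the first space only: URL on the left, description on the right.
[$url, $description] = explode(' ', $clean, 2);
echo $url, "\n", $description, "\n";
```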
