I'm parsing the Wikipedia API, which responds in the following format:
Lorem ipsum dolor sit amet, consectetur adipisicing [[elitaaa|elit]], sed do eiu
smod tempor incididunt ut labore et. Ut [[enim (enimaaddasd)|enima]] ad
minim veniam, [[some realllllly long word|quis]] [[ullamco|test]] laboris
iquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit
in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
My goal is to replace every "[[long|word]]" with "word". For example, the first line should become:
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiu
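I don't care about the contents of "long" (it may contain any character, including spaces and ()); I only need to replace the whole bracketed expression with "word".
I came up with the following regex: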
$data = preg_replace(
'/\[\[([\s\S])\|(.*?)\]\]/',
'$2', $data);
But sometimes it matches a whole passage. For example, it replaces
adipisicing [[elitaaa|elit]], sed do eiu smod tempor incididunt ut labore et.
Ut [[enim (enimaaddasd)|enima]] ad
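with "adipisicing enima ad".
I tried adding {1,20} after [\s\S], but that doesn't work, because the content of "long" can be anything from a single 5-character word to a whole sentence.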
Answer 0: (score: 1)
Try this regex:
$result = preg_replace('/\[\[[^\]]+\|([^\]]+)\]\]/', '$1', $string);
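The reason this avoids the over-matching described in the question is that [^\]]+ can never cross a ']', so a match cannot run from one link into the next. A minimal sketch on two of the sample lines (the variable names $sample and $plain are only for illustration):

```php
<?php
// Sample input taken from the question.
$sample = "adipisicing [[elitaaa|elit]], sed do eiu\n"
        . "smod tempor incididunt ut labore et. Ut [[enim (enimaaddasd)|enima]] ad";

// [^\]]+ stops at the first ']', so each [[target|label]] is matched on its own.
$plain = preg_replace('/\[\[[^\]]+\|([^\]]+)\]\]/', '$1', $sample);

echo $plain, "\n";
```

With this input, both links are reduced to their labels ("elit" and "enima") and the text between them is left alone.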
Answer 1: (score: 1)
This should work for you.
<?php
$str = <<<STR
Lorem ipsum dolor sit amet, consectetur adipisicing [[elitaaa|elit]], sed do eiu
smod tempor incididunt ut labore et. Ut [[enim (enimaaddasd)|enima]] ad
minim veniam, [[some realllllly long word|quis]] [[ullamco|test]] laboris
iquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit
in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
STR;
$res = preg_replace('/\[+[^\]]+\|([^\]]+)\]+/', '$1', $str);
echo $res;
?>
Regular expression:
\[+          match '[' (1 or more times)
[^\]]+       any character except: '\]' (1 or more times)
\|           match literal '|'
(            group and capture to \1:
  [^\]]+       any character except: '\]' (1 or more times)
)            end of \1
\]+          match ']' (1 or more times)
Output:
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiu
smod tempor incididunt ut labore et. Ut enima ad
minim veniam, quis test laboris
iquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit
in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
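One case the sample text doesn't exercise: the pattern requires a '|', so a link written without a label, such as [[single]], is left untouched. If such links occur in your data (an assumption; the question doesn't show any), a variant with an optional "target|" part might look like the sketch below. The pattern and the $text and $out variables are illustrative, not part of the answer above:

```php
<?php
// Hypothetical input mixing a label-less link with a piped one.
$text = 'a [[single]] word and an [[pseudonym|alias]]';

// (?:[^\]|]*\|)? optionally consumes "target|"; group 1 is the visible text.
$out = preg_replace('/\[\[(?:[^\]|]*\|)?([^\]|]+)\]\]/', '$1', $text);

echo $out, "\n"; // a single word and an alias
```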
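Answer 2: (score: 0)
I was bored, so I wrote a procedural example. If it doesn't suit you, someone else who finds this page might be interested.
No guarantee that it's bug-free (though it works for the example string). I also wanted to handle unclosed/unopened tags, but I have to run.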
$s = 'First, we begin with a [[single]] word. Next, we use an [[pseudonym|alias]]. ';
$s.= "And then a [[tag with\na newline]] in it. That [[is]] it!";
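function wiki_parse($input) {
    $output = '';
    $offset = 0;
    while (true) {
        $open = mb_strpos($input, '[[', $offset);
        if ($open === false)
            break;
        // Look for the closing tag after the opening one, so a stray "]]"
        // earlier in the text cannot produce a negative token length.
        $close = mb_strpos($input, ']]', $open + 2);
        if ($close === false)
            break;
        // Copy the plain text that precedes the tag.
        if ($open > $offset)
            $output .= mb_substr($input, $offset, $open - $offset);
        $output .= wiki_parse_token($input, $open, $close);
        $offset = $close + 2;
    }
    // Append whatever follows the last tag.
    if ($offset < mb_strlen($input))
        $output .= mb_substr($input, $offset);
    return $output;
}
function wiki_parse_token($input, $open, $close) {
    // Text between "[[" and "]]".
    $token = mb_substr($input, $open + 2, ($close - $open) - 2);
    if (mb_strpos($token, "\n") !== false) {
        // Tags spanning a newline are left as they were.
        $token = "[[$token]]";
    } else {
        // With a pipe, keep only the text after the first '|'.
        $sep = mb_strpos($token, '|');
        if ($sep !== false) {
            $token = mb_substr($token, $sep + 1);
        }
    }
    return $token;
}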
var_dump($s);
var_dump(wiki_parse($s));
die;
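For comparison, roughly the same behavior can be sketched with preg_replace_callback. This is only an approximation for well-formed input, and the function name wiki_strip_links is made up for this example:

```php
<?php
// Hypothetical helper; approximates wiki_parse() for well-formed input.
function wiki_strip_links($input) {
    // (.*?) without the /s flag cannot cross a newline, so tags containing
    // a newline are never matched and stay in the text unchanged.
    return preg_replace_callback('/\[\[(.*?)\]\]/', function ($m) {
        $parts = explode('|', $m[1], 2);
        // With a pipe, keep the text after the first '|'; otherwise keep the whole token.
        return count($parts) === 2 ? $parts[1] : $parts[0];
    }, $input);
}

$demo = 'First, we begin with a [[single]] word. Next, we use an [[pseudonym|alias]]. '
      . "And then a [[tag with\na newline]] in it. That [[is]] it!";
var_dump(wiki_strip_links($demo));
```

On the example string this strips [[single]], [[pseudonym|alias]] and [[is]] down to "single", "alias" and "is", and leaves the tag containing a newline as it was.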