Question

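I have a system that processes RTF documents as part of a workflow. It works with standard RTF. I already have a regular expression that handles the output of Word 2003. I want to be able to handle Word 2007 as well.

My tags look like this: [[FooBuzz]].

Many programs, such as WordPad, keep [[FooBuzz]] as plain text. Word 2003 splits the [[ off from the tag. Word 2007 is even worse: it also splits at every capital letter, so Foo and Buzz end up in separate groups.

My sample data: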

{ toto}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid5517131 [[}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid2708730 Foo}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid2708730 Buzz}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid5517131 ]]} {toto}

I need two things. First, a regular expression that matches the RTF representation of [[FooBuzz]].

Example: {\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid5517131 [[}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid2708730 Foo}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid2708730 Buzz}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid5517131 ]]}

Second, I want to capture the name of the tag, here FooBuzz. I have to use the PHP function preg_match_all.

So here is the result for my test data:

Array
(
[0] => Array
    (
        [0] => {\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid5517131 [[}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid2708730 Foo}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid2708730 Buzz}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid5517131 ]]}
        [1] => {\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid5517131 [[}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid2708730 Foo}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid2708730 Buzz}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid5517131 ]]}
    )

[1] => Array
    (
        [0] => {\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid5517131 [[}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid2708730 Foo}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid2708730 Buzz}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid5517131 ]]}
        [1] => {\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid5517131 [[}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid2708730 Foo}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid2708730 Buzz}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid5517131 ]]}
    )

[2] => Array
    (
        [0] => 
        [1] => 
    )

[3] => Array
    (
        [0] => Foo
        [1] => Foo
    )

)

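As you can see, it produces the tags as required. Key 1 is an error that I will deal with later. Key 2 holds the result only when [[FooBuzz]] has not been split. Key 3 holds the result for Word 2003.

So Foo and Buzz may end up in different arrays; that is good enough for me, as long as the result is usable.

Example: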

[3] => Array
    (
        [0] => Foo

    )
 [4] => Array
    (
        [0] => Buzz

    )

OR

[3] => Array
    (
        [0] => FooBuzz

    )

Either would be an acceptable answer.

My regular expression and its explanation:

I got help from Stack Overflow to build it:

/(\[\[([^\[\]]*?)\]\]|{[^{]*?\[\[.*?(?<=\[\[).+?\b(?<!\\)(\w+)\b(?=.+?\]\]).*?\]\].*?})/

In a more readable form:

/(        Beginning of the OR clause.
 \[\[([^\[\]]*?)\]\]   Regex used to catch [[FooBuzz]] in plain text.
 |   Or statement.
 {[^{]*?\[\[.*?(?<=\[\[).+?  Part able to catch the RTF translation of [[
   \b(?<!\\)(\w+)\b     This part has a negative lookbehind, so it does not match RTF metadata (e.g. \toto123), and it selects Foo.
 (?=.+?\]\]).*?\]\].*?} Matches the RTF translation of ]]
 )/ End of the OR clause.

Note: there are a lot of non-greedy quantifiers (?), so that the regex selects only the tag and its metadata when needed (the match is later replaced with plain text).
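To make the current behaviour easy to reproduce, here is a minimal sketch that runs the regular expression above against the sample data with preg_match_all. The variable names are only illustrative. Note that inside a single-quoted PHP string the lookbehind `(?<!\\)` has to be written with four backslashes, otherwise PCRE receives an escaped parenthesis.

```php
<?php
// Sample data from the question. Single quotes keep the backslashes literal.
$subject = '{ toto}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid5517131 [[}'
         . '{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid2708730 Foo}'
         . '{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid2708730 Buzz}'
         . '{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid5517131 ]]} {toto}';

// The regular expression from the question. The only change is the PHP
// string escaping of the backslash inside the negative lookbehind.
$pattern = '/(\[\[([^\[\]]*?)\]\]|{[^{]*?\[\[.*?(?<=\[\[).+?\b(?<!\\\\)(\w+)\b(?=.+?\]\]).*?\]\].*?})/';

// Default flag is PREG_PATTERN_ORDER: $matches[n] lists group n for every match.
preg_match_all($pattern, $subject, $matches);

// Group 2 (plain-text tag) is empty here because the tag is split across groups.
// Group 3 stops at the first word that is not an RTF control word.
echo $matches[3][0], PHP_EOL; // prints "Foo": "Buzz" is not captured
```

This only confirms the problem described above: the third group picks up Foo, and nothing captures Buzz.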

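This is legacy code, and dropping the plain-text approach is not my decision to make. Performance does not matter; it runs as a batch job.

How would you catch FooBuzz?

Test sites:

http://www.spaweditor.com/scripts/regex/index.php shows the output of preg_match_all.

http://rubular.com/r/5fm7afU5vG is more fun to use, and you can edit the permalink. As you can see, the matches are displayed the same way as by the target function.

In short: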

I want to match every RTF representation of [[FooBuzz]] with match 1.
I want either match x => FooBuzz, or match x => Foo and match x + 1 => Buzz, as long as it is consistent.
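One possible way to get the "match x => FooBuzz" variant is to do the work in two steps instead of one regular expression: first grab everything between [[ and ]], then strip the RTF control words and group braces from that fragment. The sketch below is an assumption-laden illustration, not a drop-in replacement: the function name extractTagNames is hypothetical, it returns only the tag names (not the full RTF representation), and it does not handle escaped characters such as \'e9 or \{.

```php
<?php
/**
 * Hypothetical helper: returns the tag names found in an RTF fragment,
 * whether the tag is plain text or split across several RTF groups.
 */
function extractTagNames(string $rtf): array
{
    // Step 1: lazily capture whatever sits between "[[" and "]]".
    preg_match_all('/\[\[(.*?)\]\]/s', $rtf, $matches);

    $names = [];
    foreach ($matches[1] as $inner) {
        // Step 2: remove control words such as \rtlch or \insrsid2708730,
        // including the single space that may delimit them.
        $text = preg_replace('/\\\\[a-zA-Z]+-?\d* ?/', '', $inner);
        // Step 3: remove the group braces that Word inserted.
        $text = str_replace(['{', '}'], '', $text);
        $names[] = trim($text);
    }
    return $names;
}

$subject = '{ toto}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid5517131 [[}'
         . '{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid2708730 Foo}'
         . '{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid2708730 Buzz}'
         . '{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid5517131 ]]} {toto}';

print_r(extractTagNames($subject));       // one entry: FooBuzz
print_r(extractTagNames('[[FooBuzz]]'));  // plain text gives the same entry
```

Because the cleanup happens after matching, the number of pieces Word splits the name into no longer matters, at the cost of a second pass over each match.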

You are free to add another alternative to the OR. Otherwise, I think the part to edit is: \b(?

Answer 1

Use this regular expression:

/{[^{]*\[{2}.*?\b(\w+)}.*?(?:\b(\w+)}.*?)?\]{2}[^}]*}/
                   ↑             ↑
                  Foo          Buzz

PHP code:

$pattern = '/{[^{]*\[{2}.*?\b(\w+)}.*?(?:\b(\w+)}.*?)?\]{2}[^}]*}/';
preg_match($pattern, $subject, $matches);
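Since the question asks for preg_match_all, here is a small sketch of how the same pattern might be used with it on the sample data. The variable names and the choice of PREG_SET_ORDER are assumptions made for illustration; the pattern itself is unchanged. Keep in mind that it has only two capture groups, so a name split into more than two pieces would not be fully captured.

```php
<?php
// Sample data from the question. Single quotes keep the backslashes literal.
$subject = '{ toto}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid5517131 [[}'
         . '{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid2708730 Foo}'
         . '{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid2708730 Buzz}'
         . '{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid5517131 ]]} {toto}';

$pattern = '/{[^{]*\[{2}.*?\b(\w+)}.*?(?:\b(\w+)}.*?)?\]{2}[^}]*}/';

// PREG_SET_ORDER groups the results per match: $set[0] is the whole RTF
// representation of the tag, $set[1] and $set[2] are the name parts.
preg_match_all($pattern, $subject, $sets, PREG_SET_ORDER);

foreach ($sets as $set) {
    // Group 2 is optional and may be missing when the name is not split.
    $name = $set[1] . ($set[2] ?? '');
    echo $name, PHP_EOL; // prints "FooBuzz" for the sample data
}
```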

Test this code here.

Capturing a whole tag in RTF
