RegEx for capturing "heading" trigger words in a textarea

Time: 2014-10-08 16:40:37

Tags: php regex preg-split

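I'm trying to write a regular expression for PHP's preg_split that captures certain "heading"-like words in a textarea I'm processing.

I want to use the resulting array to improve the user's formatting and give review posts a streamlined look.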

$returnValue = preg_split('/[^|\n]*[\t| ]*\b(Pro|Contra|Conclusion)\b\:[\t| ]*/i', 
                           $data['review_text'],
                           -1,
                           PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);

Here is my sample text input

Intro line one, first part of the array
Pro:Pro:double Pro 1, no space between
Pro: Pro:double Pro 2, space between
Pro: test Pro:double Pro 3, characters between
Pro:
Pro:double Pro 4, linebreak between, should create an empty pro entry
Contra:
Conclusion: the last Contra was empty
Conclusion: this Contra: in this row should not match!
Conclusion: Test with spaces between Conclusion and :
 Conclusion: this Conclusion was prefixed by a space
    Conclusion: this Conclusion was prefixed by a Tab
        Conclusion: this Conclusion was prefixed by two Tabs a space between
Conclusion : this Conclusion has a space between Conclusion and :



a final line with multiple line breaks in between, should be part of the last conclusion fragment

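The result should consist of [0] as the intro line, 4 Pro results (with delimiters), 1 Contra (empty) and 7 Conclusion results (with delimiters). The only Contra should be empty, and the final line should be part of the last Conclusion.

I'm trying to match something like this

  1. Start of a line, or start of the file
  2. Any whitespace character, zero or more times
  3. Any variant of Pro, Contra or Conclusion (ignoring upper/lower case)
  4. Any whitespace character, zero or more times
  5. In this order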
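As a rough sketch of the steps listed above (not taken from the original post): the `m` modifier lets `^` match at the start of every line as well as at the start of the string, which covers step 1. The helper name `splitReview` is hypothetical, and the colon after the keyword is an assumption based on the sample input. PREG_SPLIT_NO_EMPTY is deliberately left out here so that an empty section still gets its own (empty) slot.

```php
<?php
// Hypothetical helper, for illustration only.
function splitReview(string $text): array
{
    // ^        start of a line (because of the "m" modifier) or start of the string
    // [\t ]*   optional tabs/spaces before the keyword
    // (...)    the keyword, captured so preg_split returns it as a delimiter
    // [\t ]*:  optional tabs/spaces, then the colon (assumed from the sample input)
    // [\t ]*   optional tabs/spaces after the colon
    $pattern = '/^[\t ]*(Pro|Contra|Conclusion)[\t ]*:[\t ]*/im';

    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        return []; // preg_split signals a regex error with false
    }

    // Each fragment may still carry the line break that preceded the next heading.
    return array_map('trim', $parts);
}

$parts = splitReview("Intro line\nPro: good\n Contra :\nConclusion: fine, this Contra: stays");
print_r($parts);
```

With this layout, index 0 is the intro, odd indexes hold the keywords, and the even index after each keyword holds its text, which is an empty string for an empty section. A keyword in the middle of a line is left alone because `^` cannot match there.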

2 answers:

Answer 0 (score: 1)

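First, [^|\n]* means 0 or more characters that are not a pipe | or a newline.
[\t| ]* means 0 or more characters that are a tab, a pipe | or a space. Inside a character class the | is a literal pipe, not an "or".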
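A small check of the two character classes described above, plus the pattern from the question applied to a single line. The test strings are made up for illustration. It suggests why the original attempt misbehaves: the greedy `[^|\n]*` swallows everything on the line up to the last keyword that is followed by a colon.

```php
<?php
// Inside [...] the pipe is a literal character, not an alternation operator.
$classMatches = preg_match('/^[\t| ]*$/', "\t| |");          // tab, pipe, space, pipe
$negatedOk    = preg_match('/^[^|\n]*$/', 'no pipe here');   // no pipe, no newline
$negatedFail  = preg_match('/^[^|\n]*$/', 'a|b');            // contains a pipe

// Pattern from the question, applied to one hypothetical line.
$questionPattern = '/[^|\n]*[\t| ]*\b(Pro|Contra|Conclusion)\b\:[\t| ]*/i';
preg_match($questionPattern, 'Conclusion: this Contra: in this row', $m);

var_dump($classMatches, $negatedOk, $negatedFail); // int(1) int(1) int(0)
var_dump($m[0]); // string "Conclusion: this Contra: "
var_dump($m[1]); // string "Contra"
```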

I think you want:

/\s*\b(Pro|Contra|Conclusion):[\t ]*/i

Answer 1 (score: 0)

With @M42's help I was able to find the right approach...

'/\n[\t ]*\b(Pro|Contra|Conclusion)[\t ]*:[\t ]*/i'

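The only thing missing is "start of file instead of a newline"; apart from that it does almost exactly what I want (I'm still testing to be sure, though). For now I prepend a "\r\n" to the string, and it gets stripped again when I trim() the string fragments.

The full PHP call looks like this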

$returnValue = preg_split('/\n[\t ]*\b(Pro|Contra|Conclusion)[\t ]*:[\t ]*/i', $data['review_text'], -1, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
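To make the "\r\n" workaround concrete, here is a minimal sketch using the pattern and flags from the call above. The sample input is hypothetical; it starts directly with a heading, which is the case the pattern cannot match on its own because no "\n" precedes it.

```php
<?php
// Pattern and flags copied from the answer above.
$pattern = '/\n[\t ]*\b(Pro|Contra|Conclusion)[\t ]*:[\t ]*/i';
$flags   = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;

// Hypothetical input that begins with a heading.
$input = "Pro: fast\nContra: pricey\n\nstill pricey";

// Without the prefix, the leading "Pro:" is not split off.
$withoutPrefix = preg_split($pattern, $input, -1, $flags);

// With the prefix, the leading heading is matched as well. The leftover "\r"
// becomes the first fragment and turns into an empty string after trim().
$withPrefix = array_map('trim', preg_split($pattern, "\r\n" . $input, -1, $flags));

print_r($withoutPrefix);
print_r($withPrefix);
```

One thing to watch with PREG_SPLIT_NO_EMPTY: when one heading is followed directly by the next, the empty fragment between them is dropped, so the two keywords end up next to each other in the array.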

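In case you're wondering why I used Fazit instead of Conclusion in my reply to M42: I'm writing code for a German web app, so I have to translate every copy-and-paste for StackOverflow. (ಠ_ಠ)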