Matching three consecutive words by their first letters

Date: 2015-05-21 01:30:04

Tags: php regex

I have a large block of text, such as:

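  with an Angel of Victory, across from the Plaza Hotel, on Fifty-ninth Street. I had promised to alert the skeptical Harrison to the work’s virtues, but we found that it is now hidden in a huge beige box, for a restoration & site. She was thrilled.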

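I am trying to build a regex that lets me search (and replace) this text wherever three consecutive words start with a given set of letters, matched case-insensitively and in the given order.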

For example, say I have these 3 characters: vaf

Applying the magic regex to the sample text would return Victory, across from

Further examples:

  • s i h would match Street. I had
  • s w t would match She was thrilled
  • t w v would match the work’s virtues
  • a r s would match a restoration & site

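The fourth example may be too complicated, since the regex would need to skip over words that do not start with an alphabetic character while still including them in the result.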

Once I have the match, I plan to use it to replace that text in the larger sample.

I am also open to non-regex solutions.

1 Answer:

Answer 0: (score: 2)

<?php

$content = 'with an Angel of Victory, across from the Plaza Hotel, on Fifty-ninth Street. I had promised to alert the skeptical Harrison to the work’s virtues, but we found that it is now hidden in a huge beige box, for a restoration & site. She was thrilled.';

$characters = ['v', 'a', 'f'];
$patterns = [];

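// Build one capture group per letter: the letter followed by any non-whitespace.
foreach ($characters as $character) {
    $patterns[] = sprintf('(%s[^\s]*)', preg_quote($character, '~'));
}

// Join the groups with a single whitespace character and anchor on word boundaries.
$regex = sprintf('~\b%s\b~i', implode('\s', $patterns));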

preg_match($regex, $content, $matches);

print_r($matches);

I'm sure there are better ways to do this. Here you end up with an expression like

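\b        # word boundary
(v[^\s]*) # a "v" followed by any non-whitespace characters (first word)
\s        # a single whitespace character
(a[^\s]*) # an "a" followed by any non-whitespace characters (second word)
\s        # a single whitespace character
(f[^\s]*) # an "f" followed by any non-whitespace characters (third word)
\b        # word boundary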

which should give you something like

(
    [0] => Victory, across from
    [1] => Victory,
    [2] => across
    [3] => from
)
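
Since the question also asks about replacing the matched text, here is a minimal sketch of that step. It assumes the same pattern as above; the helper name buildInitialsRegex, the shortened sample text, and the [REPLACED] placeholder are made up for illustration and are not part of the original answer.

```php
<?php

// Hypothetical helper: builds the same pattern as above from a list of
// initial letters. \S* is equivalent to [^\s]*.
function buildInitialsRegex(array $letters): string
{
    $parts = [];
    foreach ($letters as $letter) {
        $parts[] = '(' . preg_quote($letter, '~') . '\S*)';
    }

    return '~\b' . implode('\s', $parts) . '\b~i';
}

// Shortened sample text (an assumption, to keep the example small).
$content = 'with an Angel of Victory, across from the Plaza Hotel.';
$regex = buildInitialsRegex(['v', 'a', 'f']);

// Replace only the first match (limit = 1) with an arbitrary placeholder.
$result = preg_replace($regex, '[REPLACED]', $content, 1);

echo $result, PHP_EOL;
```

If the replacement has to be computed from the matched words, preg_replace_callback accepts the same pattern and passes the captured groups to a callback.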

Regex for the fourth scenario: (I'll leave it to you to break this one down)

~\b(a[^\s]*)\s(&[a-z]+;\s*)?(r[^\s]*)\s(&[a-z]+;\s*)?(s[^\s]*)\s(&[a-z]+;\s*)?
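
As an alternative for the fourth scenario, here is a rough sketch that is not the regex above but a more general variant: between two wanted words it allows any number of tokens that do not start with a letter (a bare "&" as well as an entity such as "&amp;"). The helper name buildLooseInitialsRegex and the shortened sample text are assumptions for illustration.

```php
<?php

// Hypothetical helper: like the three-letter pattern, but tokens that do not
// start with a letter may sit between the wanted words and are kept in the match.
function buildLooseInitialsRegex(array $letters): string
{
    // One or more whitespace characters, then any number of
    // "non-letter token + whitespace" groups.
    $gap = '\s+(?:[^a-zA-Z\s]\S*\s+)*';

    $parts = [];
    foreach ($letters as $letter) {
        $parts[] = '(' . preg_quote($letter, '~') . '\S*)';
    }

    return '~\b' . implode($gap, $parts) . '\b~i';
}

// Shortened sample text (an assumption, to keep the example small).
$content = 'in a huge beige box, for a restoration & site. She was thrilled.';

preg_match(buildLooseInitialsRegex(['a', 'r', 's']), $content, $matches);

echo $matches[0], PHP_EOL; // a restoration & site
```

Note that this variant also accepts several whitespace characters between words, which is looser than the single \s used in the first pattern.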