PHP: remove the next two words after a specific word

Time: 2018-02-24 13:24:05

Tags: php regex preg-replace

How can I use preg_replace in PHP to remove the next two words after a specific word? For example:

String: Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.
Specific word: ipsum
New string: Lorem ipsum amet, consetetur sadipscing elitr, sed diam.

This is my code so far:

$txt = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.
Specific word: ipsum";
$str= preg_replace('/\W\w+\s*(\W*)$/', '$1', $txt);
echo $str;

But it only removes the last word of the string.

Thanks and best regards

3 answers:

Answer 0 (score: 2)

You can use (?<=ipsum)(?: \w+){2}. If the words to remove may carry punctuation, use (?<=ipsum)(?: [A-Za-z,.!]+){2} instead.

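function remove2w($anchor, $text, $number = 2) {
    // preg_quote() escapes any regex metacharacters in the anchor; %d keeps the quantifier numeric
    $pattern = sprintf('/(?<=%s)(?: \w+){%d}/', preg_quote($anchor, '/'), $number);
    return preg_replace($pattern, '', $text);
}

Output: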

remove2w('ipsum', 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.')
>>> Lorem ipsum amet, consetetur sadipscing elitr, sed diam.
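The difference between the two patterns only shows up when one of the removed words has punctuation attached. A minimal sketch, assuming "dolor" as the anchor (chosen here for illustration, since the second word after it is "amet,"):

```php
<?php
$text = 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.';

// Words only: the comma attached to "amet" is not matched by \w+, so it is left behind.
$wordsOnly = preg_replace('/(?<=dolor)(?: \w+){2}/', '', $text);

// Letters plus the listed punctuation: "amet," is consumed as a whole.
$withPunctuation = preg_replace('/(?<=dolor)(?: [A-Za-z,.!]+){2}/', '', $text);

echo $wordsOnly, "\n";       // Lorem ipsum dolor, consetetur sadipscing elitr, sed diam.
echo $withPunctuation, "\n"; // Lorem ipsum dolor consetetur sadipscing elitr, sed diam.
```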

Answer 1 (score: 1)

preg_replace() offers considerable flexibility:

<?php
$needle = "ipsum";
$haystack = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam. ";
$pattern = sprintf('|(%s)\s+\w+\s+\w+|', $needle);
var_dump(preg_replace($pattern, '$1', $haystack));

The output is:

string(57) "Lorem ipsum amet, consetetur sadipscing elitr, sed diam. "
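One property of this pattern is worth checking before relying on it: \s+\w+\s+\w+ needs whitespace directly after the first following word, so a needle whose next word ends in punctuation produces no match and the string comes back unchanged. A small sketch, using "sit" as an illustrative second needle:

```php
<?php
$haystack = 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.';

// Same pattern shape as above, built for two different needles.
$afterIpsum = preg_replace(sprintf('|(%s)\s+\w+\s+\w+|', 'ipsum'), '$1', $haystack);
$afterSit   = preg_replace(sprintf('|(%s)\s+\w+\s+\w+|', 'sit'), '$1', $haystack);

// "ipsum" is followed by two plain words, so they are removed.
var_dump($afterIpsum === 'Lorem ipsum amet, consetetur sadipscing elitr, sed diam.'); // bool(true)

// "sit" is followed by "amet," and the comma blocks the second \s+, so nothing changes.
var_dump($afterSit === $haystack); // bool(true)
```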

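Answer 2 (score: 0)

There are several things to consider for this task.

  1. Does your target substring need word boundaries? Without them you may get unintended matches, but only you can decide that for your project.
  2. Do you need case-insensitive matching? My guess: yes.
  3. What should happen if the target substring is the second-to-last or last word in the string? Do you want to remove one or zero words in that case? My guess: yes.
  4. You need to account for and include punctuation, right? My guess: yes.
  5. Might your target substring contain characters with special meaning in a regular expression? My guess: no, but if you are unsure, call preg_quote() on the needle before injecting it into the pattern.
  6. Here is a full battery of needles, using every word of the string in turn: (Demo)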

    $txt = 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.';
    $needles = str_word_count(strtolower($txt),1);
    foreach($needles as $needle){
        echo "($needle) => ",preg_replace('~\b'.$needle.'\b\S*\K(?:\s\S+){0,2}~i','',$txt),"\n";  // use '(($0))' as replacement to see the substring that is removed
    }
    

    Output:

    (lorem) => Lorem sit amet, consetetur sadipscing elitr, sed diam.
    (ipsum) => Lorem ipsum amet, consetetur sadipscing elitr, sed diam.
    (dolor) => Lorem ipsum dolor consetetur sadipscing elitr, sed diam.
    (sit) => Lorem ipsum dolor sit sadipscing elitr, sed diam.
    (amet) => Lorem ipsum dolor sit amet, elitr, sed diam.
    (consetetur) => Lorem ipsum dolor sit amet, consetetur sed diam.
    (sadipscing) => Lorem ipsum dolor sit amet, consetetur sadipscing diam.
    (elitr) => Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
    (sed) => Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed
    (diam) => Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.
    

    Breakdown:

    ~                #pattern delimiter
    \b'.$needle.'\b  #match needle as a whole word
    \S*              #match zero or more trailing non-whitespace characters; after \b the first of them must be a non-word character such as punctuation.  This may be replaced with [[:punct:]]* if more desirable/accurate
    \K               #reset the start of the reported match, so everything matched so far is kept
    (?:\s\S+){0,2}   #match zero, one or two sequences of: a whitespace character followed by one or more non-whitespace characters
    ~                #pattern delimiter
    i                #case-insensitive pattern modifier
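Considerations 1, 2, 3 and 5 can be folded into a single helper. The following is only a sketch: the function name removeFollowingWords and its signature are hypothetical, and the "Build 315" sentence is made up to show what preg_quote() prevents. Note that \b only behaves as a whole-word guard when the needle starts and ends with a word character.

```php
<?php
// Hypothetical helper (name and signature are illustrative, not from the answer above).
function removeFollowingWords(string $needle, string $text, int $count = 2): string
{
    // preg_quote() escapes regex metacharacters in the needle; '~' is the delimiter used here.
    $quoted = preg_quote($needle, '~');

    // \b...\b  whole-word match, \S* trailing punctuation, \K keep everything matched so far,
    // then remove up to $count whitespace-separated chunks; the i modifier ignores case.
    $pattern = '~\b' . $quoted . '\b\S*\K(?:\s\S+){0,' . max(0, $count) . '}~i';

    return preg_replace($pattern, '', $text);
}

$txt = 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.';

echo removeFollowingWords('IPSUM', $txt), "\n"; // Lorem ipsum amet, consetetur sadipscing elitr, sed diam.
echo removeFollowingWords('am', $txt), "\n";    // unchanged: "am" never occurs as a whole word
echo removeFollowingWords('sed', $txt), "\n";   // Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed

// Without preg_quote() the dot in "3.5" would also match the "1" in "315".
echo removeFollowingWords('3.5', 'Build 315 and 3.5 were released last week.'), "\n"; // Build 315 and 3.5 last week.
```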